Test drive data science

New York is one of the greatest tourist destinations in the world, but navigating your way through this vibrant city can be challenging. If you go there, one thing you cannot miss is the prevalence of Citi Bikes, which offer a quick, easy way to hop across town.

Just as Citi Bikes enable streamlined ways to navigate complex cities like New York, the Denodo Platform is making it easier for stakeholders to navigate and prepare complex troves of data for machine learning (ML) or data science projects. 

To demonstrate this, Denodo created a new Denodo Test Drive focused on data science and ML. This Test Drive shows how the Denodo Platform, which uses a full-featured data catalog to curate and present data sets to data scientists, accelerates their ability to build ML models and perform predictive analytics.

Our first Denodo Test Drive demonstrated how the data virtualization capabilities of the Denodo Platform enhance BI and analytics reporting, while this one shows how the Denodo Platform enhances insight and agility around data science and ML projects on the AWS Cloud. In keeping with the Citi Bike theme, this Test Drive employs real data sets from New York Citi Bike trips, together with NOAA weather data, to create inputs for an ML model that predicts bike demand.

Finding useful data can be tricky. Even when the data is available somewhere within the company, navigating bureaucracy and department-owned data is not always easy, and it can be time-consuming. The data might be accessible only via unfamiliar protocols, and it might be spread across a variety of systems and formats, such as NoSQL data stores, SaaS APIs, and complex relational schemas.

In this Test Drive, you’ll learn how data virtualization, combined with modern architectures like logical data lakes, can help business users gather quick insights from their data sources and proceed smoothly through the data science project life cycle, with the following benefits:

  • All data is accessible through a single location (exposed via the Denodo Data Catalog).
  • Data can be queried via SQL and REST calls, enabling a unified format for data access.
  • Access is controlled, thanks to rich security features in the virtual layer, including the ability to send pass-through credentials to the underlying data sources.
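
To make the SQL point above concrete, here is a minimal sketch of what querying a single, unified view might look like from Python. It uses an in-memory SQLite database purely as a stand-in for the virtual layer; the `bike_trips` view, its columns, and the sample rows are hypothetical and are not taken from the Test Drive itself.

```python
import sqlite3

# Stand-in for the virtual layer: an in-memory SQLite database.
# The table name, columns, and rows are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bike_trips (trip_date TEXT, start_station TEXT, duration_s INTEGER)"
)
conn.executemany(
    "INSERT INTO bike_trips VALUES (?, ?, ?)",
    [
        ("2019-07-01", "station_a", 640),
        ("2019-07-01", "station_b", 1210),
        ("2019-07-02", "station_a", 380),
    ],
)

# One SQL statement, regardless of where the rows physically live.
daily_trips = conn.execute(
    "SELECT trip_date, COUNT(*) AS trips "
    "FROM bike_trips GROUP BY trip_date ORDER BY trip_date"
).fetchall()
conn.close()

print(daily_trips)
```

The consumer only needs to know the view name and SQL; the storage format and location of the underlying source stay hidden behind the access layer.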

In summary, this Test Drive gives you the opportunity to quickly learn how the Denodo Platform can seamlessly combine data across three disparate sources (Amazon S3, a web service, and a data warehouse); massage it, prepare it, and present it via the Denodo Data Catalog; and then feed that data to an ML model (built with Python and a random forest algorithm) to predict future Citi Bike usage based on a variety of conditions, such as weather and holiday schedules.
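
As a rough illustration of the combine-and-prepare step, the sketch below joins daily trip counts, weather observations, and a holiday calendar on date and turns them into feature rows. This is not the Test Drive's actual pipeline: the `build_features` helper, the field names, and every value are made up for the example, and the join is done in plain Python rather than in the Denodo Platform.

```python
from datetime import date

# Illustrative rows standing in for the three sources; all values are made up.
trips_per_day = {
    date(2019, 7, 3): 61000,
    date(2019, 7, 4): 48000,
    date(2019, 7, 5): 57000,
}
weather = {
    date(2019, 7, 3): {"temp_c": 29.0, "precip_mm": 0.0},
    date(2019, 7, 4): {"temp_c": 31.0, "precip_mm": 4.2},
    date(2019, 7, 5): {"temp_c": 30.0, "precip_mm": 0.0},
}
holidays = {date(2019, 7, 4)}


def build_features(trips, weather_by_day, holiday_days):
    """Join the three sources on date and return (X, y) lists for a regressor."""
    X, y = [], []
    for day in sorted(trips):
        obs = weather_by_day.get(day)
        if obs is None:
            continue  # skip days that have no weather observation
        X.append([
            obs["temp_c"],
            obs["precip_mm"],
            day.weekday(),             # 0 = Monday ... 6 = Sunday
            int(day in holiday_days),  # 1 if the day is a holiday, else 0
        ])
        y.append(trips[day])
    return X, y


X, y = build_features(trips_per_day, weather, holidays)
```

Feature rows and targets in this shape could then be handed to a random forest regressor, for example scikit-learn's `RandomForestRegressor().fit(X, y)`, assuming that library is the one in use.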

You can do a lot more with the Denodo Platform; its potential use cases for ML and data science are limited only by your imagination. But the best way to get started is with our Test Drive. Please share your feedback below in the comments, and come back for more exciting Test Drives in the near future.

Mitesh Shah