We are at a crucial stage in the evolution of data science. Data-driven applications are the new language that unites organizations, and they let data scientists and machine learning engineers communicate their findings directly to their peers.

Adrien Treuille
Adrien is Snowflake’s product manager for visual data products. Before joining Snowflake, Adrien was co-founder and CEO of Streamlit and vice president of simulation at Zoox. He also led a Google X project and was a professor of computer science and robotics at Carnegie Mellon University.

With the adoption of cloud computing, virtually any amount of data can be collected and processed. So what can we learn from all this data, and how do we extract and create meaning from it?

Until now, the traditional SQL-focused approach has helped drive decision-making by applying data visualization techniques to structured, tabular information stored in databases. However, predictive machine learning models such as neural networks, along with semi-structured and unstructured data types such as images, videos and sentiment data, do not fit neatly into the old SQL paradigm. Data scientists and ML engineers are finding it difficult to share the insights they gain from these new data types and from the new ways of processing them.

An Infinite Range of Possibilities Opens Up

Data scientists and ML engineers want to create and experiment with arbitrary mashups of different data. For example, tweets tagged with AI-generated sentiment data, or videos annotated by humans to capture the emotional punch of a story, offer an additional way to measure marketing return on investment.

This new data stack is powered by Python, along with new Python-native ways to process, transform and visualize data. The output of this visualization takes the form of data-driven applications. These applications give business users a dazzling new set of capabilities, from predicting parking-space vacancy to assessing a retailer’s procurement needs to identifying the optimal layout for solar farms.

Data scientists and ML engineers can take an organization’s call center and create a visual data product by mashing up audio data, machine learning models and sentiment analysis. Different audiences can access versions of this data-driven application tailored to their specific needs. Call center managers and the executive suite can then dig into customers’ call center experiences across the country, look for differences and decide how to improve customer service based on that data.

Moving Beyond Today’s Communication Bottleneck

While adapting to new data-processing demands and working inside the Python world, data scientists and ML engineers often struggle to share their data products outside their own teams. They may spend much of their time replying to endless daily email threads about individual elements of their model results, or end up watching from the sidelines as their organization hires a new team to build a one-off data application.

Communication bottlenecks prevent fast and easy data sharing between data scientists and business users. The involvement of other teams sitting between these two groups further complicates the situation. There is an acute need for Python-native data visualization to function as a common language for the new data stack so that data scientists and ML engineers can express their data insights in data-driven applications aimed at business users. At the same time, data scientists and ML engineers need to be able to create and iterate on these data artifacts quickly to keep up with the data itself, which is constantly changing.

Making data apps easy to build will fundamentally change the roles of data scientists and ML engineers. They will move to center stage within their organizations and gain the power to help drive better business decisions through their art, discoveries and actionable insights.

Next: A Cycle of Collaboration with Data Visualization

Emerging data visualization tools already allow data scientists and ML engineers to share visual representations of their work, built in code, directly with other teams in their organizations. However, we have yet to achieve true two-way communication and collaboration through the medium of data artifacts.

Next, we will see technologies that create tight, virtuous collaboration loops between the creator of a data product and its consumer. Consumers will be able to give feedback on the data visualizations that data scientists and ML engineers provide. Together, the teams will be able to arrive at the optimal representation of data insights: one that anyone can easily understand and quickly act upon.