Databricks Brings Data Pipeline Service to GA

April 6, 2022 by Phu Nguyen



Databricks has made Delta Live Tables (DLT), its data pipeline service, generally available (GA).

Hot Mess, Cool Cleanup

Let’s start with the problem Delta Live Tables seeks to solve. As Databricks CEO Ali Ghodsi describes it: “… people are… stitching together so many different things. They have the data, they use these tools to get [it] in, but then they have to use Airflow, or maybe they’re using Oozie, they’re writing a bunch of custom ETL scripts, they’re moving it into data warehouses, they’re moving it into data lakes… they have to do their own monitoring to make sure that this stuff doesn’t break… there’s just behind-the-scenes hell, that everybody has to do.”

Now contrast this with Databricks’ view of how things should be: data engineers should only have to provide a declarative specification of the data transformations they want a pipeline to perform, written in a language they already know. They shouldn’t have to concern themselves with the logistics or special performance considerations of executing their pipelines. Once they define the spec, the system should take over, managing execution on an on-demand, continuous or scheduled basis.

In a nutshell, that’s what Delta Live Tables seeks to do.

Sweet Syntactic Sugar

Because Databricks believes data engineers should be able to build data pipelines with skills they already have, DLT’s bread and butter is SQL and Python code in a notebook.

On the SQL side, each table a pipeline outputs is defined by a query whose result set determines the table’s schema and content. Extensions to the SQL syntax let developers specify “expectations”: data quality rules, plus the action to take when rows of data don’t comply.
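
The expectation idea can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not Databricks code: the `Expectation` class, the `apply_expectations` function and the action names "warn" (keep the offending row but count it), "drop" (discard it) and "fail" (abort the whole update) are hypothetical, loosely mirroring the retain, drop and fail behaviors that DLT’s SQL exposes through an `ON VIOLATION` clause.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
ACTIONS = ("warn", "drop", "fail")  # hypothetical action names for this toy model


@dataclass(frozen=True)
class Expectation:
    """A named data quality rule plus the action to take when a row violates it."""
    name: str
    predicate: Callable[[Row], bool]
    action: str = "warn"  # "warn" keeps the row, "drop" discards it, "fail" aborts the update


class ExpectationFailed(Exception):
    """Raised when a row violates an expectation whose action is "fail"."""


def apply_expectations(rows: List[Row], expectations: List[Expectation]) -> Tuple[List[Row], Dict[str, int]]:
    """Check every row against every rule; return surviving rows and per-rule violation counts."""
    for e in expectations:
        if e.action not in ACTIONS:
            raise ValueError(f"unknown action {e.action!r} for expectation {e.name!r}")
    violations = {e.name: 0 for e in expectations}
    kept: List[Row] = []
    for row in rows:
        keep = True
        for e in expectations:
            if e.predicate(row):
                continue
            violations[e.name] += 1
            if e.action == "fail":
                raise ExpectationFailed(f"{e.name!r} violated by {row!r}")
            if e.action == "drop":
                keep = False
        if keep:
            kept.append(row)
    return kept, violations


# Hypothetical rules and rows, for illustration only.
rules = [
    Expectation("id_present", lambda r: r.get("id") is not None, action="drop"),
    Expectation("amount_positive", lambda r: r.get("amount", 0) > 0),  # default action: warn
]
rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}, {"id": 3, "amount": -2}]
kept, counts = apply_expectations(rows, rules)
print(len(kept), counts)  # 2 {'id_present': 1, 'amount_positive': 1}
```

The row with a missing id is dropped, while the row with a negative amount survives but is still counted, which is the kind of per-rule metric a pipeline can surface for monitoring.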

On the Python side, rather than writing imperative code, the developer uses extensions to the DataFrame API that offer a declarative syntax for specifying calculations, destination table column names and filter conditions, along with attributes that specify the same data quality “expectations” supported in SQL.
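
To make the declarative style concrete, here is a plain-Python toy, not the DLT API: a hypothetical `table` decorator registers each function as the definition of a table, and a hypothetical `materialize` function works out which upstream tables to build first. It infers dependencies from parameter names, a simplification of its own; Databricks’ real Python interface is a `dlt` module with decorators such as `@dlt.table` and functions such as `dlt.read`, which this sketch does not reproduce.

```python
import inspect
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

# Hypothetical registry mapping a table name to the function that defines it.
REGISTRY: Dict[str, Callable[..., List[Row]]] = {}


def table(fn: Callable[..., List[Row]]) -> Callable[..., List[Row]]:
    """Register fn as the definition of a table named after the function (toy, not dlt.table)."""
    REGISTRY[fn.__name__] = fn
    return fn


def materialize(name: str, cache: Optional[Dict[str, List[Row]]] = None) -> List[Row]:
    """Build a table, first building any upstream tables named by its parameters."""
    if cache is None:
        cache = {}
    if name not in cache:
        fn = REGISTRY[name]
        # Each parameter name is treated as the name of an upstream table.
        upstream = {p: materialize(p, cache) for p in inspect.signature(fn).parameters}
        cache[name] = fn(**upstream)
    return cache[name]


# The developer only declares what each table is; materialize() decides the build order.
@table
def raw_orders() -> List[Row]:
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": -3}, {"id": 3, "amount": 7}]


@table
def clean_orders(raw_orders: List[Row]) -> List[Row]:
    return [r for r in raw_orders if r["amount"] > 0]


@table
def order_total(clean_orders: List[Row]) -> List[Row]:
    return [{"total": sum(r["amount"] for r in clean_orders)}]


print(materialize("order_total"))  # [{'total': 17}]
```

Because results are cached within a single `materialize` call, a table consumed by several downstream tables is computed once. The toy does not detect dependency cycles; one would surface as a `RecursionError`.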

In the words of Databricks’ Michael Armbrust: “In both cases… you are giving a declarative description of what tables should exist inside of your lakehouse, and then the system is figuring out how to create and keep those tables up-to-date.”

Execution Sans Naivete

Screenshot: the Databricks user interface for Delta Live Tables jobs, with the execution graph visualization in the center and the list of status messages from the previous run at the bottom.

Notebooks containing DLT code can be scheduled as a special kind of Databricks job. Running the job triggers analysis of the notebook’s code and generation of an intelligent execution graph. That analysis lets subtasks with no mutual dependencies run in parallel, while subtasks that do depend on one another run in the proper sequence. As a result, Databricks goes beyond simply scheduling the notebook’s code as an opaque unit. As Ghodsi explained, pipelines built on other platforms and orchestrated by, for example, Apache Airflow would not get this execution boost.
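
The dependency analysis described above can be illustrated with a Kahn-style topological sort that groups tasks into stages. This is a hypothetical sketch, not how Databricks’ planner is implemented; `parallel_stages` and the table names are made up for illustration. Each stage holds tasks whose dependencies all completed in earlier stages, so tasks within a stage have no mutual dependencies and could run in parallel, while the order of the stages enforces the required sequencing.

```python
from typing import Dict, Iterable, List, Set


def parallel_stages(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group tasks into stages with a Kahn-style topological sort.

    Every task in a stage depends only on tasks in earlier stages, so tasks in the
    same stage have no mutual dependencies and could run in parallel.
    """
    remaining: Dict[str, Set[str]] = {task: set(ups) for task, ups in deps.items()}
    # Tasks mentioned only as dependencies become roots with no dependencies of their own.
    for ups in list(remaining.values()):
        for up in ups:
            remaining.setdefault(up, set())
    stages: List[List[str]] = []
    while remaining:
        # Sorting only makes the output deterministic; order within a stage does not matter.
        ready = sorted(task for task, ups in remaining.items() if not ups)
        if not ready:
            raise ValueError(f"dependency cycle among {sorted(remaining)}")
        stages.append(ready)
        for task in ready:
            del remaining[task]
        for ups in remaining.values():
            ups.difference_update(ready)
    return stages


# Hypothetical pipeline: each table maps to the tables it reads from.
pipeline = {
    "bronze_orders": [],
    "bronze_customers": [],
    "silver_orders": ["bronze_orders"],
    "silver_customers": ["bronze_customers"],
    "gold_revenue": ["silver_orders", "silver_customers"],
}
for i, stage in enumerate(parallel_stages(pipeline)):
    print(i, stage)
# 0 ['bronze_customers', 'bronze_orders']
# 1 ['silver_customers', 'silver_orders']
# 2 ['gold_revenue']
```

The two bronze tables can be built at the same time, as can the two silver tables, but the gold table must wait for both silver tables. The sketch ignores resource limits, retries and everything else a production scheduler handles.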

The resulting acceleration is comparable to what a query optimizer provides for SQL commands executed on a conventional database. In fact, Spark SQL’s query optimizer is what generates the execution graph in the first place, which makes sense given that Armbrust also created Spark SQL. In addition, Delta Live Tables handles both streaming data and data at rest, because Spark Structured Streaming, another Armbrust creation, uses the same data access constructs as the rest of the Databricks platform.

Think Different?

To date, most ETL implementations have been either entirely code-driven or built on a standalone ETL platform with a visual design surface. Delta Live Tables finds a middle ground, taking a code-based yet declarative approach. The dbt platform takes a similar SQL-based declarative approach, but it is a standalone solution, whereas DLT’s engine is deeply integrated into the same Databricks platform used for data science and analytics.


Meanwhile, there’s no reason Databricks couldn’t create a visual designer for DLT that generates the underlying SQL code. In fact, the Databricks workspace user interface already generates a visualization of the execution graph when a job is built around a DLT notebook (as seen in the screenshot above). And while that visualization is a management and monitoring feature rather than an authoring interface, nothing in principle prevents it from working in both directions. Perhaps that’s why, speaking with Armbrust and Ghodsi, I got the distinct feeling that a visual designer might be on the horizon.

A Market Execution Engine, too

For now, though, Databricks is focused on making its platform an omni-data workbench and execution environment that spans data ingest, exploration, storage, transformation, analytics, data science, machine learning and MLOps. And as Databricks continues to square off with Snowflake in the battle to be the independent data cloud provider and ecosystem of choice, that combination of functional breadth and technical depth makes a great deal of sense.

Source: InApps.net


