The Coming Era of Data as Code

March 30, 2022 by Phu Nguyen

What Is Data as Code?

Constantinos Venetsanopoulos

Constantinos Venetsanopoulos is co-founder and CEO of Arrikto, a company building a machine learning platform that simplifies, accelerates, and secures model development through production. Originally from Athens, Greece, he studied computer science at NTUA, where he earned his master’s degree. Before Arrikto, he helped design and build a large-scale public cloud service, one of the largest in Europe at the time. With Arrikto, he arrived in Silicon Valley, working to redefine what’s possible in AI/ML by rethinking how applications manage and store data.

Data as Code is an approach that gives teams — from DevOps to DataOps, Data Scientists and beyond — the ability to process, manage, consume, and share data in the same way we do for code during software development. It empowers end users to take control of their data to accelerate iterations and increase collaboration.

The DevOps revolution empowered developers and caused a “shift left” that focused on acceleration and problem prevention while sprouting a new generation of tools like GitHub, Jenkins, CircleCI, Gerrit, and Gradle that allowed end users to ship software. What comparable tooling do we have for data? What enhanced processes do we have?

Think about the end users in each scenario.

When an application needs to be deployed, a DevOps Engineer simply deploys it via automated pipelines. When they need storage provisioned, they programmatically request it from the cloud provider and attach it to their application. When they need to expose application access across the network, they create a service endpoint and call an ingress gateway.

But what happens when a developer or application owner needs data? The developer asks the DataOps team or hosting application owner for the data. What happens when they need to share that data with colleagues or move it between clouds? They wait for DevOps engineers to help them. What happens when they want to synchronize their datasets across lifecycles? They wait for DevOps engineers to help them.

These processes are largely manual, locking entire workflows into an outdated request-and-wait cycle. Much like a manufacturing line at a factory, these manual processes only work if everyone is available. If one link in the chain is missing, requests get stuck in wait.

By taking a Data as Code approach, companies can manage data programmatically, set up automated continuous integration and deployment pipelines for data, add the ability to version, package, clone, branch, diff and merge data, and also make it collaborative across different clouds and workspaces — just as they do with their code and deployments.
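To make the version, diff, and merge operations concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: it assumes a dataset is a small in-memory dictionary of records keyed by string ID, that a record is never None, and the function names `version_id`, `diff`, and `merge` are hypothetical.

```python
import hashlib
import json


def version_id(dataset: dict) -> str:
    """Content-derived version: identical data always yields the same ID."""
    canonical = json.dumps(dataset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def diff(old: dict, new: dict) -> dict:
    """Report which record keys were added, removed, or changed."""
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and old[k] != new[k]),
    }


def merge(base: dict, ours: dict, theirs: dict) -> dict:
    """Three-way merge of two branches that share a common base.

    Assumption: a missing record is represented as None, so real
    records must never be None.
    """
    merged = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both branches agree
        elif o == b:
            value = t  # only "theirs" touched this record
        elif t == b:
            value = o  # only "ours" touched this record
        else:
            raise ValueError(f"conflict on record {key!r}")
        if value is not None:
            merged[key] = value
    return merged
```

For example, if one branch adds a record and another corrects an existing one, `merge(base, ours, theirs)` contains both changes, and `diff(base, merged)` lists exactly those two keys. When both branches edit the same record differently, the sketch raises an error instead of guessing, much as a code merge would flag a conflict.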

Flexible Data Means Empowered End Users

Despite our best efforts, data is still largely kept in silos. Some of those silos are monolithic and some are distributed, but they’re still silos. As happens with siloed data — even in modern cloud environments — different teams manage each repository and require different processes to access the data inside.

While we are getting better at connecting systems through APIs, we have added entire DataOps teams whose job is to manage the data pipeline alongside the data user. As much as we try to “jazz it up,” we are still doing ETL (extract, transform, and load).

The way we approach data management fundamentally opposes the way we need to use it today. We don’t need silos, or data and storage admins. Instead, we need to think of data in terms of end-user publishers and subscribers, with a third party that defines regulations, access control lists and other admin responsibilities while versioning and differencing the data.

Much like what GitHub does for code in developer workflows, taking this approach to data management would allow us to move ownership of data to the app level and make data inherently more mobile and shareable. Most notably, it would empower the people who work with data every day.
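The publisher and subscriber model described above can be sketched in a few lines of Python. Everything here is hypothetical: the `DataHub` class, its method names, and the in-memory storage are assumptions made for illustration, standing in for whatever third party would hold the access control lists and the version history.

```python
from collections import defaultdict
from typing import Callable


class DataHub:
    """Hypothetical broker sitting between data publishers and subscribers."""

    def __init__(self) -> None:
        self._versions = defaultdict(list)   # dataset name -> published snapshots
        self._readers = defaultdict(set)     # dataset name -> users allowed to read
        self._callbacks = defaultdict(list)  # dataset name -> subscriber callbacks

    def grant(self, dataset: str, user: str) -> None:
        """Admin duty: add a user to the dataset's access control list."""
        self._readers[dataset].add(user)

    def subscribe(self, dataset: str, user: str,
                  callback: Callable[[int, dict], None]) -> None:
        """Register a callback, but only for users on the access control list."""
        if user not in self._readers[dataset]:
            raise PermissionError(f"{user} may not read {dataset}")
        self._callbacks[dataset].append(callback)

    def publish(self, dataset: str, snapshot: dict) -> int:
        """Store a new version, notify subscribers, and return the version number."""
        self._versions[dataset].append(dict(snapshot))
        version = len(self._versions[dataset])
        for callback in self._callbacks[dataset]:
            callback(version, snapshot)
        return version
```

In this sketch the application team calls `publish` whenever its data changes, and a data scientist who has been granted access receives each new version through a callback instead of filing a request and waiting.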

Pipelines Aren’t Just for Code

Perhaps nobody feels the pain of outdated data workflows more acutely than data scientists. No other applications are as reliant on data as machine learning and artificial intelligence, but the people who build those apps are stuck using outdated processes. Today, when data scientists build and train models, they share new data with their machine learning colleagues and begin iterating their model development in tools like Jupyter Notebook, Visual Studio Code, or RStudio. Those models get tweaked and changed, all using copies of the same data. Invariably, the data needs to be modified, or an updated version needs to be requested from the application team.

When that happens, data science teams have to manually keep track of model experimentation against both a model and data version, while also training updated models against the entire data set from scratch. It’s an enormous waste of time and resources.

What if, instead, they were able to build, train, and tune their models and push them toward deployment, completely packaged up so the production DevOps and MLOps engineering teams can simply release via familiar CI/CD pipelines?

We need this shift left in the data equation. Data as Code gives data scientists and machine learning engineers the capability to manage data across any cloud, to collaborate on branches of versioned data sets, and continuously retrain their models by merging differential sets as they gather more inputs, just as DevOps has done for software development.
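As a toy illustration of retraining by merging differential sets, the Python sketch below uses a deliberately trivial "model" that predicts the mean of the values it has seen. The `MeanModel` class is hypothetical, and most real models cannot be updated this cheaply; the point is only that folding in the new records can give the same result as retraining on the full data set from scratch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MeanModel:
    """Toy model: predicts the mean of all values it has seen so far."""
    count: int = 0
    total: float = 0.0

    def update(self, new_values: Sequence[float]) -> None:
        """Fold a differential set into the model without revisiting old data."""
        self.count += len(new_values)
        self.total += sum(new_values)

    def predict(self) -> float:
        """Return the current mean, or 0.0 if no data has been seen."""
        return self.total / self.count if self.count else 0.0
```

Training on an initial batch and later calling `update` with only the newly gathered values leaves the model in the same state as training once on the combined data.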

Democratized Data Means Better Everything

In 2002, Jeff Bezos sent out a company-wide email at Amazon that became known as the “Bezos API Mandate.” It directed that every team in the company interact with one another through interfaces over the network: every piece of data, every function, no matter what. It was a call to organize the company around getting things done and to get rid of the stasis of the request-and-wait mentality.

Software development has undergone a similar reckoning over the past decade due to the DevOps revolution. Now, with the start of the Data as Code era, it’s time to do the same for data management. DevOps Engineers and Site Reliability Engineers no longer rely on request-and-wait style ITIL-based workflows for infrastructure administrators, and there’s no reason we can’t do the same for people who work with data every day.

An organization where data access is democratized — where everyone has secure access to shareable data whenever they need — is an organization where important decisions are made faster and more intelligently. It’s an organization where products get shipped more frequently at lower cost and at higher quality, and where everyone working on those products is empowered to be the best at what they do.

This is the promise of the Data as Code era.

Data as Code will require a complete philosophical realignment in our approach to data management. We’ll have to throw away a lot of current processes and practices in order to reorient them around truly flexible data, but we have the infrastructure available to make this happen. Kubernetes in particular has unlocked the pathways that make Data as Code possible. It’s the future of the application control plane and will be the foundation for the technologies that will create the future data control plane.

We’ve already been through a radical shift in how applications are made. It’s time for another radical shift in the way they’re fed.

AWS Cloud and CircleCI are sponsors of InApps.

Source: InApps.net
