Data engineering as a distinct field, with practitioners who share a cohesive identity as data engineers, is fairly new. So new, in fact, that many people don’t seem to understand exactly what data engineering is and what it is not, or where the borders lie between data engineering, data science and software engineering.

“When you look at some job descriptions, a lot of times you’ll see that they want a data engineer, but when you read the details, the company is actually looking for someone who specializes in machine learning, or someone who has a background in data science or someone who’s an analyst or a visualization engineer,” explained Robbie Smith, senior data engineer at Guild Education. “In a lot of job descriptions, data engineering is conflated with other data-related professions.”

Perhaps ironically, data engineering, as a profession, has more in common with software engineering than with data science. Most data engineers started as software engineers, and there’s a fairly broad overlap in the skill sets used for data engineering and software engineering.

“I see data engineering as a subcategory of software engineering,” explained Luke Feeney, co-founder and chief operating officer at TerminusDB. “In most shops, we see that the data engineers are writing Python scripts to get their data from point A to point B. These are people who are coders.”

This sentiment was echoed by Smith, who described his own career trajectory as starting out as a software engineer before specializing in data engineering when offered the chance to do so at a new job. 
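
The kind of script Feeney describes, one that moves data from point A to point B, can be sketched in a few lines of Python. This is a minimal illustration rather than anyone’s production pipeline: the CSV contents, the `signups` table and the function names are all assumptions made up for the example, and an in-memory SQLite database stands in for a real destination.

```python
import csv
import io
import sqlite3

# Hypothetical source data ("point A"); in practice this might be a file or an export.
SOURCE_CSV = """user_id,signup_date,plan
1,2021-03-01,free
2,2021-03-02,pro
3,,pro
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with no signup date and convert user_id to an integer."""
    return [
        (int(r["user_id"]), r["signup_date"], r["plan"])
        for r in rows
        if r["signup_date"]
    ]


def load(conn, records):
    """Write the cleaned records into the destination table ("point B")."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, plan TEXT)"
    )
    # INSERT OR REPLACE keeps reruns from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO signups VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, text=SOURCE_CSV):
    """Run extract, transform and load; return the number of records loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection), "rows loaded")
```

Splitting the script into extract, transform and load steps is one common convention, chosen here because each step can be tested on its own.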

What about Data Science? 

The largest misconception about data engineering is that it is closely related to data science. The two disciplines are related, but in the way goats are related to grass, not the way goats are related to sheep. Data engineers build the pipelines that data scientists depend on, but the two professions are very different. Whereas the skill sets of software engineering and data engineering overlap heavily, the skill sets and career paths of data engineers and data scientists are quite different.

Data engineers are responsible for building a beautiful data pipeline that works every time, that is under revision control and that is very structured and orderly. Data scientists are trying to make sense of that data: to understand why anyone should be moving it around in the first place and to use the data for business reasons. They generally have Ph.D.s in statistics and approach their work like scientists; they want to run experiments, not write code.

So Data Engineers are Code Slingers? 

Andrew Stevenson, chief technology officer at lenses.io, thinks organizations should value data engineers who can do more than create sleek pipelines, but admits that this is how many organizations see them. “I used to see great data engineers who best understood business requirements being muscled out in an organization because they didn’t adopt the latest open-source, bleeding-edge technologies,” he said. “Many of these big data projects failed.”

The same could be said about software engineers: The best software engineers understand not just the technical requirements for a particular piece of software, but also the business outcome the organization is hoping to achieve.

This does not mean that data engineers are low-level grunts. “I think there’s this impression that it’s kind of a crude task,” Feeney said of building a data pipeline. “There’s nothing further from the truth. If the data pipeline doesn’t work and you’re building a data-intensive application or running a series of experiments, then everything falls apart.” Getting high-quality data out of transactional systems is challenging. “We work with a lot of incredibly talented data engineering teams that are faced with shocking challenges beating databases into submission so that the data comes out in a usable format,” Feeney said.

Data scientists depend on data engineers to get high-quality data so that the experiments aren’t plagued by ‘garbage in, garbage out’ problems. “We’ve seen cases where the data science team thinks they’ve had some amazing breakthrough,” Feeney said. “Then the data engineering team tells them, no, actually somebody changed the way we record this on the 14th of June. There’s nothing there.” So getting the pipeline right is incredibly important. 
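
A check for the kind of recording change Feeney mentions can be sketched as follows. This is a simplified illustration under stated assumptions, not a method described by anyone quoted here: the data, the dates (including the year), the seconds-to-milliseconds switch and the `max_ratio` threshold are all hypothetical, and the function names are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def daily_means(rows):
    """Group (day, value) pairs by day and return {day: mean value}, ordered by day."""
    by_day = defaultdict(list)
    for day, value in rows:
        by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def flag_jumps(rows, max_ratio=5.0):
    """Return the days whose mean differs from the previous day's by more than max_ratio."""
    means = daily_means(rows)
    days = list(means)
    flagged = []
    for prev_day, cur_day in zip(days, days[1:]):
        a, b = means[prev_day], means[cur_day]
        # Only compare positive means, so the ratio is well defined.
        if a > 0 and b > 0 and max(a, b) / min(a, b) > max_ratio:
            flagged.append(cur_day)
    return flagged


# Hypothetical data: durations recorded in seconds, then in milliseconds from 14 June.
rows = [
    ("2021-06-12", 30.0), ("2021-06-12", 34.0),
    ("2021-06-13", 31.0), ("2021-06-13", 33.0),
    ("2021-06-14", 32000.0), ("2021-06-14", 29000.0),
    ("2021-06-15", 31000.0),
]

if __name__ == "__main__":
    print(flag_jumps(rows))  # prints ['2021-06-14']
```

A flagged day is only a prompt to ask whether the way the data is recorded changed; it cannot tell a real effect from a change in recording on its own.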

The Future of Data Engineering

“The businesses that understand their data and how that data can inform their business are going to be the ones that are successful,” Smith said, explaining why he thinks data engineering as a specialty will only expand. “Companies will need good data management strategies, and they will need more people to specialize in these underlying systems.”

Beyond the fact that companies are likely to rely even more on data pipelines in the future, a couple of buzzwords come up when talking about the future of data engineering. The first is DataOps: embracing as much automation in the data pipeline as possible and allowing data engineers to focus less on low-level coding and more on creating tooling that lets data scientists and business experts self-serve as much as possible. Stevenson sees this as the future of data engineering: data engineers who act more as technology advisors than as writers of Python scripts.

There’s also the idea of data mesh: increasingly embedding data engineers into business teams, so that a data engineer isn’t just moving data from point A to point B but is part of the conversation about how specific types of data sets can be used, what needs to happen to the data to make it usable and what kinds of business use cases the data can drive. “I’m trying to provide tools for domain-driven decentralization so that you have data owners, data producers and data engineers working within specific domains, then cooperating to make that data available as a product,” Feeney said.

Data engineering is also relevant to the ongoing conversation about data privacy. “I’m in the European Union, so GDPR is a big issue,” Feeney said. “And that’s a data engineering challenge.” In many cases, it might involve getting certain data out of a database and to a business owner while stripping all the personally identifiable information out. “There’s a lot of really interesting work going on there around data privacy and how you can take control of your personal data.” 
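
Stripping personally identifiable information before data reaches a business owner might look like the sketch below. It is an illustration only, not a compliance recipe: the field names in `PII_FIELDS`, the salt and the sample record are hypothetical, and which columns count as personal data depends on the data set. Note that a hashed identifier is a pseudonym, and pseudonymized data can still count as personal data under GDPR.

```python
import hashlib

# Hypothetical field names; what counts as PII depends on the data set.
PII_FIELDS = {"name", "email", "phone"}


def pseudonymize(value, salt):
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:16]


def strip_pii(record, salt="example-salt"):
    """Return a copy of the record without PII fields and with user_id pseudonymized."""
    cleaned = {key: value for key, value in record.items() if key not in PII_FIELDS}
    if "user_id" in cleaned:
        cleaned["user_id"] = pseudonymize(cleaned["user_id"], salt)
    return cleaned


if __name__ == "__main__":
    row = {
        "user_id": 42,
        "name": "Ada",
        "email": "ada@example.com",
        "country": "IE",
        "plan": "pro",
    }
    print(strip_pii(row))
```

Because the same salt always produces the same digest, rows for one user can still be joined after the direct identifiers are gone, which is often what the business owner needs.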
