Headlines about an acute shortage of data scientists have featured prominently over the last few years. In a world obsessed with finding the next big innovation with big data, there just don’t seem to be enough data scientists to go around to satisfy the organizational craving for advanced analytics and insights.
Various solutions have been suggested, but many seem to be missing a key part of the problem: the shortage in skilled personnel spans the entire data analytics lifecycle. It’s not just data scientists that are missing in action or exceedingly difficult to hire: a similar problem exists with data engineers, and it’s even more acute, worrisome, and urgent.
Data Scientists and Data Engineers Are Parts of the Same Data Science Value Chain
Let’s first differentiate the two roles. According to an article on data science skills by Elena Grewal, head of data science at Airbnb, data scientists provide expertise in analytics (working on metrics, data storytelling, and tool-building), algorithms (interpreting algorithms that enable data products), and inference (providing causal connections with statistics). In short, the data scientist cleans, kneads and organizes Big Data.
The data engineer’s work can include data governance and quality control, implementing complex distributed architectures (on-premises or in the cloud), building and maintaining data pipelines, optimizing resource utilization in storage or compute clusters, and managing batch processing jobs to enable access to fresh, accurate data. In other words, she develops, builds, tests, and maintains databases, processing systems, and other architectures.
Right now, data scientists get all the glamour and spotlight, partly because data science is the final and more visible step in the journey. All the other steps that need to occur before data scientists can even start working (this can be dozens of processes around ingesting, transforming, and structuring data for analysis) belong to the data engineers and are often more labor-intensive than the “gleaning insights” part.
What Does the Data Say?
Since we’re asking a question about data science, it makes sense to answer it with data. If there’s a shortage in a certain field, we would expect to see it manifested in (1) more open positions than available candidates and (2) very high salaries being offered by employers to lure the small number of available candidates.
LinkedIn and Indeed can give us pretty good insights into both of these questions. Here’s what we got looking at data for the United States (stats via LinkedIn):
Some interesting findings here: For every open data engineer job, there are 2.53 suitable candidates. For every open data scientist job, there are 4.76 suitable candidates. And for every open big data engineer job, there are 2.47 suitable candidates. The contrast with other professions is quite dramatic: there is in fact a shortage of data professionals compared to an abundance of web developers (10.8 candidates per open position) and marketing managers (53.79 per open position).
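To make the comparison concrete, here is a minimal Python sketch that ranks the roles by candidates per open position and expresses one role's supply relative to another's. The figures are the ones quoted above; the names `candidates_per_opening`, `rank_by_scarcity`, and `relative_supply` are hypothetical helpers made up for illustration, not part of any LinkedIn or Indeed tooling.

```python
# Candidates per open position, as quoted in the article (United States, via LinkedIn).
candidates_per_opening = {
    "data engineer": 2.53,
    "data scientist": 4.76,
    "big data engineer": 2.47,
    "web developer": 10.8,
    "marketing manager": 53.79,
}


def rank_by_scarcity(ratios=candidates_per_opening):
    """Return roles sorted from scarcest (fewest candidates per opening) to most abundant."""
    return sorted(ratios, key=ratios.get)


def relative_supply(role, baseline, ratios=candidates_per_opening):
    """Return how many candidates per opening `baseline` has for each one `role` has."""
    return ratios[baseline] / ratios[role]


if __name__ == "__main__":
    # Print the roles from scarcest to most abundant.
    for role in rank_by_scarcity():
        print(f"{role:18s} {candidates_per_opening[role]:6.2f} candidates per opening")

    # Compare data scientist supply against data engineer supply.
    ratio = relative_supply("data engineer", "data scientist")
    print(f"data scientist vs. data engineer supply: {ratio:.2f}x")
```

On these numbers, an open data scientist position draws nearly twice as many suitable candidates as an open data engineer position, which is the gap the rest of this article is concerned with.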
Data Scientists Are Charged with a Critical Part of Data Operations, but Not the Entire Process
Organizations that want to work successfully with Big Data have to take a farsighted view of the “data grind”: what it takes to get from the beginning to the end of the journey and what types of talent can deliver the required skills. Before the data scientist can do the magic, the data engineer has to build a whole lot of infrastructure. You need both talents, in equal measure, to optimize the results of your data science value chain.
So, Let’s Get to Work!
Obviously it’s not a “competition” between data scientists and data engineers on which sector actually suffers from talent shortage. For your data endeavor to succeed, you’ll need the expertise of data engineers to build the infrastructure and prepare the data, as well as the skills of data scientists who use this data to develop analyses, algorithms, and research. A shortage of talent in either field can only raise manpower costs and doom the entire data science project.
Since the media has focused on the data science talent gap, there is now a plethora of suggestions for addressing it. Far fewer exist for tackling the data engineer skills gap.
Let me put forth a few:
- Look inside your company: You can increase your data engineering expertise without relying on the availability of outside candidates by training current personnel in data engineering frameworks, programming languages, and systems.
- Invest in technology: You’ll need fewer data engineers if you invest in solutions that automate data engineering processes and workflows.
- Imagine the future: Be ready for the surge in R&D, DevOps, and IT budgets as data engineer salaries climb. Adopt a holistic view of your data operations.
Your data endeavors can translate into meaningful business value if you understand how data science really works and what your data scientists and data engineers really do.
You may already have the right people. Now you just need to put them in the right place with the right technology, and then you don’t have to be in this fight at all.
Feature image via Pixabay.