According to the "2020 Kaggle Machine Learning & Data Science Survey," 55% of the 372 data engineers surveyed say building or running data infrastructure is an important part of their job. These data engineers support data science applications as well as other use cases. Data engineers are actually slightly more likely (58%) to say their job involves analyzing and understanding data in order to influence decisions.
Data scientists focus on analysis, which is less important for machine learning (ML) engineers. Still, the 2,421 data scientists and 937 ML engineers in the Kaggle survey have much in common: about the same percentage of each group improve ML models and build or run an ML service to improve a product or service.
At 18%, data engineers are more than twice as likely as data scientists to use cloud-based software and APIs as their primary tool for analyzing data. They were also more likely to analyze data in the cloud. Local development environments such as Jupyter Notebooks are the most commonly used tool across every job role we reviewed. Basic statistical software, which the survey defines as spreadsheets, is very popular among software engineers. This is a reminder that knowing Python doesn't mean developers will use dedicated data science tools to analyze data.
Feature image via Pixabay.