Big Data has changed the face of the world!
With 2.3 trillion gigabytes of data created each day, companies have access to valuable information on their users, their market and much more.
This Data allows companies to constantly improve their products and services.
Companies have understood what is at stake in investing in Big Data, as the soaring number of Data Engineer and Data Scientist jobs shows.
In 2012, Harvard Business Review even called Data Scientist the sexiest job of the 21st century!
However, since the field is not yet fully mature, Data jobs are still subject to misunderstandings. To many, Data can look like a blurry technical ‘thing’ that could somehow be implemented and have an impact on a business.
This misunderstanding can lead to resources being misused. Let’s get back to the fundamentals of these professions and decode the value of each.
Fig.1 - THE DATA SCIENCE HIERARCHY OF NEEDS
When a company builds a product or service, it needs valuable information.
Two decades ago, this information was difficult to find. Today, data is the best way to understand the ecosystem in which the company operates.
So here is the thing: Data Engineers and Data Scientists are part of a bigger plan. This pyramid clearly illustrates the process needed to use Data in a company.
At the base, Software Developers collect all the relevant Data for the Data Engineers.
Then, Data Engineers move and transform this Data through “pipelines” for the Data Scientists.
Finally, Data Scientists analyze, aggregate and optimize the Data for the company.
Sometimes, Research Scientists, Core Data Scientists and Machine Learning Engineers can optimize the Data even further.
This process is illustrated in “the Data Science hierarchy of needs” (fig.1).
Looking at fig.1, it becomes clear that to handle Data well, Data tasks have to be divided and assigned to specific Data specializations.
Ideally, Data Engineers should design and build what we call “pipelines”.
They usually use programming languages such as Java, Scala, C++ or Python to do that work.
These pipelines then allow Data Scientists to start their own work, which focuses on analyzing, testing, creating and presenting the Data.
Data Engineers specialize in 3 main actions: designing, building and arranging Data “pipelines”.
In a sense, they are the Data Architects.
Data Engineers often have a computer engineering or computer science background and skills in building systems.
“Data pipelines are sequences of processing and analysis steps applied to data for a specific purpose. They're useful in production projects, and they can also be useful if one expects to encounter the same type of business question in the future, so as to save on design time and coding. For instance, one could remove outliers, apply dimensionality reduction techniques, and then run the result through a random forest classifier to provide automatic classification on a particular dataset that is pulled every week.”
Colleen Farrelly, Data Scientist/Poet/Social Scientist/Topologist (2009-present)
Fig.2 - Pipeline created from raw data to end results data.
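To make the quoted definition more concrete, here is a minimal sketch in plain Python of a pipeline as an ordered chain of steps applied to a dataset pulled every week. It assumes the data arrives as rows of numeric features. All names (`remove_outliers`, `keep_top_variance`, `run_pipeline`, `weekly_batch`) are hypothetical and chosen for illustration only. Outliers are removed with a simple z-score filter. Keeping the highest-variance columns is a crude stand-in for a real dimensionality-reduction technique such as PCA. The final random forest step from the quote is left out so the sketch stays dependency-free; in practice a library such as scikit-learn provides one.

```python
from statistics import mean, pstdev
from typing import Callable, List

Row = List[float]
Step = Callable[[List[Row]], List[Row]]


def remove_outliers(z_max: float) -> Step:
    """Return a step that drops rows having any feature more than z_max
    population standard deviations away from its column mean."""
    def step(rows: List[Row]) -> List[Row]:
        cols = list(zip(*rows))
        means = [mean(c) for c in cols]
        stds = [pstdev(c) for c in cols]

        def is_inlier(row: Row) -> bool:
            # Constant columns (std == 0) cannot produce outliers, so skip them.
            return all(s == 0 or abs(x - m) / s <= z_max
                       for x, m, s in zip(row, means, stds))

        return [row for row in rows if is_inlier(row)]
    return step


def keep_top_variance(k: int) -> Step:
    """Return a step that keeps only the k columns with the largest spread.
    Crude stand-in for a real dimensionality-reduction technique such as PCA."""
    def step(rows: List[Row]) -> List[Row]:
        cols = list(zip(*rows))
        ranked = sorted(range(len(cols)), key=lambda i: pstdev(cols[i]), reverse=True)
        kept = sorted(ranked[:k])  # preserve the original column order
        return [[row[i] for i in kept] for row in rows]
    return step


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order; the output of one step feeds the next."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical weekly batch: 3 numeric features per row; the last row is an outlier.
weekly_batch: List[Row] = [
    [1.0, 10.0, 5.0],
    [2.0, 11.0, 5.0],
    [1.5, 9.0, 5.0],
    [2.5, 10.5, 5.0],
    [1.0, 10.0, 5.0],
    [100.0, 10.0, 5.0],
]

PIPELINE: List[Step] = [remove_outliers(z_max=2.0), keep_top_variance(k=2)]
cleaned = run_pipeline(weekly_batch, PIPELINE)
print(cleaned)
```

Running it prints `[[1.0, 10.0], [2.0, 11.0], [1.5, 9.0], [2.5, 10.5], [1.0, 10.0]]`. The outlier row is dropped, and so is the constant third column. Because the pipeline is just a list of steps, the same chain can be re-run on next week's batch unchanged. That reuse is the time saving the quote describes.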
What tasks does a Data Engineer have in a company?
What skills are expected from a Data Engineer?
Data Scientists normally have 4 main tasks in a company: they analyze, test, create and present their work to the team.
Data Scientists have a math and statistics background. They also have to be comfortable creating machine learning and artificial intelligence models.
What tasks does a Data Scientist have in a company?
What skills are expected from a Data Scientist?
According to Glassdoor:
Data Engineer: $151K / year on average
According to Glassdoor, the number of job openings for data engineers is almost five times higher than the number of job openings for data scientists. This makes sense as most organizations need more data engineers than data scientists on their team.
On paper, Data Scientist is a dream job.
However, trouble starts when Data Scientists join small organizations and have to take on tasks outside their specialization.
When Data Scientists have to cover the whole Data hierarchy, it can be painful for them, as they do not have the programming background of Data Engineers.
Sometimes, being a Data Scientist in a company can look like this:
As a result, studies show that in 2017, 24.0% of Data Scientists changed jobs.
Admittedly, the Data Science job market is flourishing, which lets employees move to the projects they like most.
However, it also shows that a large number of Data Scientists are trying to find a better place on the market.
Lastly, Data Scientists must have very good communication and persuasion skills to present their work to the company. These skills are essential to convince the team of the projects and actions to take.
Data Engineers have become a rare commodity.
Glassdoor lists more than 107K Data Engineer job openings.
Demand is so high that everyone is now affected by this shortage:
“Even the hottest Silicon Valley companies are unable to achieve a one-to-two ratio. [...] You don’t have enough engineering talent out there. It’s very expensive.” says Tomer Shiran, the CEO and co-founder of Dremio, a developer of big data middleware.
Why do recruiters have difficulty finding Data Engineers today?
In the Netherlands, recruiters are also facing this issue. Hundreds of them are looking for experienced Data Engineers with specific programming skills.
While some of them find the talent they need, most face long waits before they do.
How to find a data engineer?
With access to a good list of Data Engineer candidates, it is still possible to find the right Data Engineer for your company.
However, building a good list of Data Engineers takes time, and getting access to one requires much more effort.
One goal of recruitment agencies is to fill this gap between supply and demand.
By searching every day for the best recruits in specific fields, we are able to address this major issue.