Data Beasts

What are Data Beasts? It’s a topical idea, and I was delighted to receive it as another contribution from Francesco Corea.

Francesco is currently a leading AI blogger, after many years as a consultant & advisor. We last heard from him in his review of the InsureTech sector.

In this post, Francesco shares his perspective on what it takes to be a good Data Scientist & how to become one. It’s interesting to complement the established Analytics perspective of Martin & myself with Francesco’s emphasis.

As I’m finding at many events, it’s helpful to hear from both experienced leaders & innovators in new tech startups. So, over to Francesco, to benefit from his perspective…

Data Beasts: A Philosophical Introduction

There is a great deal of confusion & vagueness around what big data and AI really are. The technicalities of the data black box have turned the people who analyze huge datasets into some kind of mythological figures. These people, who possess all the skills and the willingness to crunch numbers & provide insights, are usually called data scientists.

They have inherited their faith in numbers from the Pythagoreans before them, so it may be appropriate to fancifully name them Datagoreans. Their school of thought, Datagoreanism, encourages them to pursue the truth through data and to exploit the blending of, and fruitful interactions with, different fields, all in order to postulate new theories and identify hidden connections.

However, the general consensus about who they are, and what they are supposed to do (and internally deliver), is quite loose. By simply browsing job offers for data scientists, one understands that employers often don’t really know what they are looking for. This is probably one of the reasons for the apparent shortage of data scientists in the job market.

Data Beasts: Toolbox and Skill Set

In reality, data scientists as most people imagine them do not exist. Rather, the data scientist is a completely new figure, especially at more junior levels. However, the proliferation of boot camps and structured university programs on the one hand, and companies’ increased awareness of this field on the other, will drive the job market towards a demand-supply equilibrium: firms will understand what they actually need in terms of skills and talent, and markets and universities will eventually be able to provide those (verified) abilities.

At the moment, it is necessary to outline this new role, which is still half scientist, half designer. It includes a series of different skills and capabilities, akin to the mythological chimera. An ideal profile is provided in the following table, which merges five different job roles into one: the computer scientist, the businessman, the statistician, the communicator, and the domain expert.

Clearly, it is very cumbersome, if not impossible, to replace five different people with a single one. This consideration allows us to draw several conclusions. First, collapsing five job functions into one has an ambiguous effect on productivity, because it might be:

  1. efficient, because the entire value and product chain is concentrated, and not dispersed;
  2. risky, because a single individual can sometimes be less productive than five different people working on the same problem at the same time.

Second, hiring one such specialist should cost less than hiring five semi-specialists, but much more than any one of them alone (because of the specialization, high-level knowledge & flexibility required). Looking at some salary numbers, though, this does not seem to be reflected in the job market.

Data Beasts: A Toy-Model for Data Jobs

Looking at salary data, you can see that (on average) in 2015 in the United States:

  1. a computer scientist earns around $110,000 p.a.;
  2. a statistician around $75,000 p.a.;
  3. a business analyst around $65,000 p.a.;
  4. a communication manager around $80,000 p.a.;
  5. a domain expert around $57,000 p.a.

By contrast, according to the survey run by O’Reilly in the same year, the median data scientist salary is around $100,000 p.a.
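The expectation that one all-round data scientist should cost less than the five-person team, but more than any single member of it, can be checked directly against these figures. This is a minimal Python sketch; the function name `bracket_check` is purely illustrative, and the only data it uses are the salary numbers quoted in the post.

```python
# 2015 US salaries quoted in the post (USD per year).
role_salaries = {
    "computer scientist": 110_000,
    "statistician": 75_000,
    "business analyst": 65_000,
    "communication manager": 80_000,
    "domain expert": 57_000,
}
data_scientist_median = 100_000  # O'Reilly survey figure quoted in the post


def bracket_check(salaries, combined_role_pay):
    """Test whether one person covering every role is paid less than the
    whole team of specialists but more than the best-paid single role."""
    team_cost = sum(salaries.values())        # hiring all five people
    best_single_role = max(salaries.values())  # best-paid individual role
    return {
        "below_team_cost": combined_role_pay < team_cost,
        "above_every_role": combined_role_pay > best_single_role,
        "team_cost": team_cost,
        "best_single_role": best_single_role,
    }


result = bracket_check(role_salaries, data_scientist_median)
print(result)
# → {'below_team_cost': True, 'above_every_role': False, 'team_cost': 387000, 'best_single_role': 110000}
```

Only the lower bound fails: the $100,000 median sits below the computer scientist’s $110,000, which is the mismatch the post points to.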

The same survey shows that an average working week lasts about 40 hours. During this time, data scientists spend twice as much time on ETL & data cleaning as on running analyses or creating models.

Weighting each role’s salary by the share of time spent on the matching activity, and assuming the rest of their time is equally divided among the other three activities, these statistics suggest a data scientist should earn around $92,000. This is, of course, a very approximate estimate: it does not take into account seniority, differences across industries, etc. Domain expertise also varies in value; even the averages for marketing ($55,000), database ($57,000), network ($64,000) & social media ($41,000) expertise differ considerably.
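The $92,000 toy estimate is a time-weighted average of the five role salaries. As an illustration only, this sketch assumes that ETL & data cleaning is paid like computer-science work, analysis & modelling like statistics, and the remaining three activities like the other three roles. The helper names and the 20-hour ETL figure are hypothetical, since the post does not reproduce the survey’s exact hours.

```python
# 2015 US salaries quoted in the post (USD per year).
ROLE_SALARIES = {
    "computer scientist": 110_000,
    "statistician": 75_000,
    "business analyst": 65_000,
    "communication manager": 80_000,
    "domain expert": 57_000,
}


def time_weighted_salary(hours_by_role, salaries=ROLE_SALARIES):
    """Average the role salaries, weighting each by the hours spent on it."""
    total_hours = sum(hours_by_role.values())
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    weighted = sum(salaries[role] * hours for role, hours in hours_by_role.items())
    return weighted / total_hours


def hypothetical_week(etl_hours, week_hours=40.0):
    """Hypothetical split of a working week (assumed mapping, not survey data):
    ETL/cleaning -> computer scientist, analysis/modelling -> statistician
    (half the ETL time), and the rest shared equally by the other three roles."""
    analysis_hours = etl_hours / 2
    rest = (week_hours - etl_hours - analysis_hours) / 3
    if rest < 0:
        raise ValueError("ETL and analysis hours exceed the working week")
    return {
        "computer scientist": etl_hours,
        "statistician": analysis_hours,
        "business analyst": rest,
        "communication manager": rest,
        "domain expert": rest,
    }


week = hypothetical_week(etl_hours=20.0)
print(round(time_weighted_salary(week)))  # → 90583
```

With 20 ETL hours the estimate comes out at about $90,600; under the same mapping, an ETL load of roughly 21 hours would land close to the post’s $92,000. Either way, the result is only as good as the activity-to-role mapping assumed here.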

Still, the exercise conveys a broad point. Data scientists seem to be (almost) fairly compensated in absolute terms, but their remuneration is definitely low compared with the costs they bear to become such a specialized ‘beast’.

It is really expensive, in terms of education, effort & opportunity costs, to become a data scientist. The average job market does not sufficiently compensate a candidate for that investment.

Well, truth be told, the market is quickly becoming polarized. Either you are a top scientist employed by a huge business, and so you get paid a ton of money, or you work in a smaller firm and don’t get fairly compensated for the incredible work it took you to enter this data world.

Data Beasts: Final Considerations

All the considerations drawn so far point to a few suggestions for hiring data scientists.

First of all, data science is a team effort, not a solo sport. It is important to hire different figures as part of a bigger team, rather than hiring exclusively for individual abilities.

Moreover, if a data science team is a company priority, the data scientists have to be hired to stay. Don’t simply hire on a project basis, because managing big data is a marathon, not a 100-metre sprint.

Second, data scientists come with two different DNAs: one scientific & one creative. For this reason, they should be free to learn and study continuously on the one hand (the scientific side), and have time to create, experiment, and fail on the other (the creative side). They will never grow systematically or at a fixed pace, but organically, developing according to their inclinations and multi-faceted nature. It is therefore advisable to leave data scientists some spare time to follow their ‘scientific inspiration’.

Finally, they need to be incentivized with something more than simply big money. The retention power of a good salary is quite low compared to interesting daily challenges. Relevant & impactful problems to solve are important for motivation. Being part of a bigger scientific community (e.g., being able to work with peers and publish their research) can also matter.

Data Beasts: are they really that different?

Thanks to Francesco for that perspective. I’m surprised by that view on data scientist remuneration (especially compared to what analysts are paid), but I can see his case.

What about you? How do you see these ‘data beasts’, and what needs to be done to develop & keep them? Feel free to comment in the box below or on social media.

If you’d like to read more from Francesco, this post is an adapted excerpt from his book, “Big Data Analytics: A Management Perspective” (Springer, 2016).