2018-05-21

Companies are having a notoriously hard time hiring data scientists, and it’s partly a self-created problem. Employers badly in need of data scientists don’t always understand exactly what a data scientist is. And in a tight market for tech talent, many candidates are more than willing to exploit that ignorance.

A lot of people like to call themselves data scientists because they’re using point-and-click tools, like Tableau and Excel, to perform data analysis and visualization in order to gain business insights. To be clear, these “clickers” are valuable in their own right; they’re smart, capable people who provide plenty of useful analytics to their employers. However, most of the advances in data science in the last decade have come from coders, not clickers. Clickers don’t usually possess the engineering skills employers need when they’re trying to replicate the cutting-edge data science and AI technologies that their high-tech competitors are developing. Worse, companies are often paying clickers the high salaries they should reserve for coders in the data science field.

It’s not hard to see how this happens. Companies that aren’t digital-first need data scientists, too, but they’re competing with tech firms for talent. They sometimes dress up the job titles of lower-level, point-and-click analytics professionals, terming them “data science” roles. When enough employers do this, it waters down the overall data-science talent pool, making it harder for everyone to hire. Sometimes it’s the result of constrained budgets and wishful thinking (“My Excel analysts are doing data science already!”); other times it’s an attempt to appease egos and retain talent. Either way, these days clickers are more common than coders, and employers don’t fully understand why they should pay the latter so much more than the former.

The shared buzzwords that clickers and coders use to compete with one another don’t help the problem. Terms like “deep learning” and “cross-entropy loss” are getting thrown around more and more as point-and-click tools grow more powerful. And as machine intelligence techniques improve, business leaders are picking up new vocabulary, too, from “neural networks” to “recommendation engines.” However, these complex systems and technologies really only come into play in the final 10% of a data scientist’s work. The bulk is data wrangling: the unglamorous process of cleaning, extracting, transforming, and joining data that precedes any effort to use machine learning to derive insights.
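To make “data wrangling” concrete, here is a minimal sketch of the four steps named above (extract, clean, join, transform) using only Python’s standard library. The two CSV exports, their column names, and the choice to drop unparseable rows rather than impute them are all hypothetical assumptions for illustration, not a prescribed method.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw exports: stray whitespace, inconsistent casing, one bad value,
# and an order from a customer (C3) missing from the customer table.
RAW_ORDERS = """order_id,customer_id,amount,date
1, C1 ,19.99,2018-05-01
2,c2,  5.00,2018-05-02
3,C1,not_a_number,2018-05-03
4,C3,42.50,2018-05-04
"""

RAW_CUSTOMERS = """customer_id,region
C1,East
C2,West
"""


def clean_orders(text):
    """Extract rows from CSV text, normalize IDs and types, drop rows that fail parsing."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({
                "order_id": int(rec["order_id"]),
                "customer_id": rec["customer_id"].strip().upper(),
                "amount": float(rec["amount"]),  # float() tolerates surrounding spaces
                "date": datetime.strptime(rec["date"].strip(), "%Y-%m-%d").date(),
            })
        except ValueError:
            # Assumption: malformed rows are skipped rather than guessed at.
            continue
    return rows


def load_customers(text):
    """Build a customer_id -> region lookup, normalizing IDs the same way as orders."""
    return {rec["customer_id"].strip().upper(): rec["region"].strip()
            for rec in csv.DictReader(io.StringIO(text))}


def join_region(orders, customers):
    """Inner join: keep only orders whose customer appears in the lookup."""
    return [dict(o, region=customers[o["customer_id"]])
            for o in orders if o["customer_id"] in customers]


def revenue_by_region(rows):
    """Transform the cleaned, joined rows into per-region revenue totals."""
    totals = {}
    for r in rows:
        totals[r["region"]] = round(totals.get(r["region"], 0.0) + r["amount"], 2)
    return totals


if __name__ == "__main__":
    orders = clean_orders(RAW_ORDERS)
    joined = join_region(orders, load_customers(RAW_CUSTOMERS))
    print(revenue_by_region(joined))  # {'East': 19.99, 'West': 5.0}
```

Even in this toy case, most of the code is wrangling and only the last function produces anything resembling an “insight.” Each choice (dropping the bad row, excluding the unmatched customer) silently changes the result, which is why this stage is easier to audit when written as code.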

The real challenge comes from handling large datasets, including textual or other unstructured raw data, and doing so in real time, all of which requires programmatic execution. That is, coding. Indeed, many of the gains in AI and data science are thanks to what researchers have called the “Unreasonable Effectiveness of Data”: the ability to learn programmatically from astronomically large datasets. This work is also highly nuanced and detailed, and doing the wrangling and cleaning properly is crucial for developing effective machine intelligence later on. Point-and-click software just isn’t sophisticated enough to substitute for good programming skills (after all, you wouldn’t try to build a serious machine learning system in Excel).
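As a rough illustration of what “programmatic execution” buys you on unstructured text, the sketch below counts terms in a single streaming pass, so the input never has to fit in memory or in a spreadsheet. The regular-expression tokenizer and the file name `reviews.txt` in the usage note are hypothetical; production systems would typically use a proper tokenizer and a distributed or streaming framework.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumption: a crude tokenizer (lowercase letters, digits, apostrophes) for illustration.
TOKEN = re.compile(r"[a-z0-9']+")


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Lazily normalize and tokenize raw text one line at a time."""
    for line in lines:
        yield from TOKEN.findall(line.lower())


def top_terms(lines: Iterable[str], k: int = 3) -> List[Tuple[str, int]]:
    """Single pass over the input; memory grows with the vocabulary, not the line count."""
    return Counter(tokens(lines)).most_common(k)


if __name__ == "__main__":
    sample = ["Great product, great price!", "Price too high.", "GREAT support"]
    print(top_terms(sample, k=2))  # [('great', 3), ('price', 2)]
```

Because `top_terms` accepts any iterable of lines, the same function works on an open file handle (for example, `with open("reviews.txt", encoding="utf-8") as f: top_terms(f)`) or on records arriving from a live feed, which is the kind of reuse that point-and-click tools rarely offer.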