You've probably been hiring the wrong kind of data scientist

Companies are having a notoriously hard time hiring data scientists, and it’s partly a self-created problem. Employers badly in need of data scientists don’t always understand exactly what a data scientist is. And in a tight market for tech talent, many candidates are more than willing to exploit that ignorance.

A lot of people like to call themselves data scientists because they’re using point-and-click tools, like Tableau and Excel, to perform data analysis and visualization in order to gain business insights. To be clear, these “clickers” are valuable in their own right; they’re smart, capable people who provide plenty of valuable analytics to their employers. However, most of the advances in data science in the last decade have come from coders, not clickers. Clickers don’t usually possess the engineering skills employers typically need when they’re trying to replicate the most cutting-edge data science and AI technologies that their high-tech competitors are developing. Worse, companies are often paying clickers the high salaries they should reserve for coders in the data science field.

It’s not hard to see how this happens. Companies that aren’t digital-first need data scientists, too, but they’re competing with tech firms for talent. They sometimes dress up the job titles of lower-level, point-and-click analytics professionals, terming them “data science” roles. When enough employers do this, it waters down the overall data-science talent pool, making it harder for everyone to hire. Sometimes it’s the result of constrained budgets and wishful thinking (“My Excel analysts are doing data science already!”); other times it’s an attempt to appease egos and retain talent. Either way, these days clickers are more common than coders, and employers don’t fully understand why they should pay the latter so much more than the former.

The shared buzzwords clickers and coders use to compete with one another don’t help the problem. Terms like “deep learning” and “entropy loss” are getting thrown around more and more as point-and-click tools grow more powerful. And as machine intelligence techniques improve, business leaders are picking up new vocabulary, too–from “neural networks” to “recommendation engines.” However, these complex systems and technologies really only come into play in the final 10% of a data scientist’s work. The bulk is about data wrangling, the unglamorous process of cleaning, extracting, transforming, and joining data that precedes any effort to use machine learning to derive insights.
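To make “data wrangling” concrete, here is a minimal sketch in Python of what cleaning, extracting, transforming, and joining can look like in code. Everything in it is hypothetical and invented for illustration: the two tiny CSV exports, the column names, and the helper functions `clean_id`, `parse_amount`, and `wrangle`. Real pipelines are far larger and messier, but the shape of the work is similar.

```python
import csv
import io

# Hypothetical raw exports; names and values are made up for illustration.
RAW_ORDERS = """order_id,customer_id,amount
1001, c-01 ,19.99
1002,C-02,
1003,c-01,5.00
1004,c-03,12.50
1005,C-02,n/a
"""

RAW_CUSTOMERS = """customer_id,region
C-01,East
C-02,West
"""


def clean_id(raw):
    """Normalize an ID by stripping stray whitespace and unifying case."""
    return raw.strip().upper()


def parse_amount(raw):
    """Return the amount as a float, or None if it is missing or malformed."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def wrangle(orders_csv, customers_csv):
    """Total order amounts per region and report the orders that were skipped."""
    # Extract: read the customer table into a lookup keyed by cleaned ID.
    regions = {
        clean_id(row["customer_id"]): row["region"].strip()
        for row in csv.DictReader(io.StringIO(customers_csv))
    }

    totals = {}
    skipped = []
    for row in csv.DictReader(io.StringIO(orders_csv)):
        # Clean: set aside rows whose amount cannot be parsed.
        amount = parse_amount(row["amount"])
        if amount is None:
            skipped.append(row["order_id"].strip())
            continue
        # Transform: bring the customer ID into the same form as the lookup keys.
        customer = clean_id(row["customer_id"])
        # Join: attach the region, flagging customers missing from the lookup.
        region = regions.get(customer, "UNKNOWN")
        totals[region] = totals.get(region, 0.0) + amount
    return totals, skipped


totals, skipped = wrangle(RAW_ORDERS, RAW_CUSTOMERS)
print(totals)
print(skipped)
```

Even in this toy case, the judgment calls are the hard part: whether to drop or repair a bad amount, and what to do with a customer who appears in one file but not the other. Those decisions are easy to record and rerun in code, and harder to audit when they happen by hand in a spreadsheet.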

The real challenge comes from handling large datasets, including textual or other unstructured raw data, and doing so in real time–all of which requires programmatic execution. That is, coding. Indeed, many of the gains in AI and data science are thanks to what researchers are calling the “Unreasonable Effectiveness of Data”–being able to learn programmatically from astronomical data sets. This work is also highly nuanced and detailed, and doing the wrangling and cleaning properly is crucial for developing effective machine intelligence later on. Point-and-click software just isn’t sophisticated enough to substitute for good programming skills (after all, you can’t realistically perform machine learning with Excel).

This goes beyond just the usual mantra of “garbage in, garbage out.” Employers are trying to manage turbocharged public relations on social media while staying in regulators’ good graces despite that enhanced scrutiny. And so far, bad AI developed through bad data science hasn’t quite nailed those twin problems of reputation and regulation; indeed, algorithms have been trained to be homophobic, sexist, and racist, while shoddy data analysis can earn companies regulatory fines and class-action lawsuits. Relying on point-and-click, proprietary software exposes firms to untold liabilities.

It’s time to hire data scientists who can code with more powerful tools, like R, Python, and TensorFlow. “Clicker” skills are quickly becoming commoditized, and it won’t be long before that talent pool loses its inflated value. In the meantime, hiring managers need to look beyond the buzzwords and identify the higher-order skills they really need in data scientists. And companies should consider training their existing workforces in data-science and AI skills in order to teach clickers some of the skills they’ll need to become coders. Our shared technological future depends on it.

Michael Li is the founder and CEO of The Data Incubator. A data scientist, Michael has worked at Google, Foursquare, and Andreessen Horowitz. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review.
