Data science interviews are daunting, complicated gauntlets for many. But despite the ways they’re evolving, the technical portion of the typical data science interview tends to be pretty predictable. The questions most candidates face usually cover behavior, mathematics, statistics, coding, and scenarios. However they differ in their particulars, those questions may be easier to answer if you can identify which bucket each one falls into. Here’s a breakdown, and what you can do to prepare.
1. Behavioral questions

Similar to any other interview, these questions are meant to test your soft skills and gauge whether you’d fit in culturally at the company.
Example: What have you liked and disliked about your previous position?
The intent here is to identify whether the role you’re interviewing for suits your personality and temperament, and to identify why you’re moving on from a previous position.
Don’t overthink it or imagine that the key here is really any different from any other type of interview: Just understand the role well, avoid talking about issues you’ve had in the past with specific people, and be professional when describing what you disliked and why. A data science role may call for an analytical mind, but hiring managers still want to hear what makes you passionate.
2. Mathematics questions

Data scientist roles where you’re expected not only to implement algorithms but also to tweak them for specific purposes will usually come with mathematical questions.
Example: How does the linear regression algorithm determine what the best coefficient values are?
The point is to see how deeply you understand linear regression, which is critical because in many data science roles you won’t just work with algorithms in a black box; you’ll actually put them into action. This category of question tests how much you know about what’s actually happening beneath the surface.
So this is one of those “show your work” moments. Trace out every step of your thinking and write down the equations. As you’re writing out the solution, describe your thought process so the interviewer can see your mathematical logic at work.
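For ordinary least squares regression, the “best” coefficients are the ones that minimize the sum of squared errors, and the normal equation gives them in closed form (gradient descent is the common iterative alternative). A minimal sketch in Python with NumPy, using made-up toy data:

```python
import numpy as np

# Toy data generated from y = 2x + 1 (no noise, so the fit is exact)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Prepend a column of ones so the model is y = Xb, b = [intercept, slope]
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: minimizing the sum of squared errors gives
# b = (X^T X)^{-1} X^T y; solve() avoids forming the inverse explicitly
b = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
# b is approximately [1, 2]: intercept 1, slope 2
```

In an interview, writing out the squared-error loss and differentiating it to reach this equation is exactly the “show your work” step the question is probing for.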
3. Statistics questions

It goes without saying that a strong grasp of statistics is important for solving different data science problems. Chances are you’ll be tested on your ability to reason statistically and your knowledge of statistical theory.
Example: What is the difference between Type I error and Type II error?
Proving your mettle requires showing you understand the fundamentals of statistics. But more than that, interviewers also want to see whether you’re capable of using the technical language and logic of statistics to grapple with ideas you may not often approach that way–and still communicate them clearly. So be no-nonsense in your response. Use the relevant statistical knowledge to arrive at your answer, but be as direct as possible about whatever you’re asked to define.
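One way to keep the definitions straight is a quick simulation: a Type I error is rejecting a true null hypothesis (a false positive), while a Type II error is failing to reject a false one (a false negative). A rough sketch, assuming a two-sided z-test at α = 0.05 with made-up sample sizes and effect size:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 50, 2000

# Type I error: H0 is true (mean really is 0), but we reject it anyway.
false_rejects = 0
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    z = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    if abs(z) > 1.96:                 # two-sided test at alpha = 0.05
        false_rejects += 1
type1_rate = false_rejects / trials   # hovers near alpha by design

# Type II error: H1 is true (mean is 0.3), but we fail to reject H0.
misses = 0
for _ in range(trials):
    sample = rng.normal(loc=0.3, scale=1.0, size=n)
    z = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    if abs(z) <= 1.96:
        misses += 1
type2_rate = misses / trials          # depends on effect size and n
```

Note the asymmetry the simulation makes visible: the Type I rate is fixed by the α you choose, while the Type II rate depends on the true effect size and the sample size.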
4. Coding questions

A big part of most data science roles is programming to implement algorithms at scale. These questions are similar to the ones candidates face in software engineering interviews; they’re meant to test your experience with the technical tools a company uses and your overall knowledge of programming theory.
Example: Develop a K Nearest Neighbors algorithm from scratch.
Showing you can write out the thinking behind an algorithm and deploy it efficiently under time constraints is a great way to demonstrate your engineering skills. This kind of question is usually posed to data scientists who know both the algorithms and their technical implementation, or to data engineers who are given some context on what the algorithm does.
In any event, this type of question tests your understanding of matrix computation and how to deal with vectors and matrices. So start by going through a sample set of inputs and outputs, and manually work out the answer. As you do, keep an eye on time/space complexity.
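A from-scratch KNN classifier fits in a few lines; the sketch below assumes Euclidean distance and majority vote, with toy data. Note the complexity point from above: each prediction scans every training point, so it costs O(n·d) per query.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training row (vectorized)
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among those neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # → a
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # → b
```

Working through a sample input like this by hand first, then checking the code against it, is exactly the process interviewers want to watch.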
5. Scenario questions

Last but not least, scenario questions are designed to test your experience and knowledge in different fields of data science, to find out the practical limits of your abilities. Demonstrate your applied knowledge as thoroughly as you can, and you’ll come off well in any case analysis.
Example: If you were a data scientist at a web company that sells shoes, how would you build a system that recommends shoes to visitors?
This question is meant to see how you envision your work delivering products or services from end to end. Scenario questions don’t test for knowledge in every field; they’re meant to explore a product’s life cycle from beginning to delivery and see what limits the candidate might have at each stage of that process. But these questions also evaluate holistic knowledge–for instance, what it takes to manage a team to deliver a final product–to determine how candidates perform in team situations.
Here, too, the usual job-interview advice applies: Be honest about where you can add a lot of value, but don’t be shy about where you expect to get a little bit of help from your teammates. Try to relate how your technical knowledge can help with business outcomes, and always explain the thought process behind your choices and the assumptions that guide them. And don’t hesitate to ask questions that can help you suss out an interviewer’s intentions so you can better tailor your answers.
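There’s no single right answer to a scenario like the shoe recommender, but one common starting point you could sketch on a whiteboard is item-based collaborative filtering: recommend shoes similar to the ones a visitor has already interacted with. A minimal sketch, with an entirely made-up interaction matrix and cosine similarity as an assumed design choice:

```python
import numpy as np

# Hypothetical implicit-feedback matrix: rows = visitors, cols = shoes,
# 1 = the visitor viewed or bought that shoe. All data here is invented.
interactions = np.array([
    [1, 1, 0, 0],   # visitor 0 liked shoes 0 and 1
    [1, 1, 1, 0],   # visitor 1 liked shoes 0, 1, 2
    [0, 0, 1, 1],   # visitor 2 liked shoes 2 and 3
])

# Item-item cosine similarity: shoes co-liked by the same visitors score high
norms = np.linalg.norm(interactions, axis=0)
sim = (interactions.T @ interactions) / np.outer(norms, norms)

def recommend(visitor, top_n=2):
    """Score unseen shoes by their similarity to the shoes this visitor liked."""
    seen = interactions[visitor]
    scores = sim @ seen
    scores[seen > 0] = -np.inf        # never re-recommend a seen shoe
    return np.argsort(scores)[::-1][:top_n]
```

In an interview, the sketch matters less than the discussion around it: how you’d gather the interaction data, handle brand-new visitors with no history (the cold-start problem), and measure whether the recommendations actually drive sales.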
Data science interviews can be tricky straddling acts–you’re challenged to program and come up with technical algorithms on the spot, but you’re also measured by much the same criteria used for nontechnical roles. Your statistical and mathematical knowledge will be tested, as will your ability to lead a team, communicate, persuade, and influence.
So instead of trying to prepare for every imaginable question, prepare for these five types of question. You can’t anticipate every question that’s thrown at you, but you can pretty accurately forecast what a hiring manager’s needs and expectations might be–then set yourself up to meet them.
Roger Huang heads up growth and marketing at Springboard. He broke into a career in data by analyzing $700 million worth of sales for a major pharmaceutical company. Now he writes content that compiles insights from Springboard’s network of data experts to help others do the same.