It seems like we bring up personality at work all the time. Bosses use it for hiring, department heads use it for team building, and individual employees use it for career development. It seems essential to how work gets done.
Sadly, what most people don’t realize is that the science of personality is incredibly complex and still debated, even in the academic literature. Worse, the vast majority of personality tests used in business settings are flawed and lack scientific support. As a doctoral student in psychology who studies personality at work, I have three warnings for anyone who wants to use personality tests at work.
First, you should know that most popular personality tests rely on flawed theory. They assume that people can be sorted into distinct personality types, a theoretical framework that has been thoroughly discredited. The Myers-Briggs, the DiSC, the Color Test, and the Enneagram all try to categorize people into contrived types. Asking whether someone is an introvert or an extravert isn’t the right way to approach personality. People don’t fit into neat boxes; they can’t be classified as “entirely introverted” or “entirely extraverted.”
Instead, personality is best thought of as a continuum: people vary by degree, from low to high, on each trait. Currently, the most scientifically supported theory is the Big 5, which measures the degree to which someone is open to new experiences, conscientious, extraverted, agreeable, and emotionally stable. Extensive research has shown that Big 5 scores predict work-related outcomes such as job performance, leadership, and teamwork. But even this theory is hotly debated and far from perfect.
In addition to being based on imperfect or discredited theories, most personality tests rely on flawed measurement. Even most Big 5 tests still use a traditional Likert-type scale, which asks participants to rate themselves “on a scale of 1 to 5.” Scientists have known for decades that this method is prone to bias: people interpret the scale points differently, and many answer in whatever way seems most socially desirable. How can we trust that someone’s response of “4 (somewhat agree)” to the statement “I like going to parties” truly reflects their level of extraversion?
Other methods, such as forced-choice and AI-based tests, may be preferable to subjective rating scales, where it is never clear what a “4 on a scale of 1 to 5” actually means. Rating scales are also easy to fake, especially for a candidate who wants to land a job. Extensive research suggests that forced-choice tests, which ask respondents to pick the statement from a set that best describes them, can reduce the number of faked responses. Artificial intelligence-based tests are another method that could potentially replace subjective rating scales. However, these approaches are computationally complex and difficult to carry out without advanced statistical training, which makes them impractical for most workplaces to implement.
Finally, most personality tests rely on flawed assumptions about the stability of personality. Scientists have begun to find evidence that “personality states” (how people express their personality at a given moment) may change not only over one’s lifetime, but even over the course of a day. Depending on the situation you’re in, your behavior will reflect your personality differently. In other words, even if you took a highly accurate personality measure and scored in the top 10% for agreeableness, that score won’t hold in every situation. You may be more agreeable with your boss but less agreeable with your coworkers, or vice versa.
So should we stop talking about personality at work altogether? Maybe. Even if you get the theory right (by using the Big 5), get the measurement right (by avoiding Likert-type measures), and account for the situation, all of which remain active areas of academic research, you’ll still face a discouraging possibility: perhaps personality simply doesn’t matter as much as we think it does.
But the usual response I hear when I tell someone not to use personality tests is, “Oh, but it sounds so accurate, and it helped me discover who I am!” There’s actually a term for this: the Barnum effect, a phenomenon in which people rate vague, generic personality descriptions as highly accurate and personally relevant, even though such descriptions could apply to almost anyone.
On a personal level, I get it. In college, I took a Big 5 test that told me I scored in the 70th percentile for introversion. Sure, it was a Likert-type measure and didn’t account for situational differences. But it helped me accept that I would rather be by myself with a good book than at a large house party. Later, that realization helped me figure out what I wanted to pursue in a career.
Even though they’re not always scientifically sound, personality tests can still be useful for generating discussion and self-reflection. We see the same pragmatic approach in classrooms. The theory of “learning styles” (that students differ in how they absorb information, and that instructors should match each student’s preferred style) has been discredited, yet most teachers agree that presenting information in different media helps students learn. Likewise, personality tests may be useful in workplace discussions and team-building. Still, the concept of personality should be handled very carefully, and administrators and test-takers should be made aware of the warnings and limitations of this brand of personality science.
Steven Zhou is a PhD student in industrial-organizational psychology at George Mason University, where he researches leadership, personality, and psychometrics. He previously worked in HR data analytics at a large international consumer services startup and in college student affairs.
Editor’s Note: A previous version of this piece misstated the effectiveness of certain types of personality tests in relation to more widely used tests. The relevant paragraph has been updated to more accurately reflect current research on each test’s merits.