Good data translates into a good customer experience. The more you know about your customers, the better you are at giving them the experience they expect. Data has become an invaluable business asset, guiding human decision-making and powering artificial intelligence. However, data has grown more complex and is no longer confined to simple spreadsheets. Instead, it lives in many different forms. That’s why data annotation, the human-led process of labeling text, images, audio, and video clips so computers understand what they’re seeing, has never been more important.
By 2025, an estimated 463 exabytes of data will be created daily across the globe, according to Visual Capitalist. To put that in perspective, each exabyte translates to 1 billion gigabytes. There’s no question some of that will be utterly unusable. But the most successful brands will be the ones that capture the hidden value in the data and translate it into impactful decision-making and more informed customer experiences.
HOW WE GOT HERE: A BRIEF HISTORY OF DATA
To understand the value of data annotation, we need to look at how data became so valuable to begin with. Data has been collected and analyzed in some form for millennia, and by the 1800s, coded punch cards helped enable data processing by machines. Storage was tackled in the 1900s with magnetic tape (which led to floppy disks and hard drives). The proliferation of the internet in the 1990s paved the way for more accessible data and greater diversity in the type of data collected.
Since then, we’ve seen data move through rapid evolution. We’ve gone from collecting data through simple feedback surveys with yes-or-no responses and sales figures tracking product popularity to advanced web analytics and unstructured user-generated content, such as videos, images, audio, and an abundance of social-media posts. And alongside the diversification of data, AI and machine learning have played a major role in making sense of that data to diagnose diseases and drive vehicles autonomously.
Technology is also enabling a data-informed approach to the customer experience. Starbucks, for example, uses AI to analyze data such as location demographics, population density, income levels, and traffic patterns in order to decide where to establish new stores.
Data science helps games companies augment the player experience and personalize marketing strategies. It helps retailers and telecom companies visualize consumer behavior and understand the customer journey in greater detail. For fintech and financial services, it’s a key defense against fraud.
However, the challenges we face with data have also evolved, from how to collect and analyze data to how to do so transparently, ethically, and without bias, while also making sure we’re storing that data securely and confidentially. There is also the glaring challenge of making sure the data is useful.
WHAT’S YOUR DATA WORTH?
AI and machine learning quietly underpin a lot of our day-to-day activities. Machine learning helps deliver Google search results and power the iPhone’s facial recognition. AI-driven chatbots quickly answer customer questions and operate smart home systems. But all of this is made possible by human-labeled data sets. Computers only learn from what we expose them to, and bad data can have a trickle-down effect.
Without well-sourced and accurately labeled data, algorithms can underperform—a factor that carries increasing weight in the current hypercompetitive environment. Researchers at MIT recently found “systemic” labeling errors in popular AI benchmark data sets—data sets that are used to train new AI systems and tell them what to look for in future data sets, powering the prediction process.
For example, image-labeling errors include one breed of dog being confused for another, or a Roman statue being categorized as nudity. In sentiment annotations of Amazon product reviews, some positive reviews were labeled as negative. In annotations of YouTube videos, the researchers found “an Ariana Grande high note being classified as a whistle.”
Some of the implications of bad data can be immaterial. But other data labels can have far-reaching consequences due to gender or race bias. A recent article in the MIT Technology Review on how our data encodes systematic racism and lacks diversity says, “the CelebA face data set has labels of ‘big nose’ and ‘big lips’ that are disproportionately assigned to darker-skinned female faces” while data sets for detecting skin cancer have been found to be missing samples of darker skin types. As our data-powered world hurtles towards the AI-driven future, proper representation and diversity in data sets will be not just the inclusive thing to do, but vital to performance and reach.
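To make “disproportionately assigned” concrete, here is a minimal sketch of how a team might check whether a label shows up more often for one group than another. Everything in it is an assumption for illustration: the group names, the toy records, and the `label_rate_by_group` helper are hypothetical and do not come from CelebA or any real data set.

```python
from collections import defaultdict

# Hypothetical toy annotations: (group, has_label). Invented for illustration only.
records = [
    ("group_a", True), ("group_a", False), ("group_a", False), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", True), ("group_b", False),
]


def label_rate_by_group(records):
    """Return the fraction of records in each group that carry the label."""
    totals = defaultdict(int)
    labeled = defaultdict(int)
    for group, has_label in records:
        totals[group] += 1
        if has_label:
            labeled[group] += 1
    return {group: labeled[group] / totals[group] for group in totals}


rates = label_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of samples carry the label")
```

A large gap between groups does not prove bias on its own, but it is a signal that the annotation guidelines and the labeled samples deserve a closer human review.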
GETTING THE DATA RIGHT
Quality data annotation is foundational if brands are to unlock the full potential of AI and machine learning. Just as data needs to be carefully sourced, it must also undergo a rigorous annotation and labeling process to avoid harmful biases. In a sense, bias can compound and embed itself in machine decision-making, not only disrupting the customer experience but also perpetuating racism.
Take, for example, a study by researchers at George Washington University on Chicago rideshare trips and census data. The researchers discovered “a significant disparate impact in fare pricing of neighborhoods due to AI bias learned from ride-hailing utilization patterns associated with demographic attributes.” Data that promotes certain biases decreases the value of a service or solution.
Reaching absolute zero bias in data may be impossible. However, collecting good data today enables the technology of tomorrow to make more accurate decisions. This is where a diverse team (or, in TELUS International’s case, an extensive crowd of data annotators) can help derive the most value for machine learning programs.
Siobhan Hanna is the managing director of TELUS International’s global AI Data Solutions. The AI Data Solutions team leverages a global crowd of over 1 million community members to help organizations train and test machine-learning models.