2024-03-01 | By Isaiah Steinfeld | 5 minute read

This is an exception. This, right here: you reading this article, accessing text on the internet. Today, you’re more likely to be watching a video or scrolling through photos. By 2025, IDC projects that about 80% of the data created worldwide will be unstructured data, mostly images, videos, and audio files. Until now, the best way of handling this has been metadata, but artificial intelligence is rewriting the very DNA of media files.

If you do a Google image search for “fast cars,” the results have traditionally come from pages with words like “fast” or “cars,” or from images tagged with metadata: hidden text on the back end of the page. And if you’re a graphic designer for a brand or a magazine, you probably use a digital asset management system built around tags. After a freelance photographer sends you stills from the latest footwear shoot, you look at every image, tag it, and then upload it to your media library. When you need a hero shot of sneakers for the next big drop, you search your library for the “sneakers” tag.

If that sounds like it takes forever and is incredibly tedious, it’s because it is. If it sounds like your job is getting harder every day because so much more content is video, it is. You’ll never keep up.

FLIPPING DIGITAL ASSET MANAGEMENT ON ITS HEAD

As we all go from reading “How To” articles to watching explainer videos, the sheer volume of data is exploding. We’re moving from what Coactive CEO Cody Coleman describes as a world of data lakes to a world of data oceans. Coactive, by the way, is one of the startups leveraging Amazon Web Services’ (AWS) generative AI stack to tackle this issue. Metadata relies on a process Coleman calls Tag, Load, Search (TLS). AI already revolutionized TLS a few years ago, when image recognition became reliable enough for algorithms, rather than humans, to start adding tags to images. That was a massive step forward in computer vision, but the process was still slow: humans had to clean up the metadata, and algorithms still went through images one by one. And early machine learning didn’t understand context.
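To make the TLS limitation concrete, here is a minimal Python sketch of a tag-first pipeline. It assumes a hypothetical `classify` function standing in for an early image-recognition model that emits fixed labels one image at a time; the file names and labels are invented for illustration. Because search only matches exact tags, a query for a synonym such as “kicks” finds nothing.

```python
from collections import defaultdict


def classify(image_name):
    """Hypothetical stand-in for an early image-recognition model.

    It returns fixed labels for one image at a time, with no notion
    of synonyms or context. The labels here are invented examples.
    """
    known = {
        "shoot_001.jpg": ["sneakers", "white"],
        "shoot_002.jpg": ["sneakers", "red"],
        "shoot_003.jpg": ["jeans"],
    }
    return known.get(image_name, [])


def tag_load(images):
    """Tag each image first, then load it into a tag -> images index."""
    index = defaultdict(list)
    for name in images:              # one image at a time
        for tag in classify(name):   # Tag step
            index[tag].append(name)  # Load step
    return index


def search(index, query):
    """Search step: only an exact tag match returns anything."""
    return index.get(query, [])


index = tag_load(["shoot_001.jpg", "shoot_002.jpg", "shoot_003.jpg"])
print(search(index, "sneakers"))  # ['shoot_001.jpg', 'shoot_002.jpg']
print(search(index, "kicks"))     # []
```

A production asset management system would add human review and a real database, but the core constraint is the same: search can only find what was tagged, with exactly the word that was used.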

So, when an apparel brand ran a big ad campaign for “ripped jeans,” the campaign fell flat when consumers visited the website, searched for “ripped jeans,” and got zero results. In the media library, all those images had been tagged as “distressed jeans.” Generative AI uses embeddings and active learning to understand the context behind “distressed,” which can also mean “ripped.”

FINDING THE RIGHT IMAGE WITH INTELLIGENT SEARCH

For applications to learn that context, companies will need to draw on more than one foundation model (FM) for language, image recognition, and more. AI tools like Griptape, one of the startups to join the first cohort of the AWS Generative AI Accelerator, assemble a pipeline of workloads that uses different FMs.
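The “ripped jeans” failure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the three-dimensional vectors are hand-picked stand-ins for the embeddings a foundation model would produce (real embeddings typically have hundreds or thousands of dimensions), and the 0.95 similarity threshold is arbitrary. Exact tag matching returns nothing, while cosine similarity between embeddings surfaces the “distressed jeans” tag.

```python
import math

# Hypothetical toy embeddings: hand-picked 3-D vectors standing in for
# what a foundation model would output. Similar meanings -> nearby vectors.
EMBEDDINGS = {
    "ripped jeans": [0.90, 0.40, 0.10],
    "distressed jeans": [0.85, 0.45, 0.12],
    "sneakers": [0.10, 0.20, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def tag_search(query, tags):
    """Classic tag lookup: only an exact string match counts."""
    return [t for t in tags if t == query]


def embedding_search(query, tags, threshold=0.95):
    """Return tags whose embedding is close to the query's embedding.

    The threshold is an illustrative assumption, not a recommended value.
    """
    q = EMBEDDINGS[query]
    return [t for t in tags if cosine(q, EMBEDDINGS[t]) >= threshold]


library_tags = ["distressed jeans", "sneakers"]
print(tag_search("ripped jeans", library_tags))        # []
print(embedding_search("ripped jeans", library_tags))  # ['distressed jeans']
```

Cosine similarity compares the direction of two vectors rather than their length, which is why it is a common choice for comparing embeddings; the same comparison works when the vectors describe images instead of tag text.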

Now, generative AI is flipping TLS on its head, using dynamic tagging and embeddings for a load-search-tag process. Companies like Wayfair are using AI tools such as Snorkel Flow to improve automated tagging across their product catalogs for color, style, pattern, and more. With dynamic tagging, graphic designers simply upload all their images to the cloud. Generative AI models continually learn and suggest a tagging system that helps designers find the right images. Foundation models generate embeddings, numeric vectors stored alongside the image data, that capture a sense of context. Instead of searching for images tagged with the word “ripped,” search tools can now search for images that look like ripped jeans. This intelligent search approach is more intuitive and produces much better results.

SHORT-CIRCUITING CONSUMER RESEARCH

I wish I’d had all these tools when I was in product management. There are so many ways we could have used dynamic tagging and intelligent search beyond our media library or product catalog. We used to do endless rounds of focus groups, consumer interviews, and product demos. Afterward, we’d all go back to our offices, type up our notes from those sessions, and try to tease out some trends. Of course, we taped those focus groups, but mostly for archive purposes. Now, imagine having generative AI scour all those hours of video and audio from consumer interviews for insights. Load, search, tag. Imagine uploading all the unstructured data and using active learning protocols to teach the algorithms the context: what it means when people’s eyes light up, or when they say, “This is nice,” vs. “I can’t believe I ever lived without this.”

SHRINKING THE OCEAN OF CONTENT MODERATION

Active learning means that instead of dumping a set of unstructured data into the foundation model, your experts give it little nudges: hints about where to find the most essential context. Humans are not replaced by this technology; they just get whole days back. It takes a lot less time to provide active learning hints on the front end than to clean up every single output on the back end. One company that previously relied on human reviewers was Fandom, the site for fan-generated wiki communities around movies, books, games, and more. Most of this content is fun and wonderful, but Fandom does need to remove objectionable content that violates its terms of service. Relying on humans to review every single image took hours, and the company could barely keep up. Using generative AI tools for dynamic tagging, the company reduced the time teams spent on manual labeling by 85%. The moderation lag time for new images dropped from about four hours to 250 milliseconds.

REWRITING THE DNA OF DIGITAL MEDIA