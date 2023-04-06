Meta has developed a new AI model called Segment Anything that can cut any object out of any digital image or video, even if it’s never seen the object or the image before. The research could have big implications for Meta’s metaverse (if it shows up) as well as its core ads business.

The technology looks similar to that used in the iPhone’s photo app to remove the background from images of people or things. But Meta’s model is probably more powerful and certainly more versatile. The image set used to train Segment Anything is said to be 400 times larger than the next largest of its kind. (Meta made two training data sets available as open source.) Using a massive image dataset, Meta researchers taught the model to identify the pixels that make up objects rather than to recognize specific objects themselves. So the model can pick out any object, from cancer cells to undersea creatures, regardless of context. That means that third parties (and Meta itself) can put the foundation model to use without first bearing the expense of further training the model on specific, labeled images. The company released a research paper on the model, and also made a tool so that people could try the AI using their own images.

Generating a metaverse? The company has bet billions that the metaverse will be a popular place where people socialize, work, and play in the future. And the Segment Anything AI could find its most interesting applications in virtual and augmented reality glasses, which Meta’s Reality Labs group is developing as a primary access point to the virtual 3D world it calls the metaverse. Meta has already built eye-tracking sensors (into its Quest Pro VR headset, for example) that can detect objects that a user sees. The Segment Anything model can be used to isolate such an object from its environment, identify it, and convert it into 3D digital content. “In the AR/VR domain, SAM (Segment Anything Model) could enable selecting an object based on a user’s gaze and then ‘lifting’ it into 3D,” Meta says in a blog post. The technology could be especially valuable because, as Meta CTO Andrew Bosworth told Nikkei, generating the immersive content that surrounds the user and reflects their interests and tastes could be a very expensive proposition. “In the future you might be able to just describe the world you want to create and have the large language model generate that world for you,” Bosworth said. “And so it makes things like content creation much more accessible to more people.”
