In an effort to beat Google at mapping, Microsoft Bing will use crowd-sourced photos to create 3-D virtual worlds in its Maps application, the company has told FastCompany.com. The 3-D models will eventually be knitted into Bing Maps’ existing aerial and street views and will allow users to explore and zoom at a level of detail that Google and Yahoo Maps can’t presently match. An exclusive screenshot of the new technology, which goes live Wednesday, is shown above.
How It Works
The secret sauce behind this new iteration of Bing Maps is Microsoft Photosynth, software that analyzes digital pictures and generates a three-dimensional model of the photographed area, as well as a “point cloud” that helps the system integrate new images. Think of a Photosynth (or “synth,” in Redmond parlance) as one of those 3-D video tours, except with dozens or hundreds of cameras contributing detail at every level. The effect is mind-blowing. Check it out below.
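The core trick behind building that point cloud is triangulation: once the system knows where two cameras were and which pixel in each photo shows the same feature, it can intersect the two viewing rays to recover a 3-D point. The sketch below is an illustrative simplification (midpoint-of-closest-approach between two rays), not Photosynth's actual pipeline, which repeats a step like this for millions of matched features.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3-D point from two camera rays (origin + unit direction)
    by taking the midpoint of their closest approach -- the basic step a
    structure-from-motion system repeats for every matched feature."""
    # Solve for scalars t1, t2 minimizing |(o1 + t1*d1) - (o2 + t2*d2)|^2
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b          # zero only if the rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = o1 + t1 * d1              # closest point on ray 1
    p2 = o2 + t2 * d2              # closest point on ray 2
    return (p1 + p2) / 2

# Two cameras, two meters apart, both seeing a feature at (0, 0, 5)
o1, d1 = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
o2 = np.array([2.0, 0.0, 0.0])
d2 = np.array([-2.0, 0.0, 5.0]); d2 = d2 / np.linalg.norm(d2)
print(triangulate_midpoint(o1, d1, o2, d2))  # ≈ [0, 0, 5]
```

Run over every pair of photos that share features, this yields both the point cloud and the estimated camera positions that new images can be matched against.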
“We’re not just relying on images from cameras flown from above, or driven around on top of cars, to get images of a street,” says Blaise Aguera y Arcas, Microsoft Bing’s head software architect, though Bing is also collecting the kind of “streetside” data made famous by Google Street View, and has already mapped some 50 cities that way. “We think of all that ‘street-side’ data as just the visual trellis on top of which all this [Photosynth] stuff grows, like grapes on a vine.”
So why do we need 3-D maps? Well, for simple things–driving directions, finding a business–Bing Maps already gives you a nice light AJAX experience. “But obviously things are moving to a much more expansive, immersive direction,” says Aguera y Arcas. “If you want to explore, if you want to really understand more about a place, you really need to be able to get right down in there, and see it from the point of view that people actually experience it. As great as it is to use cameras on top of cars for building that visual trellis, that’s not the actual human perspective.” Using Photosynth in Bing Maps, he says, is about bridging that connection between maps and photo-sharing sites like Flickr.
The Evolution of Bing in 3-D
Photosynth has been around as a stand-alone app since last summer, when the Microsoft Live Labs team applied it to Flickr-tagged images to make prototype models of popular tourist spots like Notre Dame Cathedral in Paris. The technology began as “Photo Tourism,” a joint project between Live Labs and the University of Washington. Part of the magic of Photosynth, says Aguera y Arcas, is that it can draw photos from all different kinds of cameras. “Those ‘grapes’ on the visual trellis, so to speak, don’t have to be of uniform quality,” he says. “We have users making Photosynths using everything from cell phone photos to digital SLR images.”
To see how it works, check out this TED video of Aguera y Arcas discussing Photosynth in 2007.
Google, Bing’s behemoth competitor, has adopted a different strategy in the quest for 3-D mapping. Last month it announced it will use a browser plug-in called Google Building Maker to allow users to build virtual 3-D models of cities and towns. Google is also collecting all its own street-side data for Street View, and recently ditched TeleAtlas as its map data provider in an attempt to build its own map database from scratch. (A screenshot of Google’s 3-D maps, below.)
This isn’t the first upgrade Bing Maps has seen this quarter. Microsoft has been quietly improving Maps for several months, adding things like draggable routes and embedding. Most of those improvements have been drowned out by the buzz around Bing’s recent flirtations with News Corp.
How Synths Work
This is a Photosynth. This one is of Rome, Italy, but users can make Photosynths of any place on earth and geo-tag them. With the changes Microsoft is launching Wednesday, those geo-tagged Photosynths will now appear in Bing Maps, where they’ll be primed for integration with Bing’s street-side and aerial data.
Photosynth began as an ActiveX plugin for IE and Firefox, but its team moved to Microsoft Silverlight to allow the technology to scale. “When you make a Photosynth, two things have to happen,” says Aguera y Arcas. “First you upload a batch of photos. Then you have to do all the hard computer-vision work to synthesize all those photos into a 3-D reconstruction. We could have taken the approach of doing all these things in giant server farms in eastern Washington,” he says, “but what we did instead is that we built the computation into the software, so your computer is processing the photos as you’re uploading them.” That’s a boon in the U.S., where broadband upload speeds are often too slow to handle large batches of photos comfortably.
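The idea of processing photos while earlier ones upload is a classic producer/consumer pipeline. The toy sketch below shows the shape of it, assuming nothing about Photosynth's internals: a background thread drains an upload queue while the main thread keeps doing the (simulated) heavy vision work, so computation and network transfer overlap instead of running back to back.

```python
import time
from queue import Queue
from threading import Thread

def extract_features(photo):
    """Stand-in for the heavy computer-vision work the client runs locally;
    the real point is that this happens on the user's machine, not a server."""
    time.sleep(0.01)             # pretend this is expensive
    return {"photo": photo}

def upload_worker(queue, uploaded):
    """Drains processed results while the main thread keeps computing."""
    while True:
        item = queue.get()
        if item is None:         # sentinel: no more photos are coming
            break
        time.sleep(0.01)         # pretend this is the slow network hop
        uploaded.append(item)

queue, uploaded = Queue(), []
worker = Thread(target=upload_worker, args=(queue, uploaded))
worker.start()
for photo in ["img_001.jpg", "img_002.jpg", "img_003.jpg"]:
    queue.put(extract_features(photo))   # compute while earlier items upload
queue.put(None)
worker.join()
print([item["photo"] for item in uploaded])
```

With real workloads the two `sleep` calls would be feature extraction and an HTTP upload, and the overlap is what hides slow upstream bandwidth.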
Viewing Bing Maps synths will require Microsoft Silverlight, that finicky software your browser uses to play streaming Netflix movies on Mac or PC. “A good immersive experience is very scalable to all sorts of screen sizes and bandwidths,” Aguera y Arcas says, and Silverlight will be the key. “We want this to work on everything from a mobile phone to a home theater–and I don’t think those things are at all contradictory,” he says. The system can scale, he says, because it loads its synth images in multiple resolutions and carefully engineers how the map scales, batches and blends. That zoom strategy is made possible by another Redmond technology called Seadragon, which allows all these images to seem as if they’re loading instantly. (If you watched the TED video, you can see that lag is a non-issue.)
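Seadragon's instant-seeming zoom rests on an image pyramid: each level is half the resolution of the one above, cut into fixed-size tiles, so the viewer only ever fetches the handful of tiles covering the current viewport at the current zoom. This sketch computes such a pyramid for a square image; the exact tile size and level scheme Seadragon uses are assumptions here, not confirmed details.

```python
import math

def pyramid_levels(width, height, tile=256):
    """Build a Seadragon-style pyramid: halve the image until it is one
    pixel, recording each level's size and how many fixed-size tiles it
    needs. The viewer fetches only the tiles visible at the current zoom."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols * rows))
        if w == 1 and h == 1:
            break
        w, h = max(1, (w + 1) // 2), max(1, (h + 1) // 2)
    return levels  # finest level first

for w, h, tiles in pyramid_levels(4096, 4096)[:4]:
    print(f"{w}x{h}: {tiles} tiles")
```

A 4096-pixel-square image needs 256 tiles at full resolution but only 4 once it is shrunk to 512 pixels, which is why zoomed-out views load almost for free.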
But doesn’t Silverlight sort of, well, suck? “The performance that is coming in our first outing is not where I want it to be,” says Aguera y Arcas, “but we didn’t want to delay. We really believe in iterating. But there’s nothing inherently about the Silverlight approach that makes it lower performance. Using a bunch of techniques, we’ll be able to get a better performance experience–even faster than AJAX. We just have to keep hammering at it,” he says. And all the talk of scaling across devices isn’t Microsoft being fatuous; the company just revealed Silverlight will work on the iPhone as well as the desktop.
How Synths Attach to the “Visual Trellis”
Users have been able to geo-locate their synths for months now: sometimes it’s GPS data embedded in the photos, sometimes it’s manually located by dragging a pushpin around on a map. “But when it’s a synth, it’s not just a point, it’s a whole 3-D structure,” says Aguera y Arcas, so the push-pin approach is too crude. “Once you’re done with the push-pin, you’ll eventually be able to see the point-cloud from above, so you can drag your synth around and scale it to fit into the area by visually fitting it with the aerial image.” (Below, Bing Maps will allow users to position their synths on the map grid, and eventually meld them with street-view and aerial data.)
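Mathematically, that drag-rotate-scale gesture is a 2-D similarity transform applied to the synth's top-down point cloud. The sketch below shows the transform itself; the coordinates and parameters are hypothetical, and the real feature presumably layers a map projection on top.

```python
import numpy as np

def place_synth(points_xy, scale, theta, tx, ty):
    """Register a synth's top-down point cloud against map coordinates
    with a 2-D similarity transform: rotate by theta, scale uniformly,
    then translate -- the math behind drag, spin, and pinch-to-fit."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])          # counter-clockwise rotation
    return points_xy @ (scale * R).T + np.array([tx, ty])

# Three points in the synth's local coordinates
cloud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# User rotates 90 degrees, doubles the size, and drags to (10, 5)
placed = place_synth(cloud, scale=2.0, theta=np.pi / 2, tx=10.0, ty=5.0)
print(placed)
```

Four numbers (scale, angle, and the two translation components) fully pin the cloud to the aerial image, which is why a visual drag-and-fit interface is enough.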
Attaching Users to Bing Maps
Building a Photosynth is a real project for users, so enfranchising them will be key to getting Bing Maps built out quickly. “We have a lot of work to do in this area–we’re just in the beginnings of [winning users],” says Aguera y Arcas. Eventually, synths will be “ranked,” encouraging higher-quality submissions. “There are lots of ways we can rank them, just as [a search engine] ranks Web pages,” he says. Flickr employs a similar system; users rank images by “interestingness.” One factor, Aguera y Arcas says, will be “synthiness”–or how well a batch of photos overlap. Those (and other) ratings will also be left to the crowd to determine.
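Microsoft hasn't published a formula for “synthiness,” but one plausible reading of “how well a batch of photos overlap” is the fraction of photo pairs the matcher managed to link by shared features. The scoring below is purely an assumption for illustration; only the metric's name comes from the article.

```python
def synthiness(num_photos, matched_pairs):
    """A hypothetical 'synthiness' score: of all possible photo pairs,
    what fraction shared enough features to be stitched together?
    (The name is Microsoft's; this formula is an illustrative guess.)"""
    possible = num_photos * (num_photos - 1) // 2
    return len(matched_pairs) / possible if possible else 0.0

# 4 photos can form 6 pairs; suppose the matcher linked 4 of them
print(synthiness(4, [(0, 1), (1, 2), (2, 3), (0, 2)]))
```

A batch of near-duplicate shots would score high, while a grab bag of unrelated photos that never stitch would score near zero, which matches the incentive the ranking is meant to create.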
Obstacles remain. Any system that uses UGC is always vulnerable to gaming, though Bing Maps should prove unusually resilient because its system analyzes the image itself to determine where on the visual trellis it belongs. “What it means to make a ‘spammy’ Photosynth is much different than spoofing an address or phone number,” Aguera y Arcas says; the system will only knit in images that match the rest of the scene. Still, he acknowledges there may be unforeseen ways to vandalize the new feature. “We’re just starting to make the connections between Photosynth and Maps right now, so it’s possible we’ll see some creative negative things happen as a result.”
What’s next? Semantics
Being 3-D isn’t enough, says Aguera y Arcas; synths inside Bing Maps should know what they’re showing, and react accordingly. Someday, he says, “the system will be able to draw information out of the pixels” and provide relevant data, recognizing things like store-fronts and parks. “Whenever dumb pixels are linked with smart pixels, information should flow back into the dumb pixels,” he says. In other words, the street-side map already knows where all the roads and shops are. Photosynths will be able to graft on all that data once they’re fitted onto the trellis. “It should even know which way the camera was pointed,” says Aguera y Arcas.
Microsoft wouldn’t comment on mobile viewing or video synths, but Aguera y Arcas intimated those things were indeed on the roadmap. And those media are hardly the end of it. “Mapping is not what excites me about this project,” he says. “I’m most excited to get to the point where the map becomes as big as the landscape–that mapping becomes one with augmented reality, telepresence, virtual reality–all those things. The point where the map becomes so thorough, and so human in its scale, that it becomes a mirror-world.”