Big Brother surveillance technology just got a whole lot more clever, thanks to scientists from UCLA. They've developed a camera system that automatically generates a live text description of what it's seeing, so that footage can be monitored and searched like a text archive.
The system is dubbed I2T, for "image to text", and it's a collection of extremely clever computer vision algorithms that analyze the ongoing video stream from a typical surveillance camera that you may see watching a store or a busy road intersection in a city.
The core of I2T is a vast database of images and objects that the algorithms consult when trying to recognize objects in the video scenes: it holds over two million images covering 500 object categories. I2T grabs a video frame, works out what is background information and ignores it, then tries to recognize objects in the scene before spitting out a semi-natural-language description of what's going on. For example, it's smart enough to detect an object moving from one scene to another, and can report a car running a red light at an intersection. It can even remember if a particular object leaves the scene and returns, which could help spot activities like criminals casing a location before committing a crime.
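UCLA hasn't published I2T's code, but the grab-a-frame, ignore-the-background, describe-what-moved pipeline can be sketched with a toy example. Everything below (the frame-differencing approach, the function name, the 3x3 "scene") is an illustrative assumption, not I2T's actual method:

```python
# Toy sketch of a background-subtraction pipeline: compare a frame to a
# known background, find the changed pixels, and emit a text description.
# I2T's real algorithms are vastly more sophisticated than this.

def describe_motion(background, frame, label="object"):
    """Compare a frame to the background and describe any moving region."""
    moving = [(x, y)
              for y, row in enumerate(frame)
              for x, pixel in enumerate(row)
              if pixel != background[y][x]]
    if not moving:
        return "no activity"
    # Report the centroid of the changed region as the object's position.
    cx = sum(x for x, _ in moving) // len(moving)
    cy = sum(y for _, y in moving) // len(moving)
    return f"{label} detected at ({cx}, {cy})"

# A 3x3 "scene": 0 is background, 1 is a moving object.
background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
frame      = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

print(describe_motion(background, frame, label="car"))  # car detected at (1, 1)
```

The real system layers object recognition (against that two-million-image database) and event logic on top of this kind of low-level motion detection.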
The clever part is the text output, of course. Surveillance footage typically requires a pair of human eyes to monitor what's going on, since machines aren't that good at the task yet. And searching through a vast archive of video footage for a particular event usually means some chump doing it manually. Being able to search for keywords in a text archive is a much simpler way to access the relevant moments in a camera's surveillance history.
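The payoff is easy to picture: once events are logged as text lines, finding an incident becomes a one-line keyword search instead of hours of scrubbing through footage. The log entries and search function below are hypothetical, purely to illustrate the idea:

```python
# Hypothetical event log of the kind a system like I2T might produce.
log = [
    "09:14:02 car enters intersection",
    "09:14:05 car runs red light",
    "09:20:31 pedestrian crosses street",
]

def search(log, keyword):
    """Return every logged event containing the keyword."""
    return [line for line in log if keyword in line]

print(search(log, "red light"))  # ['09:14:05 car runs red light']
```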
I2T's database may be vast, but it's not large enough (and the system's not quite intelligent enough yet) to be truly useful in real-life situations. The technology does point to a future, though, where super-smart surveillance cams can analyze what's going on by themselves and spit out natural-language descriptions of events in real time, which could then be accessed through a regular search engine like Google. It may even have applications in less serious settings like YouTube videos, where Google's already experimenting with automatic speech transcription for subtitles.