Four Google Brain research scientists (Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc Le) recently released the results of a fascinating project with two objectives. First, they wanted to create an AI system that can spawn new AI systems more sophisticated than those humans can design. Second, they wanted to test whether such a system could identify objects in real time with remarkable accuracy. They’ve done both, and their research has vast implications for everything from surveillance to self-driving cars.
Designing machine learning models is remarkably tedious. It requires significant time and expertise, so to speed things up, Google’s researchers created AutoML, a machine learning model designed to create other machine learning models using an approach called reinforcement learning. In this method, a controller neural network proposes a “child” network to execute a specific task. In this case, the task was to recognize objects in a real-time video feed, like people, cars, traffic lights, handbags, or backpacks. The “child” model trains for the task and gets evaluated, and the controller AI learns from that feedback and proposes a refined child. The process gets repeated thousands of times, until the child model performs the task very well.
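To make that loop concrete, here is a minimal Python sketch of the propose, evaluate, and refine cycle. It is a toy under loud assumptions, not Google's implementation: the two-decision search space, the `evaluate_child` scoring function (which fakes "training" with a made-up formula plus noise), and the simple REINFORCE-style update on a table of preferences are all hypothetical stand-ins for the real controller network and real child training.

```python
import math
import random

# Hypothetical search space: the controller makes one pick per decision.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "filter_size": [1, 3, 5],
}


def softmax(logits):
    """Turn raw preferences into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_child(logits, rng):
    """Controller step: sample one option index for each decision."""
    choice = {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(logits[name])
        choice[name] = rng.choices(range(len(options)), weights=probs)[0]
    return choice


def evaluate_child(choice, rng):
    """Stand-in for training the child and measuring its accuracy.

    Assumption: a made-up score that pretends deeper networks with
    3x3 filters do best, plus a little noise.
    """
    layers = SEARCH_SPACE["num_layers"][choice["num_layers"]]
    size = SEARCH_SPACE["filter_size"][choice["filter_size"]]
    score = 0.5 + 0.05 * layers - 0.05 * abs(size - 3)
    return score + rng.gauss(0, 0.01)


def search(steps=3000, lr=0.3, seed=0):
    """Repeat propose/evaluate/update and return the preferred design."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = None  # running average of rewards seen so far
    for _ in range(steps):
        choice = sample_child(logits, rng)
        reward = evaluate_child(choice, rng)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE-style update: raise the preference for picks that
        # scored above the running average, lower it for those below.
        for name, picked in choice.items():
            probs = softmax(logits[name])
            for i, p in enumerate(probs):
                grad = (1.0 if i == picked else 0.0) - p
                logits[name][i] += lr * advantage * grad
    return {
        name: SEARCH_SPACE[name][max(range(len(vals)), key=vals.__getitem__)]
        for name, vals in logits.items()
    }


if __name__ == "__main__":
    print(search())
```

Running `search()` returns the option the toy controller has come to prefer for each decision. The point is only the shape of the feedback loop: nothing here trains a real network, and the real system evaluates each child by actually training it on image data.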
How good? The Google Brain researchers claim that their machine-designed model beats all state-of-the-art computer vision systems created by people. They applied AutoML to the ImageNet image classification and COCO object detection data sets, which according to the Google team are “two of the most respected large-scale academic data sets in computer vision.” This graphic shows the accuracy of their system, called NASNet, in red:
Not only is NASNet more accurate, it’s also more efficient than other models. According to the researchers’ article, it halves “the computational cost of the best reported result on arxiv.org (SENet).”
Self-driving cars are one obvious use of this architecture. You can imagine the system helping Google’s AVs identify traffic, pedestrians, and road hazards. NASNet could also be used in augmented reality to help apps interact with the environment in a faster, more accurate way than current computer vision solutions. But perhaps the most intriguing applications have yet to be identified. Google’s researchers decided to make NASNet public, for both image classification and object detection, so other scientists can make use of it.
Of course, automating automation raises some alarming questions. How do you ensure you aren’t building a biased system that then passes that bias on to another system? How do you ensure the systems are used ethically? I can imagine some dystopian applications, like automated surveillance, in which computers constantly analyze images to flag objects or activities that they consider suspicious. That could be a boon to public safety or it could be the makings of a police state. I can also imagine refining the system to recognize faces on the fly and follow anyone across a city.
NASNet can address “multitudes of computer vision problems we have not yet imagined,” the researchers write. For better or for worse.