New research from Google's UK-based DeepMind subsidiary demonstrates that deep neural networks have a remarkable capacity to understand a scene, represent it in a compact format, and then "imagine" what the same scene would look like from a perspective the network hasn't seen before.
Human beings are good at this. If shown a picture of a table with only the front three legs visible, most people know intuitively that the table probably has a fourth leg on the opposite side and that the wall behind the table is probably the same color as the parts they can see. With practice, we can learn to sketch the scene from another angle, taking into account perspective, shadow, and other visual effects.
A DeepMind team led by Ali Eslami and Danilo Rezende has developed software based on deep neural networks with these same capabilities, at least for simplified geometric scenes. Given a handful of "snapshots" of a virtual scene, the software, known as a generative query network (GQN), uses a neural network to build a compact mathematical representation of that scene. It then uses that representation to render images of the scene from new perspectives that the network hasn't seen before.
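The pipeline described above can be sketched in toy form: encode a few snapshots into a compact vector, then decode that vector plus a new camera position into an image. This is a minimal, untrained illustration, not DeepMind's code. The class name ToyGQN, the tiny dimensions, and the random linear layers are all hypothetical. One detail does follow the published GQN design: the per-snapshot encodings are summed, so the scene representation accepts any number of snapshots and doesn't depend on their order. The real system uses convolutional networks and a recurrent, probabilistic generator trained on many scenes, none of which is modeled here.

```python
import random

# Hypothetical toy sizes; the real GQN works on full images with conv nets.
IMAGE_DIM = 4   # a flattened "snapshot"
VIEW_DIM = 3    # camera position (x, y, z) in this sketch
REPR_DIM = 5    # size of the compact scene representation


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


class ToyGQN:
    """Untrained toy: random linear encoder and decoder (hypothetical)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.enc = random_matrix(REPR_DIM, IMAGE_DIM + VIEW_DIM, rng)
        self.gen = random_matrix(IMAGE_DIM, REPR_DIM + VIEW_DIM, rng)

    def encode(self, image, viewpoint):
        # Map one (image, viewpoint) observation to a representation vector.
        return relu(matvec(self.enc, image + viewpoint))

    def represent(self, observations):
        # Sum the per-snapshot vectors into one scene representation.
        total = [0.0] * REPR_DIM
        for image, viewpoint in observations:
            total = [t + r for t, r in zip(total, self.encode(image, viewpoint))]
        return total

    def render(self, representation, query_viewpoint):
        # With trained weights this would predict the view from an unseen
        # camera position; here the weights are random, so it is only shape-correct.
        return matvec(self.gen, representation + query_viewpoint)


model = ToyGQN(seed=42)
snapshots = [
    ([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0]),
    ([0.4, 0.3, 0.2, 0.1], [0.0, 1.0, 0.0]),
]
scene = model.represent(snapshots)
new_view = model.render(scene, [0.0, 0.0, 1.0])
print(len(scene), len(new_view))  # 5 4
```

Summing, rather than concatenating, is what lets the same network digest two snapshots or twenty without changing shape.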
The researchers didn't hard-code into the GQN any prior knowledge about the kinds of environments it would be rendering. Human beings are aided by years of experience looking at real-world objects. The DeepMind network develops a comparable intuition simply by examining many images of similar scenes.
"One of the most surprising results [was] when we saw it could do things like perspective and occlusion and lighting and shadows," Eslami told us in a Wednesday phone interview. "We know how to write renderers and graphics engines," he said. What's remarkable about DeepMind's software, however, is that the programmers didn't try to hard-code these laws of physics into it. Instead, Eslami said, the software started with a blank slate that was able to "effectively discover these rules by looking at images."
It's the latest demonstration of the incredible versatility of deep neural networks. We already know how to use deep learning to classify images, win at Go, and even play Atari 2600 games. Now we know these networks also have a remarkable capacity for reasoning about three-dimensional spaces.