[Image: SPROCKET attentional trace]

SPROCKET

SPROCKET is a computer vision system that looks at and understands geared machines. Understanding means being able to explain how the machine "makes sense"--how the axles and gears are arranged to turn each other in some useful manner, and how the frame holds it all together. As SPROCKET inspects a machine, it poses questions such as "What would this gear have to connect to in order to serve some sensible function?" or "What could keep this axle from falling out of the machine?" These questions drive visual exploration. Their answers form the backbone of SPROCKET's final understanding of the machine.

This page contains simple and complex examples of SPROCKET at work, a note on its ancestry, and some possible applications of this technology. You can also download an early article on SPROCKET.


[Image: simple gear assembly]

A simple example

Here is a movie of SPROCKET exploring the very simple machine pictured above. In the movie, highlighted areas mark the regions of interest in which SPROCKET will look for parts, and the trace shows the actual path of its visual focus of attention. If the movie flashes by too quickly, here is an annotated pictorial trace of how the gearbox was figured out. Of course, the point of all this is to get an explanation of the scene.

[Image: subtle gear assembly]

A complex example

This is a fairly complex gearbox with a subtle design and lots of opportunities for error. SPROCKET has to contend with the fact that much of the important structure of this machine is invisible or shadowed. (How many axles are there?) Moreover, it's almost impossible to properly distinguish where one gear ends and another begins. Here's how a slice of the machine looks to SPROCKET's gear-segmenting visual routine:
[Image: subsection of gearbox image]
Can you tell how many gears are in this strip? Or where they begin and end?

Of course, we humans use context to help us with problems like these. SPROCKET does too, by trying to figure out how various ways of divvying up the strip of texture into gears (and bits of frame) would combine with other known parts of the machine to make a sensible drivetrain. Here is part of a trace of SPROCKET puzzling out exactly this problem.
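To make the idea of context-driven segmentation concrete, here is a toy sketch in Python. It is not SPROCKET's actual algorithm, which this page does not spell out. It assumes a one-dimensional strip, a set of candidate cut positions where one part might end and another begin, and a list of axle positions already known from elsewhere in the machine. The function names, the tolerance, and the rule that every segment must be a gear centered on its own known axle are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def segmentations(length, candidate_cuts):
    """Yield every way to split [0, length] at a subset of the candidate cuts."""
    cuts = sorted(candidate_cuts)
    for k in range(len(cuts) + 1):
        for chosen in combinations(cuts, k):
            bounds = [0, *chosen, length]
            # Pair consecutive bounds into (start, end) segments.
            yield list(zip(bounds[:-1], bounds[1:]))


def is_sensible(segments, axle_positions, tol=1.0):
    """Hypothetical context check: each segment must be centered
    (within tol) on a distinct, already-known axle."""
    used = set()
    for start, end in segments:
        center = (start + end) / 2
        matches = [a for a in axle_positions
                   if abs(a - center) <= tol and a not in used]
        if not matches:
            return False
        used.add(matches[0])
    return True


def plausible_segmentations(length, candidate_cuts, axle_positions, tol=1.0):
    """Keep only the segmentations that fit the known axles."""
    return [s for s in segmentations(length, candidate_cuts)
            if is_sensible(s, axle_positions, tol)]


if __name__ == "__main__":
    # A strip 100 units wide, three ambiguous cut points, two known axles.
    print(plausible_segmentations(100, [30, 40, 70], [20, 70]))
```

With these made-up numbers, eight segmentations are possible, but only the split at 40 puts a gear on each known axle. The texture alone is ambiguous; the surrounding context picks out one reading.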

[Image: BUSTER attentional trace]

Ancestry

SPROCKET is an outgrowth of BUSTER, a vision system that looks at and understands towers made of children's blocks. BUSTER tries to show how block towers carry weight to the ground without falling down; SPROCKET tries to show how geared machines carry motion through their drivetrains without falling apart. Both use basic knowledge about how the physical world works to help them explore a scene and make sense of what they see. In the case of BUSTER, that basic knowledge covers substantiality, rigidity, weight, and balance. In the case of SPROCKET, that knowledge is extended to cover attachment, friction, and containment. Many educational toys are aimed at helping children master these principles, especially in terms of hand-eye coordination. This makes sense because these basic aspects of physical causality account for much of how the physical world looks and how it responds to our manipulations. For the same reason, they seem a good place to start building artificial intelligences.

Applications

Some potential elaborations of SPROCKET:

Reporters like the robot mechanic angle: SPROCKET has been in the news.

Matthew Brand / MIT Media Lab / brand@media.mit.edu