
Calibrating a Sony EVI-D30

The camera is a Sony EVI-D30 with a Kenko 0.42X Wide Conversion lens. The camera provides serial control of pan, tilt, and zoom. For the foveation feature of the live video page, we want to be able to orient the camera so that the selected object is at the center of the frame. This allows us to zoom the camera and obtain a tight shot of the object of interest.

The task is to create a mapping between image coordinates (x,y) and pan-tilt settings (P,T), so that given an image coordinate pair (x,y) we can drive the camera to the configuration (P,T) and obtain an image where the image patch previously at (x,y) is now at (320,240), the center of the frame.

Gathering Data

The first experiment was to gather data about this mapping to determine its character. This could have been done by hand in this way:

  1. Pick an object in the image.
  2. Manually pan and tilt the camera to center that object.
  3. Query the camera and record the resulting (P,T).
Gathering enough data to characterize the mapping would have been very time-consuming, so the procedure was automated.

Step 2 was automated using normalized correlation tracking. A location was picked in step 1 and the image patch around that location, I, was saved. The camera was moved a small amount and a new image patch, J, was saved. Normalized correlation was used to determine the position in J that most resembled the original patch I. If this location was closer to the center of the image then it became the new target. The process continued until the target was at the center of the image (a success), or an iteration bound was reached (a failure).
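
The matching step can be sketched as follows. This is a minimal illustration, not the normcorr code itself: it assumes grayscale images held in NumPy arrays, uses the zero-mean variant of normalized correlation, and searches exhaustively. The function names are hypothetical.

```python
import numpy as np

def normalized_correlation(patch, window):
    """Zero-mean normalized correlation between two equal-sized arrays."""
    a = patch.astype(float) - patch.mean()
    b = window.astype(float) - window.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        return 0.0  # flat region: nothing to correlate against
    return float((a * b).sum() / denom)

def best_match(patch, image):
    """Exhaustively search image for the window most similar to patch.

    Returns (row, col, score), where (row, col) is the top-left corner
    of the best-scoring window.
    """
    ph, pw = patch.shape
    ih, iw = image.shape
    best = (0, 0, -2.0)  # scores lie in [-1, 1], so -2 is below any score
    for r in range(ih - ph + 1):
        for c in range(iw - pw + 1):
            score = normalized_correlation(patch, image[r:r + ph, c:c + pw])
            if score > best[2]:
                best = (r, c, score)
    return best
```

Because the score is normalized, a uniform change in brightness or contrast between the two frames does not move the best match. A low best score is one plausible way for a tracker to flag a failure, though the threshold used in the experiment is not stated here.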

A small optimization of this procedure is to start off with an approximately correct mapping. If small steps are taken then the inaccuracies in this "initial guess" mapping won't harm us, and the algorithm will eventually converge. The foveation code is in this shell script, and here is the main loop of normcorr.
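
The effect of small steps with an imperfect initial guess can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: the camera is replaced by a simulated object whose target shifts linearly with pan and tilt, the gains and sign conventions are invented, and the tracker is replaced by a direct position read-out. It is not the shell script referred to above, and it omits the "closer to the center" check on each new match.

```python
CENTER = (320.0, 240.0)

class SimulatedCamera:
    """Toy stand-in for the real camera plus tracker (hypothetical)."""

    def __init__(self, target_xy, true_gain):
        self.x, self.y = target_xy
        self.true_gain = true_gain  # pan and tilt units per pixel

    def move(self, dp, dt):
        # A relative pan/tilt command shifts the target in the image.
        self.x -= dp / self.true_gain[0]
        self.y -= dt / self.true_gain[1]

    def observe(self):
        # The real system would locate the target by correlation tracking.
        return self.x, self.y

def center_target(observe, move, guess_gain, step=0.5, tol=1.0, max_iter=50):
    """Drive the target toward the image center in small steps.

    guess_gain is the rough "initial guess" mapping (units per pixel).
    Returns (success, iterations used).
    """
    x, y = observe()
    for i in range(max_iter):
        ex, ey = x - CENTER[0], y - CENTER[1]
        if abs(ex) <= tol and abs(ey) <= tol:
            return True, i
        # Take only a fraction of the move suggested by the rough mapping.
        move(step * guess_gain[0] * ex, step * guess_gain[1] * ey)
        x, y = observe()
    return False, max_iter  # iteration bound reached: a failure

cam = SimulatedCamera((500.0, 100.0), true_gain=(1.0, 1.0))
success, iterations = center_target(cam.observe, cam.move, guess_gain=(1.3, 0.7))
```

In this simulation the guessed gains are off by 30 percent in each direction, yet the remaining error still shrinks on every step, which is the sense in which the inaccuracies of the initial guess do no harm.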

Step 1 was automated by picking (x,y) values at random. This means that the computer may choose to foveate on regions where there is nothing to track. This results in more failures, but that just means that we have to let the process run longer.

Analyzing the Data

The following image shows the data gathered by the above method over the course of a weekend. Green rectangles represent good data. Yellow rectangles represent failures identified by the tracker. Red rectangles represent mistakes (supposedly valid tracker output that is in error).

The rectangles do not extend to the top or bottom of the image due to limitations in the tilt range of the EVI-D30. Note that some things might have moved during the experiment: people, chairs, desktop clutter.

Plotting the output parameters as functions of the input parameters gives an idea of what kind of mapping we are going to have to approximate. The plot below shows the surprising result that P is a linear function of x. Similarly, T seems to be a linear function of y:

This is surprising because it implies that the non-linear distortions caused by rotating the camera precisely match the non-linear distortions introduced by the extreme wide-angle view. The following plots show that pan is independent of y and tilt is independent of x. This is less surprising since it means that the lens is symmetric and the CCD is very well aligned with the axes of the pan-tilt articulation:

Solving for the Parameters

Since the mapping is linear it looks like this:

If we expand the equations a bit by adding in zeros to create some symmetry we get this system of equations:

Which can be rewritten as the following matrix equation:
where the dots represent "stacked up" image locations on the left and the corresponding pan and tilt on the right for several points in the image. This represents the data corresponding to green rectangles in the above experiment. The vector (a b c d) is the set of parameters that we are attempting to estimate from the data.
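
One form of these equations that is consistent with the description above is sketched here. It assumes that a and b belong to the pan equation and c and d to the tilt equation, and that the rows for pan and tilt are interleaved. Both the pairing and the row order are assumptions.

```latex
% Requires amsmath for align* and bmatrix.
% Linear mapping: pan depends only on x, tilt only on y.
\begin{align*}
P &= a x + b \\
T &= c y + d
\end{align*}

% The same equations padded with zeros for symmetry.
\begin{align*}
P &= a \cdot x + b \cdot 1 + c \cdot 0 + d \cdot 0 \\
T &= a \cdot 0 + b \cdot 0 + c \cdot y + d \cdot 1
\end{align*}

% Matrix form, stacking n image locations and their pan-tilt values.
\[
\begin{bmatrix}
x_1 & 1 & 0 & 0 \\
0 & 0 & y_1 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
x_n & 1 & 0 & 0 \\
0 & 0 & y_n & 1
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} P_1 \\ T_1 \\ \vdots \\ P_n \\ T_n \end{bmatrix}
\]
```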

The above equation is a simple linear matrix equation. There are many ways to solve such systems, but since we don't expect numerical stability to be a problem, we can safely use the pseudo-inverse method:
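
In generic notation, with A the stacked matrix of image locations, λ the parameter vector (a b c d), and b the stacked pan and tilt values, the pseudo-inverse solution takes the standard form below. The names A, λ, and b are chosen for this sketch and are not necessarily the original symbols. The closed form holds when A has full column rank.

```latex
\[
A \lambda = b
\quad\Longrightarrow\quad
\lambda = A^{+} b = (A^{\mathsf{T}} A)^{-1} A^{\mathsf{T}} b
\]
```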

The solution is then

and we can write this perl script that foveates the camera on a given image coordinate.
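
The fit and its use can be sketched in Python with NumPy. This is an illustration under assumptions, not the perl script: it assumes the model P = a*x + b, T = c*y + d, the function names are hypothetical, and the calibration values in the usage lines are synthetic, not measurements from the camera.

```python
import numpy as np

def fit_pan_tilt(xy, pt):
    """Least-squares fit of P = a*x + b, T = c*y + d via the pseudo-inverse.

    xy: (n, 2) image coordinates; pt: (n, 2) matching pan, tilt values.
    Returns the parameter vector (a, b, c, d).
    """
    xy = np.asarray(xy, dtype=float)
    pt = np.asarray(pt, dtype=float)
    n = len(xy)
    A = np.zeros((2 * n, 4))
    A[0::2, 0] = xy[:, 0]  # pan rows: x multiplies a
    A[0::2, 1] = 1.0       # pan rows: constant term b
    A[1::2, 2] = xy[:, 1]  # tilt rows: y multiplies c
    A[1::2, 3] = 1.0       # tilt rows: constant term d
    rhs = pt.reshape(-1)   # interleaved P1, T1, P2, T2, ...
    return np.linalg.pinv(A) @ rhs

def image_to_pan_tilt(params, x, y):
    """Pan and tilt that should bring image point (x, y) to the center."""
    a, b, c, d = params
    return a * x + b, c * y + d

# Synthetic example: made-up parameters, noise-free samples.
true_params = np.array([0.5, -160.0, -0.4, 96.0])
rng = np.random.default_rng(1)
xy = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(30, 2))
pt = np.column_stack([true_params[0] * xy[:, 0] + true_params[1],
                      true_params[2] * xy[:, 1] + true_params[3]])
params = fit_pan_tilt(xy, pt)
pan, tilt = image_to_pan_tilt(params, 320.0, 240.0)
```

With real data the samples are noisy, so the fit is a least-squares estimate and will not reproduce any parameters exactly. `np.linalg.lstsq` would serve equally well in place of `np.linalg.pinv`.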

Try it out!

Note: I've recently changed offices. Because the camera is now in a slightly more zoomed state by default, I had to re-calibrate the camera. Because I knew that the model was linear, I only needed to collect a few points by hand and re-solve for lambda.


"Christopher R. Wren" <wren@media.mit.edu>
$Id: calibration.html,v 1.5 1999/11/15 13:52:58 cwren Exp $

©1998 Christopher R. Wren.
Use of this material without prior written consent is strictly forbidden.