Papers describing these projects can be found in my list of publications.
[Images: front view and side view.]
3-D audio systems, which can surround a listener with sounds at arbitrary locations, are an important part of immersive interfaces. A new approach is presented for implementing 3-D audio using a pair of conventional loudspeakers. The new idea is to use the tracked position of the listener's head to optimize the acoustical presentation, and thus produce a much more realistic illusion over a larger listening area than existing loudspeaker 3-D audio systems. By using a remote head tracker, for instance based on computer vision, an immersive audio environment can be created without donning headphones or other equipment.
The general approach to a 3-D audio system is to reconstruct the acoustic pressures at the listener's ears that would result from the natural listening situation to be simulated. Accomplishing this with loudspeakers requires two steps. First, the ear signals corresponding to the target scene are synthesized by appropriately encoding directional cues, a process known as "binaural synthesis." Second, these signals are delivered to the listener by inverting the transmission paths that exist from the speakers to the listener, a process known as "crosstalk cancellation." Existing crosstalk cancellation systems function only at a fixed listening location; when the listener moves away from the equalization zone, the 3-D illusion is lost. Steering the equalization zone to the tracked listener preserves the 3-D illusion over a large listening volume, thus simulating a reconstructed soundfield, and also provides dynamic localization cues by maintaining stationary external sound sources during head motion.
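The inversion step can be illustrated with a toy frequency-domain sketch. It assumes a symmetric setup in which the path from each speaker to the far ear is just an attenuated, delayed copy of the path to the near ear; the gain, delay, and function names are hypothetical, and a real system would use measured head-related transfer functions instead.

```python
import numpy as np

def crosstalk_canceller(H):
    """Invert a stack of 2x2 speaker-to-ear transfer matrices.

    H has shape (n_bins, 2, 2); H[k, i, j] is the response from
    speaker j to ear i at frequency bin k.
    """
    return np.linalg.inv(H)

# Toy acoustic model (assumed values, not measured data).
n_bins = 8
freqs = np.linspace(100.0, 4000.0, n_bins)   # Hz
far_ear_gain = 0.7                           # attenuation of the crosstalk path
far_ear_delay = 0.0002                       # extra delay of the crosstalk path, seconds
cross = far_ear_gain * np.exp(-2j * np.pi * freqs * far_ear_delay)

H = np.empty((n_bins, 2, 2), dtype=complex)
H[:, 0, 0] = 1.0      # left speaker  -> left ear
H[:, 1, 1] = 1.0      # right speaker -> right ear
H[:, 0, 1] = cross    # right speaker -> left ear (crosstalk)
H[:, 1, 0] = cross    # left speaker  -> right ear (crosstalk)

C = crosstalk_canceller(H)

# Desired binaural signal: energy at the left ear only.
desired_ears = np.array([1.0, 0.0])
speaker_signals = C @ desired_ears                         # shape (n_bins, 2)
at_ears = np.einsum("kij,kj->ki", H, speaker_signals)      # what the ears receive
```

In this sketch `at_ears` reproduces `desired_ears` at every bin. Steering the equalization zone would correspond to rebuilding `H` from the tracked head position and inverting again.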
The dissertation discusses the theory, implementation, and testing of a head-tracked loudspeaker 3-D audio system. Crosstalk cancellers that can be steered to the location of a tracked listener are described. The objective performance of these systems has been evaluated using simulations and acoustical measurements made at the ears of human subjects. Many sound localization experiments were also conducted; the results show that head tracking significantly improves localization when the listener is displaced from the ideal listening location and also enables dynamic localization cues.
The dissertation has been published by Kluwer Academic Publishers:
Gardner, W. G. (1998). 3-D Audio Using Loudspeakers. Kluwer Academic Publishers, Norwell, MA. ISBN 0-7923-8156-4.
Gardner, W. G. (1997). Head-Tracked 3-D Audio Using Loudspeakers. Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY.
For information on visual head tracking at the Media Lab, see Sumit Basu's work on model-based head tracking and Nuria Oliver's work on LAFTER: Lips and Face Real-Time Tracker.
A headphone-based, 3-D auditory display using HRTFs measured from a KEMAR.
An extensive set of head-related transfer function (HRTF) measurements of a Knowles Electronic Mannequin for Acoustic Research (KEMAR) has been made. The measurements consist of the left and right ear impulse responses from a Realistic Optimus Pro 7 loudspeaker mounted 1.4 meters from the KEMAR. Maximum length (ML) pseudo-random binary sequences were used to obtain the impulse responses at a sampling rate of 44.1 kHz. In total, 710 different positions were sampled at elevations from -40 degrees to +90 degrees.
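As a rough illustration of how a maximum length sequence yields an impulse response, the sketch below excites a simulated system with one period of an ML sequence and recovers the response by circular cross-correlation. The 10-bit register, the feedback tap, and the synthetic "room" response are assumptions chosen for the example, not the parameters of the KEMAR measurements.

```python
import numpy as np

def mls(nbits=10, tap=7):
    """One period of a +/-1 maximum length sequence from a linear feedback
    shift register (recurrence a[t] = a[t-nbits] XOR a[t-tap])."""
    n = 2**nbits - 1
    bits = [1] * nbits
    for t in range(nbits, n):
        bits.append(bits[t - nbits] ^ bits[t - tap])
    return 1.0 - 2.0 * np.array(bits)

def recover_impulse_response(y, s):
    """Recover h from y = (s circularly convolved with h).

    The circular autocorrelation of an ML sequence of length n is n at
    lag 0 and -1 elsewhere, so the cross-correlation r equals
    (n + 1) * h - sum(h); sum(r) happens to equal sum(h).
    """
    n = len(s)
    r = np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(s))).real
    return (r + r.sum()) / (n + 1)

# Simulated system under test: a short decaying random response (hypothetical).
rng = np.random.default_rng(0)
h_true = rng.standard_normal(32) * np.exp(-np.arange(32) / 8.0)

s = mls()
h_padded = np.zeros(len(s))
h_padded[:len(h_true)] = h_true
y = np.fft.ifft(np.fft.fft(s) * np.fft.fft(h_padded)).real  # steady-state response

h_est = recover_impulse_response(y, s)
```

The recovery is exact only when the true response is shorter than one sequence period; longer responses wrap around and alias in time.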
This data has been used to implement a realtime 3-D spatialization system which runs on an SGI Indigo computer. The system allows a single monophonic source to be positioned arbitrarily around the head of a listener wearing headphones. The system uses two 128-point convolvers, one each for the left and right channels, and runs at a 32 kHz sampling rate. Control of source elevation, azimuth, and distance is achieved using a MIDI controller.
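The two-convolver structure can be sketched offline as follows. The 128-tap length and 32 kHz rate come from the description above; the placeholder impulse responses are hypothetical stand-ins (a unit impulse for the near ear and a delayed, attenuated impulse for the far ear), not measured KEMAR data.

```python
import numpy as np

SAMPLE_RATE = 32000   # Hz, as stated for the realtime system
HRIR_LENGTH = 128     # taps per ear

def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse response pair.

    Returns an array of shape (len(mono) + len(hrir) - 1, 2).
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Placeholder responses for a source on the listener's left (assumed values).
hrir_left = np.zeros(HRIR_LENGTH)
hrir_left[0] = 1.0
hrir_right = np.zeros(HRIR_LENGTH)
hrir_right[20] = 0.5   # 20 samples at 32 kHz is 0.625 ms of interaural delay

# One second of a 440 Hz tone as the monophonic source.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mono = np.sin(2 * np.pi * 440.0 * t)
stereo = spatialize(mono, hrir_left, hrir_right)
```

Moving the source would amount to swapping in the impulse response pair for the new direction, typically with some interpolation or crossfading to avoid clicks.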
Gardner, W. G., and Martin, K. D. (1994). HRTF measurements of a KEMAR dummy head microphone. MIT Media Lab Perceptual Computing Technical Report #280. Included on the CD-ROM "Standards in Computer Generated Music", Goffredo Haus and Isabella Pighi, eds., published by the IEEE CS Technical Committee on Computer Generated Music, 1996.
The HRTF data, 3-D spatialization software, and related information can be accessed via the KEMAR HRTF page.

A study of reverberation algorithms resulting in a book chapter on the subject.
This chapter discusses reverberation algorithms, with emphasis on algorithms that can be implemented for realtime performance. The chapter begins with a concise framework describing the physics and perception of reverberation. This includes a discussion of geometrical, modal, and statistical models for reverberation, the perceptual effects of reverberation, and subjective and objective measures of reverberation. Algorithms for simulating early reverberation are discussed first, followed by a discussion of algorithms that simulate late, diffuse reverberation. This latter material is presented in chronological order, starting with reverberators based on comb and allpass filters, then discussing allpass feedback loops, and proceeding to recent designs based on inserting absorptive losses into a lossless prototype implemented using feedback delay networks or digital waveguide networks.
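The earliest design mentioned above, a reverberator built from comb and allpass filters, can be sketched in a few lines. This is a generic textbook-style structure (parallel feedback combs followed by series allpasses); the delay lengths and gains are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Allpass filter: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def simple_reverb(x, comb_delays=(1116, 1188, 1277, 1356), comb_gain=0.84,
                  allpass_delays=(556, 441), allpass_gain=0.5):
    """Parallel combs summed, then series allpasses (assumed parameters)."""
    wet = sum(comb(x, d, comb_gain) for d in comb_delays) / len(comb_delays)
    for d in allpass_delays:
        wet = allpass(wet, d, allpass_gain)
    return wet

# Impulse response of the sketch reverberator.
impulse = np.zeros(8000)
impulse[0] = 1.0
response = simple_reverb(impulse)
```

The combs supply the decaying echoes, while the allpasses increase echo density without changing the magnitude spectrum; mutually detuned delay lengths help avoid coinciding echoes.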
This book is now available; for more information, see the publisher's page for the book or the editor's page for the book.

A block convolution algorithm that runs without latency by partitioning the filter response into blocks of exponentially increasing size.
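The partitioning idea can be illustrated offline. The sketch below splits an impulse response into blocks whose sizes double (after two blocks of the initial size), convolves the input with each block separately, and sums the results at each block's offset. It only demonstrates that the partition reproduces ordinary convolution; the block sizes and function names are assumptions, and the realtime scheduling that actually removes the latency (a direct-form filter for the first block, FFT-based block convolution for the later ones) is not shown.

```python
import numpy as np

def exponential_partition(h, first=4):
    """Split h into (offset, block) pairs with sizes first, first, 2*first, 4*first, ...

    Every block after the first starts at an offset equal to its own size,
    so its output is not needed until one full block of input has arrived.
    """
    blocks = []
    offset, size = 0, first
    while offset < len(h):
        blocks.append((offset, h[offset:offset + size]))
        offset += size
        if offset > first:   # two blocks of the initial size, then keep doubling
            size *= 2
    return blocks

def partitioned_convolve(x, h, first=4):
    """Sum the per-block convolutions, each delayed by its block offset."""
    y = np.zeros(len(x) + len(h) - 1)
    for offset, block in exponential_partition(h, first):
        seg = np.convolve(x, block)
        y[offset:offset + len(seg)] += seg
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
h = rng.standard_normal(37)
y_partitioned = partitioned_convolve(x, h)
y_direct = np.convolve(x, h)
```

Here `y_partitioned` matches `y_direct` to within rounding error, since convolution is linear in the filter response.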
This paper won the 1997 AES Publications Award for the outstanding paper published in the Journal of the Audio Engineering Society during the two preceding years.
Gardner, W. G. (1994). Efficient convolution without input-output delay. Presented at the 97th convention of the Audio Engineering Society, San Francisco. Preprint 3897.

Reverberation level matching experiments that determine how the loudness of running reverberation depends on the source signal and the reverberant response.
One set of experiments matches tone sequences with different on/off duty cycles using the same reverberation time and decay shape. The results indicate that the perception of reverberation level is highly dependent on the quiet gaps present in the signal, as predicted by loudness masking. Dependence on melody is also significant, but less easily explained.
Another set of experiments matches different reverberation times and decay shapes using the same musical input signal. At a reference reverberation level of -20 dB, a typical solo piece may require 10 dB more reverberation at RT = 0.5 sec to sound as reverberant as the same piece at RT = 2.0 sec.
A surround sound system that simulates room acoustics. This was the topic of my Master's thesis.
Gardner, W. G. (1992). The virtual acoustic room. Master's thesis, MIT Media Lab.
The virtual acoustic room was implemented using the Reverb application described below.

A reverberator design tool.
Reverb is a Macintosh application that allows the user to construct stereo delays, echo effects, chorusers, and reverberators using a simple programming language. The resulting audio effects can be applied to soundfiles, or can be run in realtime on a Digidesign Audiomedia card. When running in realtime, effects parameters can be adjusted via external MIDI control.
The program is available via FTP, and comes with a manual and sample effects programs. The program is very out of date at this point, though it is still functional under certain conditions. Be sure to read the release notes accompanying the program.