------------------------------------------------------------------------------
	      ====/  =====    ==	CALIBRATED IMAGING LABORATORY
	    //	      //     //	       CARNEGIE MELLON UNIVERSITY
	   //	     //     //	      PITTSBURGH, PENNSYLVANIA  15213  USA
	  //	    //     //	     Dr. Steven A. Shafer, Director
	  ====/	  =====   =====     +1 (412) 268-2527  sas@ri.cmu.edu
------------------------------------------------------------------------------

------------------------------------------------------------------------------
		   README file for the CIL STEREO DATASET
------------------------------------------------------------------------------

The current datasets can be retrieved by anonymous FTP:

	ftp ftp.cs.cmu.edu [128.2.206.173 in Jan 1994, but may change]
	login: anonymous
	passwd: [your email address]
	cd /usr0/anon/project/cil
	binary
	dir
	get cil-0001.tar
	get cil-0002.tar		[and so on]

by World Wide Web (WWW) clients (e.g., NCSA Mosaic):

	http://www.cs.cmu.edu:8001/usr0/anon/project/cil/html/cil-ster.html

by remote filesystem (AFS or Alex):

	cp /afs/cs.cmu.edu/project/cil/cil-0001.tar .
	cp /alex/edu/cmu/cs/ftp/project/cil/cil-0001.tar .

or by email (but this is your last resort!!):

	% mail ftpmail@decwrl.dec.com
	Subject:  none			[this is ignored]

	reply your-email-address   	[use your INTERNET-style address]
	connect ftp.cs.cmu.edu
	binary
	cd /usr0/anon/project/cil
	get cil-0001.tar
	quit

(Send "help" alone in the message body for more details about the FTPmail
 service)

*****************************************************************************
    FOR MORE INFORMATION ABOUT THIS DATA PLEASE SEND EMAIL TO CIL@CMU.EDU
*****************************************************************************

If Internet email is not a viable option, then please contact Steve Shafer
or Mark Maimone directly.  Here are several ways to contact us:

			<person>
			Computer Science Department
			Carnegie Mellon University
			5000 Forbes Avenue
			Pittsburgh, PA  15213-3891  USA
			FAX: +1 (412) 621 - 1970

	Mark Maimone				Steven A. Shafer
	+1 (412) 268 - 7698			+1 (412) 268 - 2527
	Internet:     mwm@cmu.edu		sas@cs.cmu.edu

	WWW: http://www.cs.cmu.edu:8001/Web/People/mwm/www/HomePage.html

------------------------------------------------------------------------------

------------------------------------------------------------------------------
INTRODUCTION

	This dataset contains multiple images of static scenes with accurate
information about object locations in 3D.  It is being provided by Carnegie
Mellon as a service to the Net community to address the current lack of
stereo image data with ground truth (noted in [1]).

	The images were taken with a scientific camera in an indoor setting,
the Calibrated Imaging Laboratory at CMU.  The types of objects in the
images vary from simple polyhedra to complex model train sets.  Actual 3D
locations are given in X-Y-Z coordinates with a simple text description,
and the corresponding image coordinates are provided for all images.
Eleven images of each scene were taken in an attempt to address the needs of
binocular stereo, multi-baseline stereo, and optical flow researchers.

	These data are provided under Contract No. F49620-92-C-0073, ARPA
Order No. 8875.

	The rest of this README file describes the imaging process,
calibration methods and data representations.

					Mark Maimone
					mwm@cmu.edu



------------------------------------------------------------------------------

------------------------------------------------------------------------------
DATA ACQUISITION

CAMERA MOVEMENT

	All images were taken using a single camera, which was moved from
place to place using an automated jig platform.  Object position and
lighting remained constant during the camera motion, and there was no
(significant) camera rotation:  all the optical axes are (nearly) parallel.
The 11 positions from which images were taken are diagrammed below:

			       ___________
			      /		 /|
			     /		/ |  Scene volume
			    /__________/  |
			    |  /       | /
			    | /	       |/
			    |/_________/


			Y
			^
			|     Z
			|    7
			| . /
			|o /		Camera positions
		    O	O .		    Z=0:  O
			|o		      1:  o
		O   O   O---O---O--->X	      2:  .


	Each of these positions is assigned a location number; this number
is used to identify images, camera calibration data, and image coordinates
of ground truth data points.  In the coordinate frame defined in the diagram
above (which is *not* the frame used in the data files), the location
numbers have the following coordinates:

	1.	( 2, 0, 0)		7.	( 0, 0, 2)
	2.	( 1, 0, 0)		8.	( 0, 1, 2)
	3.	( 0, 0, 0)		9.	( 0, 1, 1)
	4.	(-1, 0, 0)	       10.	( 0, 1, 0)
	5.	(-2, 0, 0)	       11.	(-1, 1, 0)
	6.	( 0, 0, 1)
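For convenience, the table above can be put in machine-readable form.  The
following sketch is ours, not part of the dataset; coordinates are in jig
units in the diagram's frame (again, *not* the frame used in the data files):

```python
# Location number -> (X, Y, Z) grid position, in the diagram's frame.
LOCATIONS = {
    1: ( 2, 0, 0),   2: ( 1, 0, 0),   3: ( 0, 0, 0),
    4: (-1, 0, 0),   5: (-2, 0, 0),   6: ( 0, 0, 1),
    7: ( 0, 0, 2),   8: ( 0, 1, 2),   9: ( 0, 1, 1),
   10: ( 0, 1, 0),  11: (-1, 1, 0),
}

def baseline(a, b):
    """Euclidean distance between two camera locations, in jig units."""
    (xa, ya, za), (xb, yb, zb) = LOCATIONS[a], LOCATIONS[b]
    return ((xa - xb)**2 + (ya - yb)**2 + (za - zb)**2) ** 0.5
```

For example, locations 1 and 5 span the widest horizontal baseline
(4 jig units), while 3 and 7 span 2 units along the optical axis.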


CAMERA CALIBRATION

	We employ a variation of Roger Tsai's camera calibration (see [2])
to compute the mathematical transformation between image and world
coordinates.  With properly calibrated equipment, we can achieve 0.1-pixel
accuracy (i.e., the Mean Image Error between the user-measured image points
and the model's 3D->2D projections of the corresponding world points is
0.1 pixels).
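The Mean Image Error can be recomputed from the 5-tuples in a ".dat" file,
given any candidate 3D->2D projection.  A minimal sketch (the function name
and signature are ours, not part of the dataset):

```python
def mean_image_error(points, project):
    """Mean Euclidean distance, in pixels, between measured image
    coordinates and those predicted by a calibrated 3D->2D model.
    `points` is a list of (X, Y, Z, u, v) 5-tuples, as in the ".dat"
    files; `project` maps world (X, Y, Z) to pixel (u, v)."""
    total = 0.0
    for X, Y, Z, u, v in points:
        pu, pv = project(X, Y, Z)
        total += ((pu - u)**2 + (pv - v)**2) ** 0.5
    return total / len(points)
```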

	A planar target was used for the camera calibration.  The target
itself has a 10x16 grid of black dots over a white background, with the dots
spaced approximately 1 inch apart.  It was imaged in three different
locations (front, middle, back), effectively "sweeping out" the entire
volume occupied by objects in the scene.  When it appears in the images, it
is located in the same "back" position used for the calibration.  These
datasets contain only the results of the calibration (11 parameters and a
few hundred 3D<->2D point correspondences), not the raw calibration images
themselves.

	A monochromatic filter was used to eliminate potential problems with
chromatic aberration.  For these first two datasets, a red filter was used.

	Our lens and CCD array are not quite properly aligned.
Specifically, the lens focuses light in a circular region that is smaller
in diameter than the CCD is wide.  Hence there are vignetting (dark shading)
effects around the image borders.  We've left the data as-is, to allow you
to decide the best way to crop the circular image to fit into a rectangular
window.

------------------------------------------------------------------------------

------------------------------------------------------------------------------
CAMERA MODEL

	We employed Tsai's camera calibration technique for this dataset.
His method (described in detail in [2]) gives us the following camera
parameters:

	EXTRINSIC PARAMETERS:

		- Translation vector T			   [3 params]
		- Rotation matrix R [3x3 matrix, encoded in 3 params]

	INTRINSIC PARAMETERS:
		- Horizontal Scale Correction factor S_x   [1 param]
		- Pinhole Camera Effective Focal length f  [1 param]
		- Radial Lens Distortion coefficient k_1   [1 param]
		- Image Center				   [2 params]

The Extrinsic Parameters serve to locate the camera in three-space, relative
to some arbitrary origin in the scene (defined below).
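To illustrate how the 11 parameters fit together, here is a sketch of the
projection in the spirit of Tsai's model.  It is NOT the distributed
calib/code/ implementation (use that for real work); for simplicity it
takes the pixel sizes as 1.0, and the function name and argument layout
are ours:

```python
def tsai_project(P, R, T, f, kappa1, sx, cx, cy):
    """Project world point P = (X, Y, Z) to pixel coordinates (u, v)
    using extrinsics (R: 3x3 rotation as a list of rows, T:
    translation) and intrinsics (f: effective focal length, kappa1:
    radial distortion, sx: horizontal scale, (cx, cy): image center)."""
    # Extrinsics: world -> camera coordinates, p_c = R*P + T
    xc = sum(R[0][i] * P[i] for i in range(3)) + T[0]
    yc = sum(R[1][i] * P[i] for i in range(3)) + T[1]
    zc = sum(R[2][i] * P[i] for i in range(3)) + T[2]
    # Pinhole projection onto the undistorted sensor plane
    xu, yu = f * xc / zc, f * yc / zc
    # Radial distortion: Tsai relates undistorted and distorted
    # coordinates by xu = xd*(1 + kappa1*r^2), r^2 = xd^2 + yd^2;
    # invert that relation by fixed-point iteration
    xd, yd = xu, yu
    for _ in range(20):
        r2 = xd * xd + yd * yd
        xd, yd = xu / (1 + kappa1 * r2), yu / (1 + kappa1 * r2)
    # Intrinsics: sensor -> pixel coordinates
    return sx * xd + cx, yd + cy
```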
------------------------------------------------------------------------------

------------------------------------------------------------------------------
FILENAMES

	Each dataset consists of a collection of uniquely-named files.
These are divided into three subdirectories for convenience:

	images/	    - Contains the actual images (e.g., c-000101.gif)
	calib/	    - Contains the camera calibration parameters and data
		      points (e.g., c-000101.par, c-000101.dat)
	calib/code/ - Contains source code for computing the World <-> Image
		      Coordinate transforms
	points/	    - Contains information about ground truth, i.e., known 3D
		      coordinates of points in the scene (e.g.,
		      c-0001points.xyz, c-000101points.txt, c-000101.gt)

The data files themselves contain the following types of information ("DDDD"
denotes the dataset number, "LL" the location number):

	c-DDDDLL.gif	- Compuserve GIF-format image (8 bit greyscale with
			  linear color map)
	c-DDDDicon.gif	- Thumbnail sketch of the center image (number 03),
			  in Compuserve GIF-format
	c-DDDDgt.gif	- Center image with the ground truth points
			  highlighted.
	c-DDDDLL.dat	- List of calibration points in text-format 5-tuples:
			  X,Y,Z coordinates followed by image coordinates
	c-DDDDLL.par	- Camera calibration parameters, computed by applying
			  Tsai's camera calibration to the points in the
			  corresponding ".dat" file.
	c-DDDDpoints.txt- Text description of the locations of all of the
			  ground truth points (e.g., "14. Top of the church
			  steeple")
	c-DDDDpoints.xyz- World Coordinates of the ground truth points,
			  calculated (independently of the images) using
			  surveyor's theodolites
	c-DDDDLL.gt	- Image coordinates of the ground truth data
			  described in the "points" files
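The naming scheme above is mechanical, so per-image paths can be generated
from the dataset and location numbers.  A small sketch (the helper and key
names are ours, not part of the dataset):

```python
def dataset_filenames(dataset, location):
    """Build per-image file paths from a dataset number (DDDD) and
    location number (LL), following the c-DDDDLL naming scheme."""
    base = "c-%04d%02d" % (dataset, location)
    return {
        "image":        "images/%s.gif" % base,
        "calib_points": "calib/%s.dat"  % base,
        "calib_params": "calib/%s.par"  % base,
        "ground_truth": "points/%s.gt"  % base,
    }
```

For example, dataset 1 at the center location 3 yields the image path
"images/c-000103.gif".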
------------------------------------------------------------------------------

------------------------------------------------------------------------------
DATA REPRESENTATION

IMAGES

	The images are provided in Compuserve GIF format, a standard
lossless image format.  All images are 8-bit greyscale (hence the color map
is a simple linear ramp), and may appear "dark" if your display's gamma
factor differs from that of the image.  You can make them look brighter by
adjusting your gamma correction factor (replace the default "1" with "2.2",
for example); please see your display tool's documentation for specific
instructions (e.g., using the XLI tool you would simply hit the numeral "2"
to brighten the image).
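If your display tool lacks a gamma control, the correction is easy to apply
to the pixel values directly.  A minimal sketch (ours, not part of the
dataset):

```python
def gamma_correct(pixels, gamma=2.2):
    """Brighten 8-bit greyscale values for display: map each linear
    intensity v in [0, 255] to round(255 * (v/255) ** (1/gamma))."""
    return [round(255 * (v / 255.0) ** (1.0 / gamma)) for v in pixels]
```

Black and white are unchanged, while a mid-grey of 128 maps to roughly 186
at gamma 2.2, which is why the corrected images look brighter.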

WORLD COORDINATES

	Several files in this dataset express data in world coordinates,
i.e., X-Y-Z values.  All such values are expressed in millimeters (mm)
relative to a somewhat arbitrary origin.  We define the blank dot in the
center of the target grid (in its position *nearest* the camera) to be
location (20", 20", 0") [yes, that's 20 *inches*; sorry] in world
coordinates, with the X-Y plane containing the (effectively planar) target
grid.  The X axis runs positive from left to right, the Y axis runs positive
from top to bottom, and the Z axis positive from the camera to the scene.
Pictorially:


	   7 Z
	  /
	 /
	O--------> X	       ____________
	|		      /		  /|
	|		     /		 / |  Scene volume
	|		    /___________/  |
	v Y		    |  /        | /
			    | /	  O     |/
			    |/_____\____/
				    \
				     \ (20", 20", 0")

All 3D data values (in the camera calibration data files and ground
truth point locations) are given relative to this coordinate frame.
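Since the origin dot is specified in inches but the data files are in
millimeters, a conversion helper may save some confusion (the helper is
ours, not part of the dataset):

```python
MM_PER_INCH = 25.4

def inches_to_mm(x_in, y_in, z_in):
    """Convert inch-denominated coordinates to the millimeter world
    frame used by the data files."""
    return (x_in * MM_PER_INCH, y_in * MM_PER_INCH, z_in * MM_PER_INCH)
```

In particular, the (20", 20", 0") origin dot sits at (508.0, 508.0, 0.0)
in the millimeter world frame.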

IMAGE COORDINATES

	Several files in the dataset express data in image coordinates,
i.e., X-Y pairs.  All such values are expressed in pixels in X-Y format
(*not* row-column), with the origin in the upper left corner of the image.
Pictorially:


		 _______________________
		| -----> X		|
		| | 			|
		| |			|
		| |			|
		| v Y			|
		|			|
		|_______________________|
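When loading these coordinates into a row-major image array, remember that
the files are X-Y, not row-column.  A one-line reminder helper (ours, not
part of the dataset):

```python
def xy_to_rowcol(x, y):
    """Convert image (X, Y) pixel coordinates (origin at upper left,
    X rightward, Y downward) to (row, column) array indices."""
    return (y, x)
```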

------------------------------------------------------------------------------

------------------------------------------------------------------------------
FUTURE WORK

	The CIL-0001 and CIL-0002 datasets represent our first attempt at
providing stereo data with ground truth to the community at large.  Although
our method has the potential to yield 3D<->2D mappings accurate to 1/10th of
a pixel, errors in this first dataset are sometimes as high as 2 pixels, due
to poor instrument calibration and poor post-processing.  These difficulties
will be resolved in future datasets.

	We are open to suggestions and requests!  We cannot guarantee to
address all concerns, but please let us know if you find this data useful,
and what similar types of data you would like to see in the future.  Here
are some of the suggestions we've received so far:

* Can you add FACES to the database?

	Not likely; since we currently use a single camera, it's impossible
	to have a person "freeze" long enough to acquire all 11 images.

* ... add PLANTS to the database?

	Very possible.

* ... add OUTDOOR SCENES?

	Highly unlikely; our equipment is not mobile.  We can always image
	*models* of outdoor scenes, though, if we have such models.

* ... view the same objects at different resolutions?

	Very possible; which objects?  What resolutions?

* ... supply a DENSE DEPTH MAP with all objects

	This is impractical; if we could do it that easily, we wouldn't need
	stereo!  Instead, the list of "approximately coplanar points" allows
	you to construct a PIECEWISE-DENSE DEPTH MAP, albeit with lots of
	gaping holes.

* ... use a RANGE SENSOR to give a dense depth map?

	Unlikely; range sensing doesn't provide true ground truth; it might
	be *nice*, but at the moment this isn't a range sensing database.

* ... provide all the RAW DATA?

	Very possible; send email to cil@cmu.edu.

* Can I use your lab to take pictures?

	Possibly; please contact Dr. Shafer directly for details.

------------------------------------------------------------------------------

------------------------------------------------------------------------------
REFERENCES

[1] ``The JISCT Stereo Evaluation'' by R. C. Bolles, H. H. Baker, and M. J.
Hannah, in the April 1993 ARPA Image Understanding Workshop Proceedings, pp.
263-274.  Sample images and results are available by anonymous FTP from
ftp.teleos.com [131.119.250.108 in Dec 1993] in directory
/VISION-LIST-ARCHIVE/IMAGERY/JISCT.

[2] ``A Versatile Camera Calibration Technique for High-Accuracy 3D Machine
Vision Metrology Using Off-the-Shelf TV Cameras and Lenses'', by Roger Tsai,
in the IEEE Journal of Robotics and Automation, August, 1987, RA-3(4), pp.
323-344.  An implementation of this method has been provided by Reg Willson
(rgw@ece.cmu.edu) and is available by anonymous FTP to ftp.teleos.com in
directory /VISION-LIST-ARCHIVE/SHAREWARE/CODE/CALIBRATION/TSAI-METHOD.
