Depth from focus/defocus

Author: Paolo Favaro
Date: June 25th, 2002
e-mail: fava@ee.wustl.edu
For additional information see my web page.

Depth from focus/defocus is the problem of estimating the 3D surface of a scene from a set of two or more images of that scene. The images are obtained by changing the camera parameters (typically the focal setting or the axial position of the image plane) and are taken from the same point of view (see Figure 1).

Figure 1: Simplified geometry of a real aperture camera.

The difference between depth from focus and depth from defocus is that in the first case the camera parameters can be changed dynamically during the surface estimation process, while in the second case this is not allowed (see [1-12] for a sample of the literature on depth from focus/defocus).

In addition, both problems are called either active or passive depth from focus/defocus, depending on whether or not structured light can be projected onto the scene.

While many computer vision techniques estimate 3D surfaces by using images obtained with pin-hole cameras, in depth from defocus we use real aperture cameras. Real aperture cameras have a short depth of field, resulting in images which appear focused only on a small 3D slice of the scene. The image formation process can be explained with geometric optics. The lens is modeled via the thin lens law, i.e. 1/f = 1/v + 1/u, where f is the focal length, u is the distance between the lens plane and the plane in focus in the scene, and v is the distance between the lens plane and the image plane.
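As a small worked example of the thin lens law, the sketch below solves 1/f = 1/v + 1/u for v and then uses similar triangles to estimate how large the blur circle of an out-of-focus point is on the image plane. The function names and the aperture diameter parameter are assumptions introduced here for illustration; they do not come from a specific method in the literature cited on this page.

```python
def image_distance(f, u):
    """Solve the thin lens law 1/f = 1/v + 1/u for the image distance v."""
    if u <= f:
        raise ValueError("the plane in focus must be farther than the focal length")
    return 1.0 / (1.0 / f - 1.0 / u)


def blur_circle_diameter(f, aperture, u_focus, depth):
    """Blur circle diameter on the image plane for a point at `depth`.

    The image plane sits at the distance v that focuses the plane at u_focus.
    A point at `depth` would focus at v_d instead, so its cone of light is cut
    by the image plane in a circle of diameter aperture * |v - v_d| / v_d.
    `aperture` (the lens diameter) is an assumed parameter.
    """
    v = image_distance(f, u_focus)
    v_d = image_distance(f, depth)
    return aperture * abs(v - v_d) / v_d


# Illustrative values in metres: 50 mm lens, 20 mm aperture, focused at 1 m.
f, aperture = 0.05, 0.02
v = image_distance(f, 1.0)
in_focus = blur_circle_diameter(f, aperture, 1.0, 1.0)
out_of_focus = blur_circle_diameter(f, aperture, 1.0, 2.0)
print(v, in_focus, out_of_focus)
```

With these values the image plane lies about 52.6 mm behind the lens, a point at 1 m has no blur, and a point at 2 m is spread over a circle of roughly 0.53 mm.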

A scene is typically modeled as a smooth opaque Lambertian (i.e. with constant bidirectional reflectance distribution function) surface s. Attached to the surface we have a texture r (otherwise called radiance or focused image).

In this case, the intensity I(y) at a pixel y ∈ Z² (we denote vector coordinates with boldface fonts) of the CCD surface can be described by:

\[ I(\mathbf{y}) = \int h_u(\mathbf{y}, \mathbf{x}, s(\mathbf{x}))\, r(\mathbf{x})\, d\mathbf{x} \tag{1} \]

where the kernel h depends on the surface s and the optical settings u, and x ∈ R². For a fixed surface s(x) = d (i.e. a plane parallel to the lens plane at distance d from it), the kernel h is a function of the difference y-x, i.e. the integral in Eq. 1 becomes the convolution

\[ I(\mathbf{y}) = (h_{u,d} * r)(\mathbf{y}). \tag{2} \]
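The convolution in Eq. 2 can be sketched on a discrete grid as follows. This is a minimal illustration, assuming zero radiance outside the image and using a 3x3 box kernel as a hypothetical stand-in for h; the function name and the test image are not part of any published method.

```python
def convolve2d(radiance, kernel):
    """Discrete form of I(y) = (h * r)(y), with zero radiance outside the grid."""
    rows, cols = len(radiance), len(radiance[0])
    krows, kcols = len(kernel), len(kernel[0])
    cy, cx = krows // 2, kcols // 2  # index of the kernel centre
    image = [[0.0] * cols for _ in range(rows)]
    for yi in range(rows):
        for yj in range(cols):
            total = 0.0
            for ki in range(krows):
                for kj in range(kcols):
                    # Source point x = y - k, where k is the kernel offset.
                    xi = yi - (ki - cy)
                    xj = yj - (kj - cx)
                    if 0 <= xi < rows and 0 <= xj < cols:
                        total += kernel[ki][kj] * radiance[xi][xj]
            image[yi][yj] = total
    return image


# A single bright point of radiance, blurred by a box kernel that sums to one.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
radiance = [[0.0] * 5 for _ in range(5)]
radiance[2][2] = 1.0
image = convolve2d(radiance, box)
```

Blurring a single point reproduces the kernel around that point, which is why h is also referred to as the point spread function of the camera.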

More generally, the kernel h determines the amount of blurring that affects a specific area of the surface in the scene. With ideal optics, the kernel can be represented by a pillbox function. However, in many algorithms for depth from defocus the kernel is approximated by a Gaussian (see Figure 2):

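\[ h_u(\mathbf{y}, \mathbf{x}, s(\mathbf{x})) = \frac{1}{2\pi\sigma^2(s(\mathbf{x}),u)} \exp\left( -\frac{(\mathbf{y}-\mathbf{x})^T(\mathbf{y}-\mathbf{x})}{2\sigma^2(s(\mathbf{x}),u)} \right) \tag{3} \]

where σ(s(x),u) is called the blurring radius, and it depends on the surface s and the focal setting u.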
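A discrete version of the Gaussian kernel can be sketched as below. The sketch samples the exponential term on a square grid and rescales the samples so that they sum to one, so the leading constant never has to be evaluated explicitly. The function name, the truncation radius and the example value of the blurring radius are assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sample exp(-|y - x|^2 / (2 sigma^2)) on a (2*radius+1)^2 grid.

    The samples are rescaled to sum to one, so that blurring with the
    kernel preserves the total intensity of the image.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    size = 2 * radius + 1
    kernel = [
        [
            math.exp(-((i - radius) ** 2 + (j - radius) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)
        ]
        for i in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


# Blurring radius of 1.5 pixels, truncated at 4 pixels from the centre.
kernel = gaussian_kernel(sigma=1.5, radius=4)
```

Truncating at roughly three times the blurring radius keeps almost all of the kernel mass; a larger blurring radius needs a proportionally larger grid.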

Figure 2: Example of a 2D Gaussian kernel.

Now the problem of depth from defocus can be stated more precisely. Given a set of L ≥ 2 images I₁...I_L of the same scene obtained with focal settings u₁...u_L, we want to reconstruct the surface s of the scene. For some methods, this may also require reconstructing the radiance r.
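To make the inputs of the problem concrete, the sketch below simulates the images that a reconstruction method would start from. It is a one-dimensional simplification, not an implementation of any of the cited algorithms: it assumes a Gaussian kernel whose blurring radius is proportional to the thin-lens blur circle, and it normalizes the kernel of each surface point so that total intensity is conserved. The names blur_sigma and render, and the parameters aperture, pixel_size and gamma, are hypothetical.

```python
import math


def blur_sigma(depth, u, f, aperture, gamma=1.0):
    """Assumed model: sigma = gamma * (blur circle radius from the thin lens law)."""
    v = 1.0 / (1.0 / f - 1.0 / u)        # image plane distance for focal setting u
    v_d = 1.0 / (1.0 / f - 1.0 / depth)  # distance at which the point would focus
    return gamma * 0.5 * aperture * abs(v - v_d) / v_d


def render(radiance, surface, u, f, aperture, pixel_size, gamma=1.0):
    """1D discrete form of I(y) = sum_x h_u(y, x, s(x)) r(x)."""
    n = len(radiance)
    image = [0.0] * n
    for x in range(n):
        sigma = blur_sigma(surface[x], u, f, aperture, gamma) / pixel_size  # in pixels
        if sigma < 1e-3:
            image[x] += radiance[x]  # point in focus: no spreading
            continue
        weights = [math.exp(-((y - x) ** 2) / (2.0 * sigma ** 2)) for y in range(n)]
        total = sum(weights)
        for y in range(n):
            image[y] += radiance[x] * weights[y] / total
    return image


# Striped radiance on a surface with two depth levels (metres), L = 2 focal settings.
n = 32
radiance = [1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(n)]
surface = [1.0] * (n // 2) + [2.0] * (n // 2)
settings = [1.0, 2.0]
images = [render(radiance, surface, u, f=0.05, aperture=0.02, pixel_size=1e-4)
          for u in settings]
```

In the first image the near half of the surface is sharp and the far half is blurred, and in the second image the roles are swapped. Depth from defocus amounts to recovering the surface, and possibly the radiance, from such a set of images.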

The literature contains a large variety of approximate models for the above equations. The most widely exploited simplification is the equifocal assumption, which consists in representing the surface locally with a plane parallel to the image plane (i.e. an equifocal plane). The image formation process can then be locally approximated by Eq. 2.

There also exist a number of real-time systems for depth from defocus. Depth from defocus has proven effective at small distances (e.g. microscopy). It has also been compared to stereo vision, provided that the optical system and the scene are properly re-scaled.

Here are some examples of defocused images and the corresponding depth reconstructions.

Figure 3: Two images obtained from the same scene but with different focal settings. The image on the left is far focused while the image on the right is near focused.

Figure 4: Estimated depth map using a recent shape from defocus algorithm (joint work with H. Jin). On the left the depth map is rendered as a 3D surface. On the right the depth map is rendered as a gray-level image. Dark intensities correspond to distant points, light intensities correspond to close points.

Figure 5: Reconstructed radiance r. The top left shows the reconstructed radiance r, while in the top right and at the bottom the reconstructed radiance is texture mapped onto the reconstructed surface to allow visualization from novel points of view.

References

[1]: S. Chaudhuri and A. Rajagopalan. Depth from defocus: a real aperture imaging approach, Springer Verlag, 1999.
[2]: J. Ens and P. Lawrence. An investigation of methods for determining depth from focus. IEEE Trans. Pattern Anal. Mach. Intell., 15:97-108, 1993.
[3]: P. Favaro and S. Soatto. Shape and radiance estimation from the information divergence of blurred images. In Proc. European Conference on Computer Vision, 1:755-68, June/July 2000.
[4]: P. Favaro and S. Soatto. Learning depth from defocus. Proc. IEEE European Conference on Computer Vision, 2002 (in press).
[5]: H. Jin and P. Favaro. A variational approach to shape from defocus. Proc. IEEE European Conference on Computer Vision, 2002 (in press).
[6]: A. Mennucci and S. Soatto. The accommodation cue, part 1: modeling. Essrl technical report 99-001, Washington University, October 1999.
[7]: S. Nayar and Y. Nakagawa. Shape from focus. IEEE Trans. Pattern Anal. Mach. Intell., 16(8):824-831, 1994.
[8]: A. Pentland. A new sense for depth of field. IEEE Trans. Pattern Anal. Mach. Intell., 9:523-531, 1987.
[9]: S. Soatto and P. Favaro. A geometric approach to blind deconvolution with application to shape from defocus. Proc. IEEE Computer Vision and Pattern Recognition, 2:10-7, 2000.
[10]: M. Subbarao and G. Surya. Depth from defocus: a spatial domain approach. Intl. J. of Computer Vision, 13:271-294, 1994.
[11]: M. Watanabe and S. Nayar. Rational filters for passive depth from defocus. Intl. J. of Comp. Vision, 27(3):203-225, 1998.
[12]: Y. Xiong and S. Shafer. Depth from focusing and defocusing. In Proc. of the Intl. Conf. of Comp. Vision and Pat. Recogn., pages 68-73, 1993.
