
Image-Based Rendering

As mentioned above, the image-based approach relies on real images taken as reference in place of a geometric 3D model. However, the input reference images often do not suffice for the purpose of rendering novel views, so many of the proposed systems require additional knowledge, such as image correspondences, depth information, epipolar relations, etc.

The advantage of avoiding model reconstruction may, on the other hand, imply purely software rendering with no exploitation of graphics hardware (unless the system has been designed to exploit specific graphics functions, e.g. projective texture mapping).

The image-based rendering methods most often mentioned in the current literature were developed in the mid-nineties (Chen [11], Levoy and Hanrahan [28], Gortler et al. [21], McMillan and Bishop [33], Seitz and Dyer [40], Chen and Williams [10], Shashua and Werman [43], Chang and Zakhor [8], Laveau and Faugeras [27], Regan and Pose [38]). Research is still very active in the field and new techniques continue to be proposed, for example the projection model based on the two-slit camera (Granum et al. [22]).

In this section summaries of representative works in Image-Based Rendering are presented. The authors' names at the top of each summary identify the presented approach, together with a "pioneer" reference publication. Figure 1 shows the typical computational steps involved in Image-Based Rendering.

Figure: Image-Based Rendering: typical computational steps.
[image: figures/IBR_scheme2.eps]

Chen [11]

S. E. Chen proposes QuickTime VR, an image-based rendering system developed by Apple Computer.

The paper presents a way for a computer to deal systematically with movies. It includes an algorithm for storing moving pictures and playing them back as fast as possible without any extra hardware. The system includes a "panoramic movie" technology which enables users to explore spaces, and an "object movie" technology which enables users to examine objects interactively. The system is used on the World Wide Web to display 3D objects from various viewpoints.

The scene is represented by a set of cylindrical images created at key locations. Based on these images the system is able to synthesize new planar views in response to user input by warping one of the cylindrical images. The user is thus able to navigate "discretely" from location to location, and at each location continuously change the viewing direction. Translation of the viewing position can instead only be approximated by selecting the reference cylindrical image closest in viewpoint to the current viewing position. The above is achieved at interactive rates (greater than 20 frames per second).
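The core rendering step is a warp from a cylindrical panorama to a planar perspective view. The following sketch, in Python with NumPy, shows one way such a warp can be written; the function name, the pan and field-of-view parameters and the nearest-neighbor resampling are illustrative assumptions, not Apple's implementation.

import numpy as np

def panorama_to_planar(pano, pan, out_w, out_h, fov=np.deg2rad(60)):
    """Resample a 360-degree cylindrical panorama into a planar view."""
    ph, pw = pano.shape[:2]
    f = (out_w / 2) / np.tan(fov / 2)          # focal length of the virtual planar camera
    xs = np.arange(out_w) - out_w / 2
    ys = np.arange(out_h) - out_h / 2
    x, y = np.meshgrid(xs, ys)
    theta = pan + np.arctan2(x, f)             # azimuth of the ray through each output pixel
    h = y / np.hypot(x, f)                     # height on the unit-radius cylinder
    u = (theta % (2 * np.pi)) / (2 * np.pi) * pw
    v = ph / 2 + h * (ph / 2)                  # assumes the panorama spans h in [-1, 1]
    u = np.clip(u.astype(int), 0, pw - 1)
    v = np.clip(v.astype(int), 0, ph - 1)
    return pano[v, u]                          # nearest-neighbor resampling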

The speed of the processor largely determines the quality of the visualized movies. If the system cannot process all frames in the movie, QuickTime VR drops some frames. The system can be applied to exchanging video on the Internet, virtual navigation of real environments (useful for architecture planning, museum tours), etc.

Among the advantages: the method represents a practical way of exchanging video on the Internet and navigating real environments virtually, and allows for immersive navigation of visual environments; there is no need to consider the viewing angle when selecting a reference image (references are cylindrical), no need for specialized hardware; high-quality images, distortion correction, multimedia possibilities.

Among the disadvantages: visualized scenes must be static, only playback visualization, and several photographs must be acquired and properly registered.

Class of Approaches: no-geometry rendering [44], light-fields [19], mosaicking [26].

Levoy-Hanrahan [28]

This paper describes Light Field Rendering, a simple and robust method for generating new views from arbitrary camera positions without depth information or feature matching, simply by combining and re-sampling the available images.

The major idea behind the technique is a representation of the light field, the radiance, as a function of position and direction in regions free of occluders. In these regions the "light field" is a 4D parameterization of viewing position and direction. An image is a two-dimensional slice of the 4D light field. Creating a light field from a set of images corresponds to inserting each 2D slice into the 4D light field representation. Similarly, generating new views corresponds to extracting and re-sampling a slice. Once a light field has been created, new views may be constructed in real time by extracting slices in appropriate directions. The desired ray can be looked up in the light field database of rays using the 4D parameterization of viewing position and direction.

Image generation using light fields is inherently a database querying process, much like the movie-map image-based approach of Chen [11]. The interpolation scheme used by the authors approximates the re-sampling process by simply interpolating the 4D function from the nearest samples. The authors have investigated the effect of using nearest-neighbor, bilinear, and full 4D quadrilinear interpolation.
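The lookup described above can be illustrated as follows. The sketch assumes a light field stored as a 4D array indexed by the two-plane coordinates (u, v, s, t) and shows quadrilinear interpolation of one ray; the array name and coordinate conventions are hypothetical, not the authors' code.

import numpy as np

def quadrilinear_lookup(L, u, v, s, t):
    """L: 4D array of radiance samples; (u, v, s, t): continuous ray coordinates."""
    coords = np.array([u, v, s, t])
    lo = np.floor(coords).astype(int)
    lo = np.clip(lo, 0, np.array(L.shape[:4]) - 2)     # keep lo and lo+1 in range
    frac = coords - lo
    val = 0.0
    for corner in range(16):                           # 2^4 neighboring samples
        offs = [(corner >> i) & 1 for i in range(4)]
        w = np.prod([f if o else 1 - f for f, o in zip(frac, offs)])
        val += w * L[tuple(lo + offs)]
    return val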

Since the success of the method depends on having a high sample rate, the authors describe a compression system that is able to compress the generated light fields by a factor of more than 100:1 with very little loss of fidelity. A vector quantization scheme is used to reduce the amount of data used in light field rendering while still allowing random access and selective decoding. The authors also address the issues of anti-aliasing during creation, and re-sampling during slice extraction. In particular, to reduce aliasing effects, the light field is pre-filtered before rendering.

Among the advantages: real-time display of new views by extracting slices in appropriate directions, high freedom in the range of possible views, no model information such as depth values or image correspondences is needed to extract the image values, image generation involves only re-sampling (a simple linear process), simple compression schemes can be applied (because of the 4D structure of the light field), and the re-sampling process is simpler than in depth- or correspondence-based image-based rendering approaches.

Among the disadvantages: a large amount of data may be required (though the proposed method allows high compression), image acquisition takes a long time (reference images are acquired by scanning a camera along a plane using a motion platform), the flow of light is completely characterized only through unobstructed space in a static scene with fixed illumination, and the sampling density must be high (to avoid excessive blurriness).

Class of Approaches: no-geometry rendering [44], light-fields [19], interpolation from dense matching [26].

Gortler-Grzeszczuk-Szeliski-Cohen [21]

This paper discusses The Lumigraph, a new computational method for capturing the complete appearance of both synthetic and real world objects and scenes, representing this information, and then using this representation to render images of the object from new camera positions.

The Lumigraph, as in the case of light-field rendering, is a ray-database query algorithm. The Lumigraph uses a 4D parameterization of viewing position and direction (a 4D parameterization of rays passing through a pair of planes with fixed orientation). Unlike the light field, the Lumigraph considers the geometry of the underlying models when reconstructing desired views. The geometric information is used to control the blending of the images. The Lumigraph, as well as the light field, is a data-intensive rendering process. However, the Lumigraph can tolerate a lower sampling density thanks to the available geometric information.
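The depth correction that distinguishes the Lumigraph from the plain light field can be sketched as follows, under simplifying assumptions stated in the comments (uv plane at z = 0, st plane at z = 1): the approximate geometry shifts the (s, t) coordinate used when resampling from a nearby (u, v) sample. Names and conventions are illustrative only, not the authors' code.

import numpy as np

def depth_corrected_st(uv_sample, ray_origin, ray_dir, depth):
    """Return the (s, t) at which the ray from the sample camera at (u', v', 0)
    through the geometry point crosses the st plane (z = 1)."""
    ray_origin = np.asarray(ray_origin, float)
    ray_dir = np.asarray(ray_dir, float)
    # Point where the desired ray meets the approximate geometry (at z = depth).
    tz = (depth - ray_origin[2]) / ray_dir[2]
    p = ray_origin + tz * ray_dir
    # Re-project that point from the nearby sample camera on the uv plane.
    cam = np.array([uv_sample[0], uv_sample[1], 0.0])
    d = p - cam
    s_t = cam + (1.0 - cam[2]) / d[2] * d        # intersect the plane z = 1
    return s_t[:2]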

Among the advantages: fast scene rendering, arbitrary camera poses are used to construct the database of visible rays, high freedom in the range of possible views.

Among the disadvantages: the preparation of the database requires considerable pre-processing, a large amount of data may be required, the flow of light is completely characterized only through unobstructed space in a static scene with fixed illumination, and the sampling density must be high.

Class of Approaches: no-geometry rendering [44], light-fields [19], interpolation from dense matching [26].

McMillan-Bishop [33]

L. McMillan and G. Bishop propose Plenoptic Modeling as a consistent framework for the evaluation of image-based rendering systems. The authors give a concise problem definition and propose an image-based rendering system in light of the Plenoptic framework.

The paper introduces the use of the 5D plenoptic function, $P_5(V_x,V_y,V_z,\theta,\phi)$, defined as the intensity of the light rays passing through the camera center at every spatial location $(V_x,V_y,V_z)$ and at every possible angle $(\theta,\phi)$. The original 7D plenoptic function was presented by Adelson and Bergen [1]. The simplest plenoptic function is a 2D panorama, cylindrical or spherical, obtained when the viewpoint is fixed.
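As a worked illustration of the fixed-viewpoint case, the sketch below maps a pixel of a cylindrical panorama to the direction $(\theta,\phi)$ at which it samples the plenoptic function; the unit cylinder radius and the height convention are assumptions made for the example.

import numpy as np

def pixel_to_direction(col, row, pano_w, pano_h, radius=1.0):
    """Direction (theta, phi) sampled by pixel (col, row) of a cylindrical panorama."""
    theta = 2 * np.pi * col / pano_w              # azimuth, uniform across columns
    height = (row - pano_h / 2) / (pano_h / 2)    # height on the cylinder, in [-1, 1]
    phi = np.arctan2(height, radius)              # elevation of the corresponding ray
    return theta, phi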

Within the proposed plenoptic modeling, the goal of image-based rendering is to generate a continuous representation of the Plenoptic function. The authors claim that all image-based rendering techniques can in fact be cast as attempts to reconstruct the Plenoptic function from a sample set of that function. They believe there are significant insights to be gained from this characterization, so they present their system in light of this Plenoptic framework.

The samples used are cylindrical panoramas. The "angular disparity" of each pixel in stereo pairs of cylindrical panoramas is computed and used for generating new plenoptic function samples. The authors also introduce a geometric invariant for cylindrical projections that is equivalent to the epipolar constraint defined for planar projections. The original samples, cylindrical panoramic images, can thus be used to reconstruct new virtual views from arbitrary locations. The reconstructed views correctly describe perspective effects and occlusions. In particular, the authors introduce a novel visible-surface algorithm which guarantees back-to-front ordering.

Among the advantages: real-time display of visually rich environments (both indoor and outdoor) is possible without the need for special graphics hardware, the method allows for acquisition and exploitation of compact sample images, realistic visualization of complex sceneries where perspective effects and occlusion are correctly modeled, and the use of commonly available equipment.

Among the disadvantages: visualized scenes must be static and with fixed lighting conditions, reference images should be acquired close to each other, reconstructed views should be generated close to sample images.

Class of Approaches: no-geometry rendering [44], light-fields [19], mosaicking [26], geometrically-valid pixel reprojection [26].

Seitz-Dyer [40] [41]

S.M. Seitz and C.R. Dyer propose View Morphing, a way to generate new views of a scene from two basis views. This can be applied to both calibrated and uncalibrated images. At minimum, two basis views and their fundamental matrix are needed.

A scan-line algorithm for image interpolation is presented that requires only four user-provided feature correspondences to produce valid orthographic views. The paper describes a simple image rectification procedure which guarantees that interpolation does in fact produce valid views, under generic assumptions about visibility and the projection process.

The proposed technique uses basic principles of projective geometry, and introduces an extension to image morphing that correctly handles 3D projective camera and scene transformations. The authors propose to exploit monotonicity along epipolar lines to compose physically valid intermediate views without the need for full correspondence information. Under the assumption of monotonicity, it is shown that the problem is theoretically well-posed.

This result is significant in light of the fact that it is not possible to fully recover the structure of the scene due to the aperture problem. Moreover, the authors demonstrate that for a particular range of views the problem of view synthesis is in fact well-posed and does not require a full correspondence, that is, image interpolation is a physically valid mechanism for view interpolation. Views can consequently be generated by linear interpolation of the basis images (if the basis images are first rectified).
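The three stages of view morphing (prewarp, linear interpolation, postwarp) can be sketched for a single pair of corresponding points as below. The homographies H0, H1 and Hs are assumed to be given; this illustrates the pipeline rather than the authors' implementation.

import numpy as np

def to_h(p):                      # Euclidean -> homogeneous coordinates
    return np.append(p, 1.0)

def from_h(q):                    # homogeneous -> Euclidean coordinates
    return q[:2] / q[2]

def view_morph_point(p0, p1, H0, H1, Hs, s):
    """Transfer corresponding points p0 (image 0) and p1 (image 1) into the
    in-between view at interpolation parameter s in [0, 1]."""
    r0 = from_h(H0 @ to_h(p0))    # prewarp: rectify both basis images
    r1 = from_h(H1 @ to_h(p1))
    r = (1 - s) * r0 + s * r1     # linear interpolation of rectified positions
    return from_h(Hs @ to_h(r))   # postwarp into the desired image plane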

Among the advantages: the method represents a practical and simple way of generating new views of a scene (under monotonicity assumptions), view synthesis does not suffer from the aperture problem, the technique may be applied to photographs as well as rendered scenes, ability to synthesize changes both in viewpoint and image structure, interesting 3D effects via simple image transitions, applicable to both calibrated and uncalibrated images, suitable for application in the entertainment industry and for limited-bandwidth teleconferencing.

Among the disadvantages: the method requires multiple image re-samplings (loss of quality), local blurring when the monotonicity assumption is violated, artifacts arising from errors in correspondence, it is only suitable for static scenes, the method needs four user-provided feature correspondences, and visualized regions need to be free of occluders.

Class of Approaches: implicit-geometry rendering [44], volumetric reconstruction [19], geometrically-valid pixel reprojection [26].

Chen-Williams [10]

This paper presents View Interpolation, an image-interpolation approach to the synthesis of 3D scenes, where the input images are a structured set of views of a 3D object or scene.

In order to reconstruct desired views several reference images are used along with image correspondence information. The view synthesis is based on linear interpolation of corresponding image points using range data to obtain correspondences, (as in view-morphing [40]).

Intermediate frames are used to approximate intermediate 3D transformations of the object or scene. The authors have investigated smooth interpolation between images by modeling the motion of pixels (i.e. optical flow) as one moves from one camera position to another. They have investigated special situations in which interpolation produces valid perspective views, and conclude that interpolated images do not in general correspond to exact perspective views. They also point out and suggest a solution for determining the visible surfaces. Like image morphing, View Interpolation uses photometric information as well as local derivative information in its reconstruction process.
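A minimal sketch of this idea, assuming a precomputed flow field and range data, forward-maps each source pixel a fraction s of the way along its flow vector and resolves visibility with a z-buffer; it illustrates the principle rather than the paper's exact algorithm.

import numpy as np

def interpolate_view(src, flow, depth, s):
    """src: HxWx3 image, flow: HxWx2 pixel offsets to the other view,
    depth: HxW range data, s: interpolation parameter in [0, 1]."""
    h, w = depth.shape
    out = np.zeros_like(src)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            # Move the pixel a fraction s of the way along its flow vector.
            nx = int(round(x + s * flow[y, x, 0]))
            ny = int(round(y + s * flow[y, x, 1]))
            if 0 <= nx < w and 0 <= ny < h and depth[y, x] < zbuf[ny, nx]:
                zbuf[ny, nx] = depth[y, x]   # keep the closest surface
                out[ny, nx] = src[y, x]
    return out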

Among the advantages: the proposed method can be performed at interactive rates; it is suitable for virtual holograms, walk-throughs in virtual environments, incremental rendering, motion-blur acceleration, and acceleration of soft shadows cast by area light sources; the approach works well when the generated views share a common gaze direction and the synthesized viewpoints are within 90 degrees of this gaze angle.

Among the disadvantages: problems in the generated images for points which are not mutually visible on both reference images (difficult to establish the flow field information), view approximation when the change in viewing position is not slight, static scene, problems may arise when the generated views do not share a common gaze direction, and when the synthesized view-points do not stay within 90 degrees of the gaze angle.

Class of Approaches: implicit-geometry rendering [44], volumetric reconstruction [19], interpolation from dense matching [26].

Shashua-Werman [43]

This paper, building on the existence of certain trilinear functions of three views (with a corresponding tensor of 27 intrinsic coefficients) [42], derives connections between the trilinear invariants across three views and intrinsic structures and invariants of 3D space.

The result shows that the tensor of coefficients determined by three views replaces entirely the role of the fundamental matrix (and associated intrinsic structures of two views) in 3D tasks. In other words, the projective structure of the scene follows directly from the tensor without the need to recover any intrinsic structure associated with two views.

In addition, the tensor encompasses 2-view structures in the sense that the fundamental matrix is readily expressed as the solution of a linear system determined by the tensor, the rotational component of camera motion is expressible in closed form from the tensor, and a variety of means exist for recovering the epipoles from the tensor.

The major result is that there exists a decomposition of the tensor into three matrices that correspond to the intrinsic homography matrices of three distinct planes. The planes are associated with the camera coordinate frame of the third view and provide a reference basis for the reconstruction of invariants. This provides a geometric intrinsic structure of three views.

The authors claim that the tensor offers a host of new algorithms for recovering 3D information from 2D views, cuts through the epipolar geometry, makes room for statistics, and generally exploits the information available from measurements across views more efficiently than any technique based on 2-view geometry.

Among the advantages: new algorithms for recovering 3D information from 2D views, an order of magnitude improvement compared to conventional techniques that rely on epipolar geometry (when synthesizing novel views from a pair of model views), applications in virtual reality, 3D television, recognition, fast rendering, 2-views structures (fundamental matrix, epipoles) are recoverable (linearly) from a tensor.

Among the disadvantages: static scene, some fiducial points are needed.

Class of Approaches: implicit-geometry rendering [44].

Avidan-Shashua [2]

This paper proposes a method where views are reconstructed directly, without first estimating depth, by exploiting certain invariants in the geometry of the problem.

The input consists of three images from which it is possible to compute a trilinear tensor, which provides a correct way to generate virtual views of the observed object. In particular, the trilinear tensor is computed from the point correspondences between the reference images. In the case of only two images, one of the images is replicated and regarded as the third image. If the camera intrinsic parameters are known, then a new trilinear tensor can be computed from the known pose change with respect to the third camera location. The new view can subsequently be generated using the point correspondences from the first two images and the new trilinear tensor.
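Point transfer with a trilinear (trifocal) tensor can be sketched in the standard point-line-point form: given a point in view 1 and a line through the corresponding point in view 2, the tensor yields the point in view 3. The tensor indexing convention and the choice of line are assumptions of this illustration, not the authors' code.

import numpy as np

def transfer_point(T, p1, p2):
    """T: 3x3x3 trilinear tensor; p1, p2: corresponding points (x, y) in
    views 1 and 2. Returns the transferred point in view 3."""
    x1 = np.array([p1[0], p1[1], 1.0])
    # Any line through p2 other than the epipolar line of p1 will do;
    # here we take the vertical line x = p2[0] for simplicity.
    l2 = np.array([1.0, 0.0, -p2[0]])
    # p3^k = sum_ij x1^i * l2_j * T[i, j, k]
    p3 = np.einsum('i,j,ijk->k', x1, l2, T)
    return p3[:2] / p3[2]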

The authors claim that the trilinear tensor gives the user wider perspective-transformation possibilities than other methods in the literature. Texture is obtained by interpolation of the reference images. A realistic effect is achievable with this technique; however, image rendering might not be real-time because of the dense matching and tensor computation.

Among the advantages: realistic effect, the use of tensor (recovering 3D information from 2D views, no epipolar geometry, etc), efficient synthesis of novel views and wide visualization range, applications in virtual reality, 3D television, recognition, fast rendering, 2-views structures (fundamental matrix, epipoles) are recoverable (linearly) from a tensor.

Among the disadvantages: this approach does not correctly reconstruct points that become occluded.

Class of Approaches: implicit-geometry rendering [44], points transfer [19].

Laveau-Faugeras [27]

The authors propose a system where views are reconstructed directly, without first estimating depth. Under the assumption that a complete pixel-wise correspondence is available, it is possible to predict a broad range of views. The use of the epipolar geometries between images restricts the image flow field in such a way that it can be parameterized by a single disparity value and a fundamental matrix which represents the epipolar relationship. The authors also provide a two-dimensional ray-tracing-like solution to the visibility problem which does not require an underlying geometric description. Their method does, however, require establishing correspondence for each image point along the ray's path.
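One common way to realize such transfer without explicit depth is to intersect epipolar lines in the virtual view. The sketch below assumes the fundamental matrices relating each reference image to the virtual view are known; it is an illustration of epipolar transfer rather than the authors' exact procedure.

import numpy as np

def epipolar_transfer(F13, F23, p1, p2):
    """F13, F23: fundamental matrices mapping points of reference images 1 and 2
    to epipolar lines in the virtual view; p1, p2: corresponding pixels."""
    x1 = np.array([p1[0], p1[1], 1.0])
    x2 = np.array([p2[0], p2[1], 1.0])
    l1 = F13 @ x1                 # epipolar line of p1 in the virtual view
    l2 = F23 @ x2                 # epipolar line of p2 in the virtual view
    p = np.cross(l1, l2)          # intersection of the two lines
    return p[:2] / p[2]           # degenerate when the two lines coincide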

Class of Approaches: implicit-geometry rendering [44], points transfer [19].

Chang-Zakhor [8], [9]

This paper presents a method to generate arbitrary views of a three-dimensional scene by means of an intensity-depth representation.

By using an uncalibrated camera which scans a stationary scene under approximately known camera trajectories, and then by transforming points on camera image planes onto the plane of the virtual view, the proposed system derives dense depth-maps at several preselected viewpoints.

The authors propose an adaptive matching algorithm which assigns different confidence levels to different regions. Once the depth maps are computed at the preselected viewpoints, the intensity and the depth at these locations are estimated using a stereo algorithm and used to reconstruct arbitrary views of the 3D scene.
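Once intensity and depth are available at a reference viewpoint, each pixel can be reprojected into the virtual camera. The sketch below assumes known intrinsics K, the relative pose (R, t) of the virtual camera, and a standard z-forward camera convention; it illustrates the reprojection step, not the paper's full pipeline.

import numpy as np

def reproject_pixel(K, R, t, pixel, depth):
    """Reproject one intensity-depth sample into the virtual view."""
    x = np.array([pixel[0], pixel[1], 1.0])
    X = depth * (np.linalg.inv(K) @ x)   # back-project into the reference camera frame
    Xv = R @ X + t                       # same point in the virtual camera frame
    p = K @ Xv                           # project with the same intrinsics
    return p[:2] / p[2]                  # pixel position in the virtual view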

Among the advantages: fast and flexible image acquisition (hand-held camcorder, uncalibrated cameras, unknown camera position), well-estimated depths, image quality good for the most part, well-reconstructed horizontal edges, few errors concerning occluded regions.

Among the disadvantages: artifacts due to specularities of the surface, image matching performed poorly for background regions which are seen through holes of foreground regions, static scene (stationary 3D objects), only horizontal motion.

Class of Approaches: geometrically-valid pixel reprojection [26].

Rousso-Peleg-Finci [39]

This paper is concerned with stitching together images from adjacent viewpoints in order to generate a realistic panoramic virtual view of an observed environment. The authors propose an algorithm, based on the method proposed in [37], to solve the main problem of panoramic mosaicing, which is related to forward camera motion (e.g. zooming). Pictures are segmented into vertical strips which are aligned by a "stretching" technique. In this way distortions are greatly reduced.

CohenOr [12]

This paper presents a way to exploit projective texture mapping to render views adjacent to the reference images. The authors call these views Extrapolated Views. The aim is to improve the time performance of a walk-through in a remote virtual environment.

Hirose [23]

This paper proposes the use of a camera with position sensors in order to create an interactive walk-through based on pre-recorded sequences of images which are stored in a database. The use of image interpolation greatly reduces the required number of pre-recorded images, while the known image positions allow the system to retrieve the reference images of interest from the database.

Regan-Pose [38]

This paper describes a hybrid system in which plenoptic samples are generated on the fly by a geometry-based rendering system at the available rendering rates, while interactive rendering is provided by the image-based subsystem. At any instant, the user interacts with a single plenoptic sample. The authors also discuss local reconstruction approximations due to changes in the viewing position. These local approximations amount to treating the objects in the scene as being placed at infinity, resulting in a loss of kinetic depth effects.


Bob Fisher 2003-07-17