
Reconstruction using active sensors

Active sensing is more recent than passive sensing; however, the associated technology has progressed significantly over the last decade, making such sensors far more widespread. There are two main categories of active sensors: those based on structured light triangulation and the laser range scanners (LRS). Sensors in the first class project a light stripe onto the scene and use a camera to view it. Given accurate knowledge of the configuration of the light emitter relative to the camera, depth can then be computed by triangulation. Sensors in the second category emit and receive a laser beam and measure depth from the difference of phase, time of flight or frequency shift between the emitted and received signals. These latter sensors are the most widely used in robotics applications, mainly owing to their ability to operate at medium and long ranges.
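
Both sensing principles reduce depth recovery to a short closed-form computation. The following is a minimal sketch of the two formulas, assuming an idealised geometry: for triangulation, an emitter and camera separated by a baseline b with ray angles alpha and beta measured from the baseline; for phase-based ranging, a modulation frequency f_mod and a measured phase difference delta_phi. All names and numeric values are illustrative.

    import math

    C = 299792458.0  # speed of light (m/s)

    def triangulation_depth(baseline, alpha, beta):
        """Depth of the lit point above the emitter-camera baseline.
        The emitter ray leaves at angle alpha and the camera ray at
        angle beta (radians, measured from the baseline); by the law
        of sines the depth is b*sin(alpha)*sin(beta)/sin(alpha+beta)."""
        return baseline * math.sin(alpha) * math.sin(beta) / math.sin(alpha + beta)

    def phase_shift_range(delta_phi, f_mod):
        """Range from the phase lag of an amplitude-modulated beam.
        The beam travels 2d, so the lag is 4*pi*f_mod*d/c; ranges are
        unambiguous only up to half the modulation wavelength."""
        return C * delta_phi / (4.0 * math.pi * f_mod)

    # A 0.5m baseline with both rays at 60 degrees:
    print(triangulation_depth(0.5, math.radians(60), math.radians(60)))  # ~0.433m
    # A 10MHz modulation observed with a quarter-cycle phase lag:
    print(phase_shift_range(math.pi / 2, 10e6))  # ~3.75m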

The idea of using a laser range scanner for the acquisition of 3D data is not new. Nitzan et al. [43] used a laser that estimated the distance to a point from the phase difference between the emitted and the received beam. At the time, the acquisition rate was 500ms per pixel, making the formation of a 128x128 image a process lasting more than two hours. Now it is possible to capture higher resolution (500x500) range images at a rate of several Hz. Despite the technological advances since then, however, range sensors are not as widespread as one might expect. The main reason is that the requirements for better speed and accuracy led to systems that were too bulky and costly for use beyond the research and industrial markets.

LRS have been used to obtain the information required to reconstruct 3D representations of real objects and scenes, since such sensors provide an accurate range map of the scene in their field of view (FOV). In general, however, no complete model can be built from a single viewpoint, owing to the occlusions caused by the complexity of the scene surfaces and the sensor's limited FOV relative to the size of the scene to be reconstructed.

By deploying the LRS in different positions, potentially the whole scene can be scanned. However, this acquisition approach gives rise to the important problem of registering the partially overlapping images in a common coordinate frame. The goal of registration is to find the transformation that best aligns the range images. There are two main methodologies for addressing image registration. The first circumvents the problem entirely by relying on precisely calibrated mechanical equipment, such as turntables and eye-in-hand devices (robot arms), to determine the motion between the views. The second consists of methods that exploit the information contained in the overlapping range images.

The vast majority of registration techniques are based on the Iterative Closest Point (ICP) method. The main idea is to refine a rigid transformation by iteratively computing the incremental transformations that minimise the distance between the points transformed from the first view and those of the second. The first developed and best known algorithm based on this strategy was proposed by Besl et al. [5]. They assume correspondence between points based on the closest distance metric and compute, at each step, the rotation and translation that minimise the residuals in a least squares sense. The original ICP algorithm has several limitations, the most prominent being that a good initial estimate of the transformation is required to ensure convergence, that the method is not robust to noise, and that one data set is assumed to be a subset of the other. Further research has successfully addressed all these limitations.
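
A minimal sketch of point-to-point ICP in the spirit of [5] is given below, assuming two roughly pre-aligned point clouds and using the standard SVD-based closed-form solution for the rigid transform at each iteration; the function names and the fixed iteration count are illustrative choices, not part of the original algorithm.

    import numpy as np
    from scipy.spatial import cKDTree

    def best_rigid_transform(src, dst):
        """Closed-form least-squares rotation R and translation t
        mapping src onto dst (corresponding rows), via SVD."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        return R, t

    def icp(src, dst, iters=30):
        """Align src (Nx3) to dst (Mx3); returns the transformed points."""
        tree = cKDTree(dst)               # nearest-neighbour queries on dst
        cur = src.copy()
        for _ in range(iters):
            _, idx = tree.query(cur)      # closest-point correspondences
            R, t = best_rigid_transform(cur, dst[idx])
            cur = cur @ R.T + t           # apply the incremental transform
        return cur

In practice, the loop terminates on a change-in-error threshold rather than a fixed iteration count, and outlier correspondences are rejected to mitigate the robustness limitations noted above.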

Once the multiple images are registered into a common coordinate frame, range data from regions of the scanned surface captured in more than one image can be integrated to obtain a single fused surface. Strategies for integrating multiple range images can be classified into mesh based and volume based, according to the intermediate representation used. Mesh integration techniques [58, 63, 49] initially triangulate the raw 3D data, identify the overlapping regions, remove the redundant geometry from both meshes and, in a final step, stitch them into a single representation. Although these methods appear to perform better in computational terms [24], they can still fail catastrophically in areas of high curvature [11].

Volume based integration methods [25, 11, 47] are generally more robust. The whole scene volume is subdivided into a discrete set of voxels, and for each image a continuous function representing the weighted signed distance of a point to the nearest estimated surface is sampled at the vertices of each voxel. Voxels covering regions of the real surface that have been scanned several times have multiple estimates of the function at each vertex, which are combined according to their weights. The isosurface corresponding to the zero set of the function is taken as the reconstructed surface. The zero crossings of the function can be located and polygonised by the marching cubes algorithm [35].
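
The core of such volume based integration is a running weighted average of signed distances per voxel, as sketched below; the grid layout, truncation band and per-sample weight model are illustrative assumptions, not the exact formulation of [25, 11, 47].

    import numpy as np

    class SignedDistanceVolume:
        """Weighted signed-distance grid fused from multiple range images."""

        def __init__(self, shape, trunc=0.05):
            self.d = np.zeros(shape)      # accumulated signed distance
            self.w = np.zeros(shape)      # accumulated weight
            self.trunc = trunc            # ignore samples beyond this band

        def fuse(self, dist, weight):
            """Fold one image's per-voxel signed distances (dist) and
            confidence weights (weight) into the running weighted mean."""
            valid = (np.abs(dist) < self.trunc) & (weight > 0)
            d_new = np.clip(dist, -self.trunc, self.trunc)
            w_tot = self.w + np.where(valid, weight, 0.0)
            self.d = np.where(valid,
                              (self.w * self.d + weight * d_new)
                              / np.maximum(w_tot, 1e-9),
                              self.d)
            self.w = w_tot

The reconstructed surface is then the zero level set of the accumulated distance field, restricted to voxels with positive weight, which marching cubes [35] converts to a polygonal mesh.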

The methodology involving acquisition, registration and integration of multiple scans has been extensively applied to the reconstruction of small objects, and applications have been successfully developed to cover the market demand. Inherently, though, this reconstruction approach cannot be directly extended to build 3D models of large scale scenes, because of its computational complexity and storage costs. Image registration using ICP is an iterative minimisation process over hundreds or even thousands of points, while image integration can be shown [24] to have computational and storage complexity growing with both M, the number of images, and N, the average number of points in each range image.

An attempt to develop a system for scene reconstruction based on this methodology was presented by El-Hakim et al. [17]. Their system consists of eight CCD cameras and a LRS mounted on a mobile platform. For each position of the platform, the LRS produces a range map corresponding to each of the eight images. Features corresponding to discontinuities in both the images and the range maps are automatically extracted but manually matched. Bundle adjustment is used to compute the positional parameters of each image in a global coordinate system. Correspondences between 2D and 3D features are then used to register the range data, which are subsequently integrated by a volumetric method [47]. Although no direct indication of the execution time of the reconstruction process was given, the resulting model of a single empty room consisted of several thousand triangles even at the coarser voxel resolution.

The size of the reconstructed models is directly related to the amount of data yielded by each scan. If the raw data are triangulated into facets, an extremely large and complex mesh representation of the scene is recovered. To remedy this inherent problem, Sequeira et al. [53] automatically fit planes to the 3D measurements. A nearest neighbour multiresolution triangulation guided by the surface information is subsequently used to yield a triangular mesh. The registration is performed by an ICP algorithm, while a mesh based integration process that discards the less reliable points in overlapping regions yields the final model. The method is recursive, and their mobile platform is driven by a viewpoint planning algorithm that detects occlusions based on surface discontinuities and selects the next capture point to optimise a number of criteria, such as occlusion resolution and acquisition conditions. A similar but batch approach has been presented by Stamos et al. [59].
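
The plane fitting underlying such data reduction is typically a least-squares fit per segmented region, accepted only if the residuals stay below a surface threshold. The following is a minimal sketch under that assumption; it is not the exact formulation of [53], and all names are illustrative.

    import numpy as np

    def fit_plane(points):
        """Least-squares plane through an Nx3 point set: returns a point
        on the plane (the centroid) and the unit normal, taken as the
        direction of least variance of the centred points."""
        centroid = points.mean(axis=0)
        _, _, Vt = np.linalg.svd(points - centroid)
        normal = Vt[-1]               # direction of smallest singular value
        return centroid, normal

    def max_residual(points, centroid, normal):
        """Largest absolute point-to-plane distance; a region is kept as
        planar only if this stays below the adopted surface threshold."""
        return np.abs((points - centroid) @ normal).max()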

More recently, El-Hakim et al. [16] have extended the plane fitting method to more general surface patches and CAD models. However, their method is highly interactive, as manual intervention is necessary both in the registration stage and in the feature grouping, where the human operator is required to draw with the mouse a window bounding the points that belong to the same surface. Each of the segmented patches is subsequently triangulated and texture mapped separately, which results in intersecting surfaces that are not connected in the model.

Plane fitting is a good method for reducing the amount of data acquired by the laser scanner. Nevertheless, distinguishing scene features smaller than the threshold adopted for considering data as part of a surface becomes infeasible: data corresponding to these features collapse onto their assigned surface and so disappear from the reconstructed model. Similar effects characterise an alternative data reduction approach based on mesh simplification, in which triangles are collapsed according to measures such as curvature or locality. A reconstruction method that uses such a reduction approach has been proposed by Huber et al. [26]. For each range scan they compute a simplified mesh and measure the local surface shape at specific points using spin images, a form of 2D signature. By measuring the similarity between these spin images they estimate an initial alignment, which is subsequently refined with an ICP algorithm.
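
A spin image encodes, for an oriented point (a surface point p with normal n), the radial and axial coordinates of its neighbours in a 2D histogram. The following is a minimal sketch of this signature; the histogram extents and bin counts are illustrative parameters.

    import numpy as np

    def spin_image(p, n, points, bins=16, radius=0.1):
        """2D histogram signature of the neighbourhood of the oriented
        point (p, n): each neighbour x maps to cylindrical coordinates
        alpha (distance from the axis through p along n) and
        beta (signed height along that axis)."""
        v = points - p
        beta = v @ n                  # height along the normal
        alpha = np.sqrt(np.maximum((v * v).sum(axis=1) - beta**2, 0.0))
        hist, _, _ = np.histogram2d(alpha, beta, bins=bins,
                                    range=[[0, radius], [-radius, radius]])
        return hist

Two spin images can then be compared, for instance by normalised correlation, to hypothesise point correspondences from which the initial alignment is estimated.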

Figure 1: Model reconstruction by recursively carving a solid volume

Data reduction is essential for the reconstruction of small, flexible and versatile models; however, it cannot be applied to modelling methods [6, 48] that aim at recursively ``carving'' a solid volume around the scene. The main idea behind these methods is to subdivide the space into voxels that are initially considered full, deploy the LRS somewhere in this solid environment and emit a series of beams within its FOV. Any space between the scanner and each point on the scene surface hit by a laser beam is considered empty (Figure 1). By repositioning the LRS and acquiring new range maps, a model of the occupied space is iteratively built. If this process is repeated, selecting new views for regions of space not previously scanned, the approach can converge to a full model of the environment.
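
The carving step itself amounts to walking each beam through the grid and clearing the voxels it crosses before the measured return. A minimal sketch follows, assuming an axis-aligned grid and a simple fixed-step ray walk (an exact voxel traversal would use a DDA); all names are illustrative and this is not the precise formulation of [6, 48].

    import numpy as np

    def carve(occupied, origin, direction, hit_range, voxel_size):
        """Mark as empty every voxel the beam crosses between the
        scanner origin and the measured surface point.

        occupied   -- 3D boolean grid, initially all True (solid)
        origin     -- beam start in world coordinates (3,)
        direction  -- unit beam direction (3,)
        hit_range  -- measured distance to the surface along the beam
        voxel_size -- edge length of one voxel
        """
        step = voxel_size / 2.0       # oversample the ray
        for t in np.arange(0.0, hit_range, step):
            idx = np.floor((origin + t * direction) / voxel_size).astype(int)
            if np.all(idx >= 0) and np.all(idx < occupied.shape):
                occupied[tuple(idx)] = False   # free space before the hit

Repeating this for every beam of every scan leaves True only in voxels never observed as free, which constitutes the carved model of the occupied space.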

Volumetric modelling techniques like [6, 48, 17], although applied to the reconstruction of large scenes, always result in huge, impractical polygonised models. Although the size of such models can be reduced significantly by decreasing the voxel resolution, this reduction is limited: the size of the voxels sets a bound on the expected error, and if this exceeds a certain threshold the accuracy advantages of using a LRS are lost.

