next up previous
Next: Modelling from sparse data Up: Geometry Modelling Previous: Shape from Silhouettes

Reconstruction based on photo consistency

Colour or greyscale variance provides information that is not exploited by shape-from-silhouettes methods, which use only segmented binary images. This information can serve as a constraint on the 3D reconstruction: a valid point on the scene surface appears with the ``same'' colour in all images in which it is visible, under the assumption of constant illumination and Lambertian reflectance. This constraint is known as photo consistency and is the basis of a whole category of volumetric reconstruction techniques.
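As an illustration, the consistency test can be sketched as a threshold on the spread of the colour samples a candidate surface point receives from the views that see it. This is a minimal sketch; the function name and the threshold value are illustrative, not taken from the cited works:

```python
import numpy as np

def photo_consistent(samples, threshold=10.0):
    """samples: (n, 3) array of RGB colours that one scene point receives
    from the n images in which it is visible. Under constant illumination
    and Lambertian reflectance a true surface point should show (almost)
    the same colour everywhere, so we threshold the per-channel deviation."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) < 2:      # a single view gives nothing to compare
        return True
    return bool(np.all(samples.std(axis=0) <= threshold))

# a surface point seen with nearly the same colour in three views passes...
print(photo_consistent([[200, 40, 40], [202, 42, 39], [198, 41, 41]]))   # True
# ...while a free-space point picks up unrelated colours and fails
print(photo_consistent([[200, 40, 40], [30, 90, 220], [120, 120, 120]]))  # False
```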

Seitz et al. [51] first and Kutulakos [29] more recently have proposed methodologies for reconstructing scene models based on photo consistency. Initially a volume that encompasses the whole real-world scene is defined and subdivided into voxels. The voxels are either assumed transparent and subsequently labelled opaque if they pass the photo consistency test [51], or assumed solid and made transparent if they fail the test [29]. Testing photo consistency involves projecting every voxel centroid onto each of the images in which it is visible and thresholding the variance of the colours of the associated pixels. This check, however, requires accurate camera calibration and is very sensitive to noise.
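The projection step can be sketched with a standard pinhole model, assuming a known 3x4 camera matrix; the hypothetical `project` helper below is not code from [51] or [29]:

```python
import numpy as np

def project(P, X):
    """Project 3-D point X through a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)    # homogeneous image point
    return x[:2] / x[2]          # dehomogenise to pixels

# simple camera: focal length 100, looking down the z axis from the origin
K = np.diag([100.0, 100.0, 1.0])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
print(project(P, np.array([1.0, 1.0, 2.0])))   # -> [50. 50.]
```

Each voxel centroid is projected like this into every image that sees it, and the colours gathered at the resulting pixels feed the variance test.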

Unfortunately, the photo consistency constraint does not guarantee a unique reconstruction, as a whole set of models may be consistent with the input images. To circumvent the ill-posed nature of the problem, [29] presented an algorithm called ``space carving'' which yields a unique reconstruction called the photo hull: the union of all photo-consistent subsets of the scene. In this sense the photo hull is not a minimal reconstruction but one which ensures that no valid voxels are discarded by the consistency test.

This test has to be performed for every voxel based on its projection onto all the images in which it may be visible. This can be a very computationally expensive process, as elimination of a voxel automatically changes the visibility of its surrounding voxels. To mitigate this practical limitation, [51] proposed the ordinal visibility constraint, which requires that no scene point lie inside the convex hull formed by the camera positions. In such a configuration the relative visibility of any two points from a camera can be decided from the distance of the points to the hull. Placing the cameras in a plane facing the real scene permits partitioning the 3D space into layers of uniform depth from this plane. Traversing the voxels in layers of increasing depth guarantees that before any voxel is visited all its possible occluders have already been checked.
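The depth-ordered traversal can be sketched as follows, under the illustrative assumption that all cameras lie on the plane z = 0 and face towards positive z:

```python
from collections import defaultdict

def layered_order(voxels):
    """Assuming all cameras lie on the plane z = 0 and face towards +z
    (satisfying the ordinal visibility constraint), group voxels into layers
    of uniform depth and yield them near-to-far, so every possible occluder
    of a voxel is visited before the voxel itself."""
    layers = defaultdict(list)
    for v in voxels:
        layers[v[2]].append(v)        # depth = distance from the camera plane
    for depth in sorted(layers):
        yield from layers[depth]

order = list(layered_order([(0, 0, 2), (1, 0, 1), (0, 1, 3), (2, 2, 1)]))
print(order)   # both depth-1 voxels come before the depth-2 and depth-3 ones
```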

This approach has the advantage that each voxel is visited only once. However, the ordinal visibility constraint is very restrictive on camera placement, as it is not possible to surround the scene with cameras. This problem has been addressed in [29] by a multi-plane sweep technique along the x, y and z axes in both positive and negative directions. The consistency test is evaluated for each voxel in the current plane using only the subset of cameras that lie in front of the plane.
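The camera selection for one sweep direction can be sketched as a simple filter on camera centres; the function below is an illustrative helper, not code from [29]:

```python
def cameras_for_plane(camera_centres, axis, plane_pos, direction):
    """During a plane sweep along one axis, a voxel on the current plane is
    tested only against the cameras that lie in front of the plane, i.e. on
    the side the sweep came from, so their lines of sight cannot be blocked
    by voxels not yet visited."""
    if direction > 0:    # sweeping towards +axis: keep cameras at smaller coords
        return [c for c in camera_centres if c[axis] < plane_pos]
    return [c for c in camera_centres if c[axis] > plane_pos]

cams = [(0.0, 0.0, -2.0), (3.0, 1.0, -1.0), (5.0, 0.0, 4.0)]
# sweeping along +z through the plane z = 0: only the two cameras behind it count
print(cameras_for_plane(cams, axis=2, plane_pos=0.0, direction=+1))
```

Sweeping all six axis directions in turn lets every camera contribute for some subset of planes, which is how the surround-camera configuration is accommodated.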

Although this approach places no restrictions on the locations of the cameras relative to the scene, checking the colour consistency of a voxel uses only a subset of the images from which the voxel is visible. An alternative method called generalised voxel colouring (GVC) has been proposed by [10], in which a layered depth image (LDI) data structure is utilised so that every pixel is associated with a linked list of the voxels that project onto it, sorted in depth order. Photo consistency is tested iteratively on every surface voxel (the head of an LDI list), successively carving invalid voxels and replacing them with the subsequent voxels in the corresponding LDI entry until no change occurs in a complete pass over the surface. Although GVC achieves faster convergence than the multi-plane sweep methodology, it is only applicable where the cameras are placed outside the voxel space.
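The carve-until-stable loop over the LDI heads can be sketched as below. This is a toy sketch: the real consistency test depends on the evolving visibility, whereas here it is stubbed out as a simple predicate on voxel ids:

```python
def gvc_carve(ldi, consistent):
    """ldi maps each pixel to a list of voxel ids sorted near-to-far; the head
    of each list is the surface voxel seen through that pixel. Inconsistent
    heads are carved, exposing the next voxel in the list, until a complete
    pass over the surface leaves everything unchanged."""
    changed = True
    while changed:
        changed = False
        for voxels in ldi.values():
            while voxels and not consistent(voxels[0]):
                voxels.pop(0)          # carve: the next voxel becomes surface
                changed = True
    return {pixel: (v[0] if v else None) for pixel, v in ldi.items()}

# toy stand-in for the photo consistency test: even voxel ids are consistent
surface = gvc_carve({(0, 0): [3, 1, 2], (1, 0): [4, 2]},
                    consistent=lambda v: v % 2 == 0)
print(surface)   # -> {(0, 0): 2, (1, 0): 4}
```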

Despite this limitation, Slabaugh et al. [57] addressed the problem of large-scale scene reconstruction using GVC. They subdivided the voxel space into an interior space, where reconstruction of foreground surfaces is performed, and an exterior space used to model the background. This exterior space is warped so that the size of the voxels increases proportionally to their distance from the interior space, with voxels on the shell of the exterior space having coordinates at infinity. To overcome the problem of cameras becoming embedded inside the voxel space, since the exterior volume covers the whole scene, they manually precarve the volume. An algorithm similar to [10] was proposed for carving, with the difference that voxels on the shell of the exterior space never become invalid irrespective of the photo consistency test, as one cannot see beyond infinity. This approach yields improved results for outdoor scenes as both the foreground and the background are modelled. However, it is not applicable to large-scale indoor environments, where visibility is restricted by wall structures and a large number of cameras have to be deployed, making the precarving process infeasible.

Building a model of a large-scale scene using volumetric representations is a challenging problem, as the number of voxels required to bound the scene becomes prohibitive to process. A multi-resolution approach that attempts to tackle this problem with a coarse-to-fine strategy has been presented by Prock et al. [45]. They proposed covering the initial voxel space with low-resolution (large) voxels and testing photo consistency using the mean colour over the set of pixels onto which a voxel projects. The resulting model contains a large number of gaps and missing voxels, as a low-resolution voxel may encompass only a small part of the real scene. Using a nearest-neighbour search, they add back, for each of the remaining coloured voxels, its four adjacent voxels, subdivide each into eight, and test consistency for each of these new voxels. This approach fills in the voxels that were deleted as false negatives and considerably reduces the overall number of voxels used to reconstruct the scene.
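The refinement step, splitting one surviving coarse voxel into its eight children, can be sketched as follows (an illustrative axis-aligned cube representation, not the authors' data structure):

```python
def subdivide(voxel):
    """Split one cube, given as (x, y, z, edge), into its eight half-size
    children, as in the coarse-to-fine refinement pass: each child is then
    re-tested for photo consistency at the finer resolution."""
    x, y, z, s = voxel
    h = s / 2.0
    return [(x + dx * h, y + dy * h, z + dz * h, h)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

children = subdivide((0.0, 0.0, 0.0, 2.0))
print(len(children))   # -> 8 children of edge length 1
```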

Reducing the number of voxels is essential, especially when objects further away from the cameras have to be modelled. Kutulakos [28] presented another fine-to-coarse approach that enables such reconstruction by testing photo consistency over a circle of specified radius around the pixel onto which a voxel projects in each image. He called this test r-consistency and showed that the radius can be considered a parameter controlling the level of detail of the reconstructed model. R-consistency yields a hierarchy of models in which the photo hull is the finest and closest approximation to the real scene.
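One simple way to realise the idea (an approximation for illustration, not Kutulakos' exact formulation) is to accept a voxel when a single intensity value could be found inside every radius-r disc, approximated here by intersecting per-image intensity ranges. Growing r widens the ranges, so more voxels pass and the model becomes coarser:

```python
import numpy as np

def r_consistent(patches):
    """patches: one array of greyscale pixel values per image, taken from a
    disc of radius r around the voxel's projection in that image. Accept the
    voxel if the per-image [min, max] intensity intervals share at least one
    common value, i.e. some pixel in every disc could explain the voxel."""
    lo = max(np.min(p) for p in patches)
    hi = min(np.max(p) for p in patches)
    return lo <= hi

print(r_consistent([np.array([10, 14]), np.array([13, 20])]))   # True: 13..14 overlap
print(r_consistent([np.array([10, 11]), np.array([30, 35])]))   # False: no common value
```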

The relaxation of the photo consistency definition proposed in [28] results in robust reconstruction under calibration errors and image noise. Another approach that is resilient to noise has been described in [15], where a multi-hypothesis voxel colouring is adopted. In an initial step, hypotheses are generated for each voxel by testing photo consistency over every pair of views: if the voxel has the same colour in at least two cameras, a new hypothesis is assigned to it. In the subsequent hypothesis removal step, the currently visible voxels are determined for each view and their associated hypotheses are checked for consistency with the colour of the pixel they project onto. A voxel is carved from the voxel space if all its hypotheses are invalid. This approach has the advantage that carving is performed one image at a time, so voxels are visited fewer times. However, in the preprocessing stage even the intermediate voxels have to be coloured, which adds a significant overhead to the methodology.
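The hypothesis generation step can be sketched as a pairwise agreement test over the views that see a voxel; the tolerance value and the use of the pair mean as the hypothesis colour are illustrative assumptions, not the exact test of [15]:

```python
from itertools import combinations
import numpy as np

def generate_hypotheses(view_colours, tol=5.0):
    """For every pair of views in which the voxel is visible, record the mean
    of the two colours as a colour hypothesis whenever they agree within tol.
    A voxel with no surviving hypotheses is later carved."""
    hypotheses = []
    for ci, cj in combinations([np.asarray(c, float) for c in view_colours], 2):
        if np.linalg.norm(ci - cj) <= tol:
            hypotheses.append((ci + cj) / 2.0)
    return hypotheses

hyps = generate_hypotheses([[100, 0, 0], [102, 0, 0], [200, 50, 50]])
print(len(hyps))   # -> 1: only the first two views agree
```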

The constant illumination assumption adopted by all the voxel colouring methods can be relaxed in a multi-hypothesis approach, as only two images are enough to generate a valid voxel colour estimate. However, the method is still susceptible to calibration errors, as only the centroid of each voxel is projected into each image and tested. Calibration is a difficult problem and becomes even more crucial when more than one camera is used to capture the scene geometry. Saito et al. [50] presented a method that requires only the fundamental matrices relating the images. In their scheme they select two basis views and define a projective voxel grid space. All voxels can then be reprojected onto any arbitrary view given the fundamental matrices that relate this view to the two basis images. This approach, however, has the disadvantage that the choice of basis views affects the reconstruction quality, as the shape and size of the voxels are determined by their location relative to the cameras that form the projective basis.
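The reprojection machinery rests on standard epipolar point transfer: the image of a point in a third view lies at the intersection of the two epipolar lines induced by its images in the basis views. A generic sketch, with the fundamental matrices assumed given (not the authors' implementation):

```python
import numpy as np

def skew(a):
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]], dtype=float)

def transfer(F31, F32, x1, x2):
    """Locate a point in a third view using only fundamental matrices: its
    image must lie on the epipolar line of x1 (through F31) and on that of
    x2 (through F32), so it is their intersection."""
    l1 = F31 @ np.append(x1, 1.0)    # epipolar line of x1 in view 3
    l2 = F32 @ np.append(x2, 1.0)    # epipolar line of x2 in view 3
    x3 = np.cross(l1, l2)            # homogeneous intersection of the two lines
    return x3[:2] / x3[2]
```

Transfer degenerates when the two epipolar lines coincide (points on the trifocal plane of the three cameras), which is one reason the choice of basis views affects reconstruction quality.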



Bob Fisher
Wed Jan 23 15:38:40 GMT 2002