Once a correspondence between landmark reference-views and the visual prediction has been established, the texture values in the reference views can be mapped to the virtual view. It is proposed to apply projective texture mapping adaptively, in order to take advantage of multiple reference views of landmarks (taken from different positions), depending on the current robot pose.
The basic concept is that the appearance of an object in the camera image plane strongly depends on the relative position between camera and object, and on the current lighting conditions. In particular, different views of an object may reveal different visible aspects (e.g. disocclusions) and local illumination effects (e.g. shadows, reflections, highlights, etc.). Figure 4 summarizes the main factors affecting the appearance of objects when observed from different viewpoints.
But how should the texture values of different reference views contribute when transferred to the same pixel in the virtual view?
Several works in the literature on realistic visualization have addressed the issue of blending multiple images, and have demonstrated the advantage of considering more than one reference texture when generating the texture of a virtual view.
Different techniques have been proposed for blending texture values from different views, ranging from simple weighting functions based on the angle between the camera and the object, to more sophisticated post-rendering computations (Mark et al. [18]). When a geometric model of the represented objects is available (even a coarse one), textures can be efficiently mapped by view-dependent projective mapping, as shown by Debevec, Yu and Borshukov [8]. Debevec, Taylor and Malik [7] show that such a mapping can also be exploited to refine the geometric model of an object by a technique named model-based stereo.
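As a concrete illustration of the core operation in projective texture mapping, the sketch below projects a 3D surface point of a (possibly coarse) landmark model into a reference image and samples its texture there. It is a minimal sketch, not the pipeline of [8]: the pinhole projection matrix `P_ref` and the bilinear sampler are assumptions introduced here for illustration.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (length-3) to pixel coordinates with a
    3x4 pinhole projection matrix P (assumed known from calibration)."""
    x = P @ np.append(X, 1.0)      # homogeneous image coordinates
    return x[:2] / x[2]            # perspective division -> (u, v)

def sample_bilinear(image, uv):
    """Fetch a texture value at a sub-pixel position by bilinear
    interpolation (a standard resampling choice, not prescribed by [8])."""
    rows, cols = image.shape[:2]
    u = float(np.clip(uv[0], 0.0, cols - 1.001))
    v = float(np.clip(uv[1], 0.0, rows - 1.001))
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * image[v0, u0]
            + du * (1 - dv) * image[v0, u0 + 1]
            + (1 - du) * dv * image[v0 + 1, u0]
            + du * dv * image[v0 + 1, u0 + 1])

# Texture contributed by one reference view for a model surface point X:
#   t_ref = sample_bilinear(reference_image, project(P_ref, X))
```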
In Livatino [14], it is proposed to merge two or more landmark reference views into a composite rendering, combining the texture values of corresponding pixels in the different reference views. In particular, the system computes a weighted average in which the involved textures provide different contributions. Reference views that are closer to the current viewpoint are expected to approximate the current view better than reference views that are farther away. Closer images are in fact expected to best capture the landmark's visible aspects and the local illumination effects present in the current camera observation.
Figure 6 shows an example situation, which can also be related to the landmark of Figure 5 (a portion of the computer monitor).
In the case of a planar object with a planar neighboring region, it is not very relevant which reference view provides the higher contribution when estimating the texture of the current view. In the case of an "almost" planar object with a non-planar neighboring region (as for the monitor), the closest reference view (in Figure 5) should provide the higher contribution. In fact, the closest view contains reflections, visible aspects, etc., which may not be present in the farther reference view.
In particular, it is proposed to calculate a weighted average in which the involved textures provide a contribution that depends on:
- the "angle" between the reference viewpoint and the current viewpoint;
- the "distance" between the reference viewpoint and the current viewpoint.
The reference view which is closer to the current viewpoint in "angle" and in "distance" will thus give a higher contribution in the final summation.
In the case of two reference views, the resulting texture value for a pixel $p$, $t(p)$, would then be calculated as follows:
$$t(p) = w_1\, t_1(p) + w_2\, t_2(p) \qquad (7)$$

$$w_1 = \frac{f(\alpha_1, d_1)}{f(\alpha_1, d_1) + f(\alpha_2, d_2)} \qquad (8)$$

$$w_2 = \frac{f(\alpha_2, d_2)}{f(\alpha_1, d_1) + f(\alpha_2, d_2)} \qquad (9)$$

where $t_i(p)$ is the texture value of pixel $p$ in reference view $i$, $\alpha_i$ and $d_i$ are the angle and distance of reference view $i$ from the current viewpoint, and $f$ is a weighting function that assigns higher values to reference views closer in angle and distance, so that $w_1 + w_2 = 1$.
The above equations can naturally be extended to the case of more than two reference views.
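To make Eqs. (7)-(9) concrete, the sketch below instantiates the abstract weighting function $f$ with one plausible choice, the inverse of the angle-distance product, and generalizes the blend to N reference views by normalizing the weights. The function `view_weight` is an assumption for illustration; it is not the specific weighting used in [14].

```python
import numpy as np

def view_weight(angle, distance, eps=1e-6):
    """Hypothetical instance of the weighting function f: larger for
    reference views closer to the current viewpoint in angle (radians)
    and in distance."""
    return 1.0 / ((angle + eps) * (distance + eps))

def blend_textures(textures, angles, distances):
    """Normalized weighted average of corresponding texture values from
    N reference views (Eq. (7), generalized beyond two views).

    textures  : list of N texture values (scalars or RGB arrays)
    angles    : angle of each reference view w.r.t. the current view
    distances : distance of each reference viewpoint from the current one
    """
    w = np.array([view_weight(a, d) for a, d in zip(angles, distances)])
    w /= w.sum()                       # weights sum to one (Eqs. (8)-(9))
    return sum(wi * np.asarray(ti, dtype=float)
               for wi, ti in zip(w, textures))

# Two-view example: the reference view closer in angle and distance
# dominates the blended texture value.
t = blend_textures(textures=[[200, 80, 60], [180, 90, 70]],
                   angles=[0.1, 0.6], distances=[0.5, 2.0])
```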
Merging reference views based only on the above criteria can cause visible seams in the landmark visual prediction, due to specularities and unmodeled geometric detail, which may arise when neighboring textures come from different reference images, or in the case of occlusions and disocclusions. Some of the techniques proposed in the literature for computing texture transitions between different mapped views could then be applied to cope with this problem.
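One simple transition technique of this kind is feathering, where each view's weight map is smoothed before the per-pixel normalization, so that a reference image's contribution falls off gradually near the boundary of its visible region instead of switching abruptly. The sketch below is a minimal illustration under two assumptions not taken from [14]: binary per-view visibility masks are available, and a uniform (box) filter is an adequate smoothing kernel.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def feather_weights(masks, base_weights, radius=7):
    """Smooth each view's weight map near its visibility boundary.

    masks        : list of HxW binary arrays (1 where the view sees the surface)
    base_weights : per-view scalar weights, e.g. from the angle/distance criteria
    returns      : HxWxN array of per-pixel weights summing to one
    """
    maps = [uniform_filter(m.astype(float), size=radius) * w
            for m, w in zip(masks, base_weights)]
    stack = np.stack(maps, axis=-1)
    total = stack.sum(axis=-1, keepdims=True)
    return stack / np.maximum(total, 1e-9)   # normalize per pixel
```

The smoothing radius trades seam visibility against blurring of genuinely view-dependent effects such as highlights.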
In the context of robot navigation, as proposed in Livatino [14], the main reasons for proposing view-adapted texture mapping can be summarized as follows: