A brief overview of the main reference approaches in the field of realistic virtual view synthesis is presented. This review is intended for a reader interested in the main image- and model-based rendering approaches, rather than as an exhaustive survey of the research field.
The synthesis of realistic virtual views has received increasing attention in the last decade, mainly due to the growing popularity of virtual reality and the spread of its applications, and hence to the demand for increased realism in generated scenes while simplifying the modeling process.
A solution to the problem of providing visual realism to computer-generated images has been sought in the possibility of re-creating naturally occurring physical phenomena from real-world observations. The "road" mainly investigated has been to capture these phenomena through photographs and then transfer them, directly or "indirectly", into novel generated views.
The direct way refers to warping algorithms that do not take into account any geometric information about the observed scene. These approaches usually require a dense set of reference views. The "indirect" way instead refers to the case where reference images are supported by associated knowledge (pixel correspondences, depth maps) or by 3D models.
The achievement of such an ambitious goal, a realistic synthesis, has mainly been attempted through two different rendering approaches: model-based and image-based.
The model-based approach represents the traditional way to generate virtual views of an object or scene. This approach is usually referred to as Model-Based Rendering because it relies on a geometric 3D model of the object or scene to be rendered. In this context, research focuses on improving model fidelity by means of image-based modeling, in particular: geometric model extraction and representations for rendering purposes, object-texture extraction and mapping onto geometric models, and recovery and rendering of illumination effects.
At a high level, a model-based rendering approach involves three processes. First, an event or scene must be recorded; then, a 3D model of the environment is extracted using computer vision techniques; finally, the obtained 3D model is rendered from the viewpoint of a virtual camera.
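The last of the three steps above, rendering the recovered geometry from a virtual camera, can be illustrated with a minimal sketch. The sketch below assumes a standard pinhole camera model (the function name, intrinsics, and example values are illustrative, not taken from any specific system): reconstructed 3D points are expressed in the virtual camera frame and projected through the camera intrinsics.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project 3D model points into a virtual camera (pinhole model).

    points_3d: (N, 3) world coordinates of the reconstructed model;
    K: 3x3 camera intrinsics; R, t: world-to-camera rotation and translation
    defining the virtual viewpoint.
    """
    cam = points_3d @ R.T + t          # world frame -> virtual camera frame
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> pixel coordinates

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# A point on the optical axis, 5 units in front of the virtual camera,
# projects exactly to the principal point.
pts = np.array([[0.0, 0.0, 5.0]])
print(project_points(pts, K, np.eye(3), np.zeros(3)))  # -> [[320. 240.]]
```

A full model-based renderer would additionally rasterize surfaces, handle occlusion, and apply textures, but the projection step above is the geometric core of viewing a 3D model from an arbitrary virtual camera.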
The image-based approach instead represents an alternative to model-based rendering and a competing means of creating virtual views, relying primarily on real reference images in place of a geometric 3D model. This approach is thus referred to as Image-Based Rendering. To produce novel views, reference images are usually interpolated, or re-projected from source to target image.
This rendering approach is less generic than model-based rendering, since the techniques employed often depend on the application, i.e. the type of environment and the required rendered field of view. A common characteristic is a rendering time independent of scene complexity and, in principle, no need for reconstruction of geometric models. Image-based rendering techniques often require knowledge additional to the input reference images, such as image correspondences, depth information, epipolar relations, etc. This additional knowledge is often extracted from the input images themselves, or it is provided a priori.
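As an illustration of interpolation supported by such additional knowledge, the sketch below blends two reference views into an in-between virtual view using dense pixel correspondences. It is a deliberately naive forward-warping scheme with nearest-pixel rounding and no hole filling (function names and the flow representation are assumptions for the example, not a specific published method):

```python
import numpy as np

def interpolate_views(img_a, img_b, flow_ab, alpha):
    """Synthesize an in-between view from two reference images.

    img_a, img_b: (H, W) grayscale reference views;
    flow_ab: (H, W, 2) dense correspondences mapping each pixel of
    img_a to its match in img_b as (dx, dy) displacements;
    alpha in [0, 1] positions the virtual view between the two references.
    """
    h, w = img_a.shape
    out = np.zeros_like(img_a, dtype=float)
    for y in range(h):
        for x in range(w):
            dx, dy = flow_ab[y, x]
            # Move the pixel a fraction alpha along its correspondence vector.
            xa, ya = int(round(x + alpha * dx)), int(round(y + alpha * dy))
            # Location of the matching pixel in the second reference view.
            xb, yb = int(round(x + dx)), int(round(y + dy))
            if 0 <= xa < w and 0 <= ya < h and 0 <= xb < w and 0 <= yb < h:
                out[ya, xa] = (1 - alpha) * img_a[y, x] + alpha * img_b[yb, xb]
    return out

# Two constant images with zero flow: the halfway view is their average.
a = np.full((4, 4), 100.0)
b = np.full((4, 4), 200.0)
mid = interpolate_views(a, b, np.zeros((4, 4, 2)), 0.5)
print(mid[0, 0])  # -> 150.0
```

Note that such linear blending is only geometrically valid for particular camera configurations; practical systems rectify the views first or use geometrically valid reprojection instead.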
Image-based rendering is usually applied to static environments, whereas model-based rendering is often proposed for dynamic scene visualization. Authors have also proposed both approaches for the same application context (Blanc, Livatino and Mohr).
A growing interest, which could also be considered an "evolution" of image- and model-based rendering, is towards hybrid methods; in recent years, authors have presented methods that lie in between image- and model-based rendering. A successful example is the work of Debevec, Taylor and Malik, which proposes the generation of novel views based on a reconstructed geometric model, where textures in the novel views are mapped view-dependently.
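The idea behind view-dependent texture mapping can be sketched as follows. This is a simplified illustration, not the actual weighting scheme of Debevec, Taylor and Malik: each reference camera contributes a texture for a surface point, weighted by how closely the virtual viewing direction matches that camera's direction.

```python
import numpy as np

def view_dependent_weights(virtual_dir, ref_dirs):
    """Illustrative blending weights for view-dependent texture mapping.

    virtual_dir: unit vector from the surface point towards the virtual camera;
    ref_dirs: (N, 3) unit vectors towards the N reference cameras.
    Cameras aligned with the virtual view get high weight; cameras
    behind the surface point (negative cosine) are clipped to zero.
    """
    cos = np.clip(np.asarray(ref_dirs) @ virtual_dir, 0.0, None)
    return cos / cos.sum()  # normalize so the weights sum to 1

# A virtual view halfway between two reference cameras gives equal weights,
# so both reference textures contribute equally to the novel view.
w = view_dependent_weights(
    np.array([0.0, 0.0, 1.0]),
    np.array([[0.6, 0.0, 0.8], [-0.6, 0.0, 0.8]]),
)
print(w)  # -> [0.5 0.5]
```

As the virtual camera moves towards one reference view, that view's texture dominates the blend, which is what makes view-dependent effects such as specular highlights appear more realistic than a single static texture.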
Survey papers have also begun classifying works in realistic virtual view synthesis as a "continuum" of representations (which may include model reconstruction and model-based rendering), based on the trade-off among many aspects, among them: the number of required input images, the motion assumed for the virtual camera, the knowledge about scene geometry, depths and correspondences, the way pixels are transferred, etc. Previous work in the field is usually classified by authors depending on the aspect they wish to focus on in their contribution.
In their classification of image-based rendering, H. Shum et al. propose three categories according to how much geometric information is used: no geometry, implicit geometry (i.e. correspondences), and explicit geometry. D. Forsyth and J. Ponce also propose three categories, but based on the type of approach: volumetric reconstruction, point transfer, and light fields. L. McMillan proposes distinguishing approaches based on the way images supplement the image-generation process: images representing approximations of scene geometry, images in a database representing different environment locations, and images as reference scene models from which to synthesize new views. S. Kang proposes a categorization primarily based on the nature of the scheme for pixel indexing or transfer: non-physically based image mapping, mosaicking, interpolation from dense samples, and geometrically valid pixel reprojection.