next up previous
Next: Image segmentation using belief Up: Model-based Approaches Previous: Computational vision

The Bayesian model

The active contour model described in Section gif is an example of a model-driven segmentation technique. The Kalman filter technique described in Section gif adds the notion of statistical variation to the model-driven segmentation process. In this section, I will introduce the Bayesian network (BN) approach to segmentation. Bayesian networks are also referred to as belief networks, probabilistic networks, probabilistic belief networks (PBN), and probabilistic causal networks. Since all of these terms can be used interchangeably, I will refer to them as belief networks throughout this dissertation. Regardless of the name, this approach adds the dual notions of probability- and utility-based decision making to the repertoire of segmentation techniques. The belief network model makes decisions about how to interpret probabilistic evidence (i.e., non-deterministic information) to support or reject a hypothesis or outcome. Outcomes that yield the highest expected utility are chosen as the optimal solutions. With the belief network model, the definition of expected utility incorporates all the probabilistic uncertainty associated with the outcome, as well as the inherent utility of the outcome. Utility can be defined in dimensions such as monetary cost, entropy, or energy. gif

Many researchers have been using belief networks as a convenient mechanism for managing uncertainty in expert systems. Most of these expert systems to date have dealt with the tasks of classification or diagnosis in a temporally static problem domain. To a lesser extent, researchers have also been able to model dynamic properties using belief networks in an effort to simulate and predict the behavior of time-varying systems.

The idea of incorporating belief networks into this model-driven computer vision system was first proposed by Levitt and Binford [8]. The motivation for using belief networks was to make image interpretation insensitive to variations in structure, viewpoint, sensor type, shading, illumination, and obscuration [8]. Furthermore, real world image interpretation requires the ability to handle millions of features in a statistically efficient manner, using an accurate and rigorous mathematical model of uncertainty. According to Binford, relating image features such as step, delta, and slope discontinuities (i.e., boundaries and edges) to image structures and object models is a difficult problem.

Levitt and Binford argue that the belief network fits naturally into the hierarchical image understanding model, where objects are composed of parts and joints. Joints specify the relationships among parts, which in turn are composed of subparts and joints. This recursive relationship can be expressed in a directed acyclic graph (see Section gif for a more detailed description of directed graphs). Bayesian inference is used to accrue evidence (i.e., observations) about the image in a mathematically coherent framework. In this manner, a sufficient set of probabilistic evidence, even if it is incomplete or ambiguous, can be amalgamated to support or deny hypotheses about the objects in the image. Competing hypotheses can be rank ordered by their overall probability, or likelihood of occurrence.

To briefly illustrate how a belief network can be used in model-based vision, suppose we have two three-dimensional objects that we would like to recognize from a photographic image: (a) a solid cube and (b) a solid pyramid. The two DAGs shown in Figure gif represent the belief networks for these two objects. Let tex2html_wrap_inline4074 represent the hypothesis ``object is a cube,'' and tex2html_wrap_inline4076 represent the hypothesis ``object is a pyramid.''

   figure679
Figure: A simple belief network representation of two objects as directed acyclic graphs (DAGs). The cube hypothesis (a) is represented as an object with three visible faces, nine visible edges, and seven visible vertices. The pyramid hypothesis (b) is represented as an object with at most three visible faces, six visible edges, and four visible vertices.

Given a photograph of a cube, we would expect to see at most three faces, nine edges and seven vertices. For a three-sided pyramid, we would expect to see at most three faces, six edges, and four vertices. The DAG for the cube shows that tex2html_wrap_inline4074 gives rise to three faces ( tex2html_wrap_inline4080 ). In turn, each face gives rise to four edges ( tex2html_wrap_inline4082 through tex2html_wrap_inline4084 ) and four vertices ( tex2html_wrap_inline4086 through tex2html_wrap_inline4088 ). By performing statistical experiments, we can determine the probability of seeing the faces under various lighting and viewing conditions. That is to say, we can determine P(F|H) the probability of detecting a face, given the type of object. Likewise, we can collect statistics on the probabilities of seeing the edges and vertices P(E|F) and P(V|F) given that we have seen a face. These statistics are called the observed probabilities because they are based on what we can observe from the image. What we would really like to know is the inferred probability, that is, the probability that the face was created a certain object. By using Baye's rule (also referred to as Kolmogorov's theorem by statisticians), we can compute the inferred probabilities. For example, the probability that a face was created by a cube can be computed as

  equation686

The inferred probability is now expressed in terms of the observed probabilities tex2html_wrap_inline4096 , tex2html_wrap_inline4098 , and tex2html_wrap_inline4100 (where tex2html_wrap_inline4102 means ``not'' tex2html_wrap_inline4074 , that is, the probability of seeing a face given that the object is a pyramid). tex2html_wrap_inline4074 is also called the prior probability of the cube hypothesis. In other words, it is the overall probability that the object is a cube in the absence of any other information. Statistically speaking, tex2html_wrap_inline4096 is simply the ratio of the cube population to the total population of objects.

In a similar manner, we can compute the probability that an edge was created by a face,

  equation694

Strictly speaking, the probability of observing a face given an edge will vary depending on the number of edges that were detected. That is,

equation703

However, to reduce the complexity of the computations and the size of the statistical data gathering task, the assumption of conditional independence can be introduced. This assumption states that the probability of seeing one edge does not increase the probability of seeing another edge. As a result, the subscripts on the edges can be dropped,

equation706

Although it might be tempting to do so, it is not possible to compute the probability of a hypothesis given an edge observation, tex2html_wrap_inline4110 , by simply multiplying P(H|F) by P(F|H). The computation of tex2html_wrap_inline4110 is a non-trivial problem and was first solved by Pearl (interested readers should consult [65] for a rigorous derivation). There are many commercial software packages that will compute inferred probabilities for belief networks using observed statistical data.

In addition to observations about the presence of edges and vertices, Binford and Levitt have proposed using relationships such as parallelism, connectivity, and angular displacements as evidence in the belief network. Furthermore, they have also proposed parameterizing the observed probabilities as a function of the viewing orientation. In this case, the observed probabilities would be represented by probability distributions rather than by scalar constants.gif In general, the computation of such probability distributions requires quasi-invariant transformations to map random variables from the measurement domain to the computation domain.

Several prototype systems have been built using the model-based belief networks. In section gif I will describe a system for segmenting two dimensional radiographic images. In section gif I will describe a complete vision system for interpreting monocular greyscale images based on this technique.


next up previous
Next: Image segmentation using belief Up: Model-based Approaches Previous: Computational vision

Ramani Pichumani
Mon Jul 7 10:34:23 PDT 1997