Pose estimation using the Hough Transform
Philipp Robbel
The Hough Transform is a commonly used method to detect shapes in an
image. Early published work was restricted to the detection of a
number of analytic curves, such as lines or circles, in grey-level
images. Here, each sample point from the shape's boundary in the
original image votes for all lines or circles that it may be part of in
a multi-dimensional discretised array, commonly referred to as the
accumulator array or Hough space. The dimensionality of the Hough space
is determined by the number of parameters of the analytic curve; the
space implied by the line equation $x\cos\theta + y\sin\theta = \rho$,
for example, has a dimensionality of two. When multiple points vote for
the same curve, local maxima gradually build up in Hough space and
may be extracted with a simple thresholding technique.
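To make the voting step concrete, the following is a minimal Python
sketch of line voting in the $(\rho, \theta)$ parametrisation; the
function name, the edge-point input format, and the grid resolutions
are illustrative assumptions rather than part of the original
formulation:

    import numpy as np

    def hough_lines(edge_points, rho_max, rho_res=1.0, theta_bins=180):
        """Vote in a discretised (rho, theta) accumulator for straight lines.

        Each edge point (x, y) votes for every line
        x*cos(theta) + y*sin(theta) = rho passing through it.
        """
        thetas = np.linspace(0.0, np.pi, theta_bins, endpoint=False)
        n_rho = int(np.ceil(2 * rho_max / rho_res))
        acc = np.zeros((n_rho, theta_bins), dtype=np.int32)
        cos_t, sin_t = np.cos(thetas), np.sin(thetas)
        for x, y in edge_points:
            rhos = x * cos_t + y * sin_t                # one rho per theta bin
            rho_idx = ((rhos + rho_max) / rho_res).astype(int)
            ok = (rho_idx >= 0) & (rho_idx < n_rho)     # guard array bounds
            acc[rho_idx[ok], np.flatnonzero(ok)] += 1   # cast the votes
        return acc, thetas

Local maxima of the returned accumulator above a threshold then
correspond to detected lines.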
An extension of the original algorithm, which supports object
boundaries of arbitrary non-analytic shape, is known as the Generalized
Hough Transform [2].
The following describes how the Generalized Hough Transform may be used
for 2D-to-2D pose estimation, that is, to detect the translation and
rotation of a rigid model shape in a test image.
The Generalized Hough Transform does not describe an object as a
parametric curve but instead traverses all the points on the shape's
boundary track. The four-dimensional Hough space is given by the set
$\{(x_c, y_c, \theta, s)\}$, where $(x_c, y_c)$ denotes a reference
origin for the shape, $\theta$ the shape's orientation, and $s$ a
scalar scaling factor. Before voting can take place in this space, we
construct a table from the model shape that is commonly referred to as
the R-table as follows.
Figure 1: Finding the displacement vector $\mathbf{r}$ for an arbitrary
boundary point $(x, y)$ (source: [2]).
First, one has to select an arbitrary fixed reference point
$(x_c, y_c)$ for the model shape, such as the object's centre of mass.
For all points $(x, y)$ on the boundary, one then stores the vector
$\mathbf{r} = (x_c - x,\, y_c - y)$ in the R-table, commonly as a
function of the gradient direction $\phi(x, y)$ at the boundary point.
This process is visualised in Figure 1.
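A minimal sketch of this construction in Python follows; the
representation of the boundary as $(x, y, \phi)$ triples and the number
of gradient bins are assumptions made for illustration:

    import math
    from collections import defaultdict

    def build_r_table(boundary, reference, phi_bins=64):
        """Map each quantised gradient direction to its displacement vectors.

        `boundary` is an iterable of (x, y, phi) triples, where phi is the
        gradient direction at the boundary point; `reference` is the chosen
        origin (x_c, y_c), e.g. the centre of mass.
        """
        x_c, y_c = reference
        r_table = defaultdict(list)
        for x, y, phi in boundary:
            b = int((phi % (2 * math.pi)) / (2 * math.pi) * phi_bins) % phi_bins
            # Store the vector from the boundary point to the reference origin.
            r_table[b].append((x_c - x, y_c - y))
        return r_table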
The model vectors stored in the table above can be used to detect the
object in a test image, given that both orientation and scale are
preserved in that image. Since this is generally not the case and since
we would like to detect the shape's pose in the test image, we
construct a number of additional tables for all plausible discretised
values of $\theta$ and $s$. For both, simple vector operations can be
defined on all vectors stored in the original table. If we denote the
original R-table by $R(\phi)$, these scaling and rotation operations
are given by $T_s[R(\phi)] = s\,R(\phi)$ and
$T_\theta[R(\phi)] = \mathrm{Rot}_\theta\{R((\phi - \theta) \bmod 2\pi)\}$,
where $\mathrm{Rot}_\theta$ rotates every stored vector by $\theta$,
resulting in a set of tables $\{R_{\theta,s}\}$ [2].
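Continuing the sketch above, the additional tables could be derived as
follows; shifting the gradient index by $\theta$ reflects that the
gradient direction of a rotated shape rotates with it (the
discretisation of $\theta$ and $s$ is again an assumption):

    import math
    from collections import defaultdict

    def transform_r_table(r_table, theta, s, phi_bins=64):
        """Derive the R-table for orientation theta and scale s from the
        original table, following the operations from [2]."""
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        shift = int(round(theta / (2 * math.pi) * phi_bins))
        new_table = defaultdict(list)
        for b, vectors in r_table.items():
            for r_x, r_y in vectors:
                # Re-index by the shifted gradient direction, then scale and
                # rotate the stored displacement vector.
                new_table[(b + shift) % phi_bins].append(
                    (s * (r_x * cos_t - r_y * sin_t),
                     s * (r_x * sin_t + r_y * cos_t)))
        return new_table

One such table is precomputed for every discretised $(\theta, s)$ pair.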
Given a test image and the set of R-tables $\{R_{\theta,s}\}$ with $m$
entries each, all points $(x, y)$ on the test shape's boundary track
now vote in the four-dimensional Hough space $m$ times per table as
follows. For each vector $\mathbf{r} = (r_x, r_y)$ in each table
$R_{\theta,s}$, we vote for the point
$(x + r_x,\, y + r_y,\, \theta,\, s)$,
usually by incrementing the respective array entry in Hough space. This
procedure is outlined for an exemplary case with only three
vectors in a two-dimensional Hough space (no scaling or rotation of the shape) in Figure 2.
Figure 2: Reconstruction of the reference origin by adding all
displacement vectors $\mathbf{r}_i$ to all boundary points $(x, y)$
(source: [4]).
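A sketch of the voting loop under the data structures assumed above
follows; note that only the entries stored under the observed gradient
direction at each boundary point are looked up, as in the R-table
indexing of [2]:

    import numpy as np

    def ght_vote(test_boundary, tables, acc_shape):
        """Accumulate votes in a 4D array indexed by (x_c, y_c, theta, s).

        `tables` maps a (theta index, scale index) pair to an R-table as
        built above; `test_boundary` holds (x, y, phi_bin) triples taken
        from the test shape's boundary track.
        """
        acc = np.zeros(acc_shape, dtype=np.int32)
        for x, y, phi_bin in test_boundary:
            for (t_i, s_i), table in tables.items():
                # Each stored vector votes for one candidate reference origin.
                for r_x, r_y in table.get(phi_bin, ()):
                    x_c, y_c = int(round(x + r_x)), int(round(y + r_y))
                    if 0 <= x_c < acc_shape[0] and 0 <= y_c < acc_shape[1]:
                        acc[x_c, y_c, t_i, s_i] += 1
        return acc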
After all votes have been cast, we can identify a maximum in Hough
space at the point $(\tilde{x}_c, \tilde{y}_c)$,
the reconstructed reference origin of the shape. This procedure is
identical for the four-dimensional Hough space when scaling or rotation
is considered.
2D-to-2D pose estimation detects translation and rotation of the
(possibly scaled) model shape in the test image. With the procedure
outlined above, we have found the reconstructed reference origin
$(\tilde{x}_c, \tilde{y}_c)$. Since the reference origin in the model
shape, $(x_c, y_c)$, is also available, we can construct the
translation vector
$\mathbf{t} = (\tilde{x}_c - x_c,\, \tilde{y}_c - y_c)$.
To obtain the rotational information, we have to find the correct
orientation angle $\theta^*$. This information is available from the
set of precomputed R-tables $\{R_{\theta,s}\}$. We choose the angle
$\theta^*$ associated with a particular table $R_{\theta^*,s}$ if and
only if the vectors stored in that table led to the reconstruction of
the origin $(\tilde{x}_c, \tilde{y}_c)$; in other words, $\theta^*$ is
the orientation coordinate of the accumulator maximum.
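Putting the pieces together, translation, rotation, and scale can all
be read off the accumulator maximum; `thetas` and `scales` are assumed
to be the discretisation grids used when the tables were built:

    import numpy as np

    def recover_pose(acc, model_reference, thetas, scales):
        """Read translation, orientation, and scale off the vote maximum."""
        x_t, y_t, t_i, s_i = np.unravel_index(np.argmax(acc), acc.shape)
        x_c, y_c = model_reference
        translation = (x_t - x_c, y_t - y_c)  # shift of the reference origin
        return translation, thetas[t_i], scales[s_i]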
One issue with the technique outlined above is that in general a large
number of candidate scaling factors and rotation angles have to be
considered if no prior knowledge is available to impose bounds on these
values. In [3],
a hardware-accelerated implementation of the Generalized Hough
Transform is described that uses a consumer-class graphics card to run
the exhaustive parameter search on a parallel architecture. An
algorithmic approach to improve the speed of the Generalized Hough
Transform is presented in [4]. Here, a hierarchical strategy is demonstrated that splits the image into a number of tiles that each have associated R-tables.
The idea is that only the first tile has to be associated with an
exhaustive set of tables for all rotations and scaling factors. Once we
have obtained a local maximum in the first tile's accumulator array, we
can pass on this information to the other levels. The inclusion of
invariance in the Generalized Hough Transform is the topic of [1].
The authors describe a method that can extract shapes under similarity
and affine transformations that arise with different 3D orientations of
the target shape relative to the camera position.
[1] A. S. Aguado, E. Montiel, and M. S. Nixon. Invariant
characterisation of the Hough transform for pose estimation of
arbitrary shapes. Pattern Recognition, 35(5):1083-1097, 2002.
[2] D. H. Ballard. Generalizing the Hough transform to detect arbitrary
shapes. Pattern Recognition, 13(2):111-122, 1981.
[3] R. Strzodka, I. Ihrke, and M. Magnor. A graphics hardware
implementation of the Generalized Hough Transform for fast object
recognition, scale, and 3D pose detection. In Proceedings of the IEEE
International Conference on Image Analysis and Processing (ICIAP'03),
pages 188-193, 2003.
[4] M. Ulrich, C. Steger, and A. Baumgartner. Real-time object
recognition using a modified generalized Hough transform. Pattern
Recognition, 36(11):2557-2570, 2003.