Rather than assuming that an image should be compressed into a set of edges, the phase congruency model of feature detection assumes that the compressed image format should be high in information (or entropy) and low in redundancy. Thus, instead of searching for points where there are sharp changes in intensity, this model searches for patterns of order in the phase component of the Fourier transform. Phase is chosen because the experiments of Oppenheim and Lim [9] demonstrated that it is crucial to the perception of visual features. Further physiological evidence [6] indicates that the human visual system responds strongly to points in an image where the phase information is highly ordered. Thus the phase congruency model defines features as points in an image with high phase order.
The phase congruency model is a frequency-based model of visual processing. It supposes that, instead of processing visual data spatially, the visual system is capable of performing calculations using the phase and amplitude of the individual frequency components in a signal. Thus, the underlying computational tool is the Fourier transform, or one of its equivalents. To this end, let us suppose that we can represent our image signal in the Fourier domain. To simplify the presentation, we will assume a simple one-dimensional signal, representing a 1D slice through an image. Such a signal, say f(x), is reconstructed from its Fourier transform by
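One common way to write this reconstruction uses an amplitude $a_\omega \ge 0$ and a phase offset $\phi_\omega$ for the component at frequency $\omega$. This notation is an assumption, chosen to match the usual presentation of the model:

```latex
f(x) = \int_{0}^{\infty} a_\omega \cos(\omega x + \phi_\omega)\, d\omega
```

The quantity $\omega x + \phi_\omega$ is then the local phase of the component of frequency $\omega$ at position $x$.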
For example, if the image signal were a simple step edge, then
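A periodic step edge can be idealised as a square wave. Its Fourier series, stated here up to a constant scale factor and as an assumption about the intended example, is:

```latex
f(x) = \sum_{n=0}^{\infty} \frac{1}{2n+1}\,\sin\big((2n+1)\,x\big)
```

At the upward step $x = 0$ every term has argument 0. At the downward step $x = \pi$ every argument is an odd multiple of $\pi$, hence equal to $\pi$ modulo $2\pi$. In both cases all components are exactly in phase. Midway between the steps, at $x = \pi/2$, the arguments alternate between $\pi/2$ and $3\pi/2$ (modulo $2\pi$), so the phases disagree.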
We need to make precise what it is we mean by phase congruency. This is done by defining the phase congruency function, PC(x), at each point x in the signal. We have
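A definition consistent with this description, in the form commonly used in the phase congruency literature, is given below. The symbols are assumptions: $a_\omega$ is the amplitude and $\omega x + \phi_\omega$ the local phase of the component at frequency $\omega$, and $\theta$ is a trial phase angle shared by all components.

```latex
PC(x) = \max_{\theta \in [0, 2\pi)}
        \frac{\int a_\omega \cos(\omega x + \phi_\omega - \theta)\, d\omega}
             {\int a_\omega\, d\omega}
```

With this form, $PC(x)$ lies between 0 and 1, and it equals 1 only when every component with nonzero amplitude has the same phase angle at $x$. The maximising $\theta$ is the amplitude-weighted mean phase.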
Although the definition of PC captures precisely what it is we want to measure, it is an awkward function to implement. Luckily, some simple trigonometric manipulations suffice to prove that PC is proportional to a well-known computation in biological vision, namely the local energy in a signal.
The local energy of a signal is defined in terms of the signal and its Hilbert transform. The Hilbert transform has a somewhat complicated definition in the spatial domain, namely
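Under one common sign convention (conventions differ between texts, so the signs here are an assumption), the spatial-domain definition is a principal-value integral:

```latex
H(x) = \frac{1}{\pi}\; \mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{f(\xi)}{x - \xi}\, d\xi
```

In the frequency domain the operation is much simpler: every component is shifted in phase by 90°, so that a component $a_\omega \cos(\omega x + \phi_\omega)$ of $f$ becomes $a_\omega \sin(\omega x + \phi_\omega)$ in $H$.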
We can define a vector E at a point x by
The magnitude of the vector E is called the local energy of the signal (sometimes also called the envelope of the signal) and it is defined as
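As a minimal sketch of this definition, the code below takes the vector to be E(x) = (F(x), H(x)), with F the zero-mean signal and H its Hilbert transform, and computes the local energy as sqrt(F² + H²). That reading of the vector, and the helper names `dft`, `hilbert_transform` and `local_energy`, are assumptions made for illustration. A naive O(N²) transform is used so that the example needs only the standard library.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short signals."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for m in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * m * k / n)
                    for k, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def hilbert_transform(signal):
    """Shift every frequency component by 90 degrees (cos becomes sin)."""
    n = len(signal)
    shifted = []
    for m, coeff in enumerate(dft(signal)):
        if m == 0 or 2 * m == n:
            shifted.append(0j)           # d.c. and Nyquist terms are dropped
        elif 2 * m < n:
            shifted.append(-1j * coeff)  # positive frequencies
        else:
            shifted.append(1j * coeff)   # negative frequencies
    return [c.real for c in dft(shifted, inverse=True)]

def local_energy(signal):
    """Magnitude of the vector (F, H) at each sample, with F the zero-mean signal."""
    mean = sum(signal) / len(signal)
    f = [s - mean for s in signal]
    h = hilbert_transform(f)
    return [math.hypot(a, b) for a, b in zip(f, h)]

# A periodic step edge: +1 on the first half of the period, -1 on the second.
N = 128
square = [1.0] * (N // 2) + [-1.0] * (N // 2)
energy = local_energy(square)
peak = max(range(N), key=lambda k: energy[k])
print(peak, energy[peak], energy[N // 4])
```

For this test signal the largest energy falls on a sample adjacent to one of the two steps, while in the middle of a plateau the Hilbert term vanishes and the energy is just the signal magnitude.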
The argument
As was mentioned earlier, the human visual system has the capacity to simulate convolution by odd and even symmetric filters in quadrature. That the filters are in quadrature means not only that they form an odd and even symmetric pair (that is, the output of convolution by one filter is a 90° phase shift of the output of the other), but also that they both have a zero mean value and the same sum-of-squares value. More specifically, if Me represents the even filter and Mo represents the odd filter, then
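Written for discrete filter taps indexed by $i$ (the indexing is an assumption; the conditions themselves restate the zero-mean and equal sum-of-squares requirements), this reads:

```latex
\sum_i M_e(i) = \sum_i M_o(i) = 0,
\qquad
\sum_i M_e(i)^2 = \sum_i M_o(i)^2
```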
The filters Me and Mo need to be carefully designed. The even symmetric filter is chosen so that it covers as much of the frequency spectrum as possible, whilst eliminating the d.c. term. The odd symmetric filter is then just a 90° phase-shifted version of the even filter. Thus
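No particular filter pair is specified here, so the sketch below substitutes a Gabor-style pair (a Gaussian window multiplied by a cosine and by a sine) purely for illustration. Such a pair is only approximately in quadrature, and the parameters `sigma` and `omega0` and the function names are hypothetical choices. The code builds the two filters, removes the residual d.c. term from the even one, checks the zero-mean and sum-of-squares conditions, and then combines the two filter responses to a step edge into a local energy.

```python
import math

def gabor_pair(sigma=4.0, omega0=0.75):
    """Return (even, odd) filter taps on a window of +/- 4 sigma samples."""
    half = int(round(4 * sigma))
    taps = range(-half, half + 1)
    window = [math.exp(-t * t / (2 * sigma * sigma)) for t in taps]
    even = [w * math.cos(omega0 * t) for w, t in zip(window, taps)]
    odd = [w * math.sin(omega0 * t) for w, t in zip(window, taps)]
    # Remove the small residual mean so the even filter has no d.c. response.
    mean_even = sum(even) / len(even)
    even = [e - mean_even for e in even]
    return even, odd

def filter_response(signal, kernel):
    """Convolve signal with kernel, repeating the end samples at the borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i - (k - half), 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out

even, odd = gabor_pair()
sum_even, sum_odd = sum(even), sum(odd)
energy_even = sum(e * e for e in even)
energy_odd = sum(o * o for o in odd)
print(sum_even, sum_odd, energy_even / energy_odd)

# Local energy of a step edge located between samples 99 and 100.
step = [0.0] * 100 + [1.0] * 100
f_resp = filter_response(step, even)
h_resp = filter_response(step, odd)
energy = [math.hypot(a, b) for a, b in zip(f_resp, h_resp)]
peak = max(range(len(step)), key=lambda i: energy[i])
print(peak)
```

Both filter sums come out at rounding-error level and the sum-of-squares ratio is close to 1, which is what makes the pair usable as an approximation. An exact quadrature pair would instead derive the odd filter from the even one by a true 90° phase shift.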
It is now simple to see why the local energy is proportional to the phase congruency function.
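A sketch of the argument follows. The notation is assumed rather than taken from the text: the signal is a sum of components with amplitude $a_\omega$ and local phase $\omega x + \phi_\omega$, $F$ is the cosine sum, $H$ is the sine sum (its 90° phase shift), and $PC(x)$ is the maximum over a common angle $\theta$ of the amplitude-weighted cosine of the phase deviations, divided by the total amplitude.

```latex
\begin{aligned}
\int a_\omega \cos(\omega x + \phi_\omega - \theta)\, d\omega
  &= \cos\theta \int a_\omega \cos(\omega x + \phi_\omega)\, d\omega
   + \sin\theta \int a_\omega \sin(\omega x + \phi_\omega)\, d\omega \\
  &= F(x)\cos\theta + H(x)\sin\theta \\
  &\le \sqrt{F(x)^2 + H(x)^2} \;=\; E(x)
\end{aligned}
```

Equality holds when $\theta$ is the argument of the vector $(F(x), H(x))$, so $E(x) = PC(x) \int a_\omega\, d\omega$. Since the total amplitude does not depend on $x$, $E$ and $PC$ have their local maxima at the same points.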
So, in order to search for local maxima in the phase congruency function, one equivalently searches for local maxima in the local energy function. These local maxima will occur at step edges of either parity (up or down), lines and bar edges, and other types of features such as the illusory Mach bands [8].
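The equivalence can be checked numerically on a truncated square-wave series, a simple model of a periodic step edge. The sketch below assumes components with amplitude 1/(2n+1) and local phase (2n+1)x − π/2, so that each term is a sine. It evaluates PC both by brute-force maximisation over a common angle and as local energy divided by total amplitude, then scans one period for the strongest responses. The function names, the number of harmonics and the 0.99 threshold are illustrative choices.

```python
import math

N_TERMS = 40
AMPS = [1.0 / (2 * n + 1) for n in range(N_TERMS)]

def local_phases(x):
    """Phase of each harmonic at x, written so that the term is a * cos(phase)."""
    return [(2 * n + 1) * x - math.pi / 2 for n in range(N_TERMS)]

def pc_brute_force(x, steps=3600):
    """PC from its definition: maximise the weighted cosine sum over a common angle."""
    phases = local_phases(x)
    best = max(
        sum(a * math.cos(p - 2 * math.pi * s / steps) for a, p in zip(AMPS, phases))
        for s in range(steps)
    )
    return best / sum(AMPS)

def pc_from_energy(x):
    """PC as local energy sqrt(F^2 + H^2) divided by the total amplitude."""
    phases = local_phases(x)
    f = sum(a * math.cos(p) for a, p in zip(AMPS, phases))
    h = sum(a * math.sin(p) for a, p in zip(AMPS, phases))
    return math.hypot(f, h) / sum(AMPS)

# The two routes to PC agree up to the resolution of the angle search.
print(abs(pc_brute_force(1.0) - pc_from_energy(1.0)))

# Scan one period in 1-degree steps and keep local maxima that are close to 1.
SAMPLES = 360
xs = [2 * math.pi * k / SAMPLES for k in range(SAMPLES)]
pc = [pc_from_energy(x) for x in xs]
peaks = [k for k in range(SAMPLES)
         if pc[k] > 0.99 and pc[k] >= pc[k - 1] and pc[k] >= pc[(k + 1) % SAMPLES]]
print(peaks)
```

The scan picks out the samples at x = 0 and x = π, the upward and downward steps of the square wave, which illustrates that the measure responds to step edges of either parity.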
To illustrate how this works, Figure 5 shows a simple test image that contains a variety of features at different contrasts. Figure 6 shows the output of a simple gradient-based edge detector (here, the Sobel operator). Note that the output depends on the relative contrast of the edge, and that the output for line features is two edges, one on either side of the line. Figure 7 shows the output of the local energy (or phase congruency) detector. Here the response is uniform, regardless of the type or contrast of the feature involved.
Local energy makes no assumptions about the shape of the features it is trying to detect, as it is only looking for points of local maximum phase congruency. Note that all other spatial edge detection methods have to make assumptions about the shape of the feature to be detected. Perona and Malik [10] have noted that most features in real images are composed of combinations of step, roof and ramp profiles, and that no linear feature detection scheme can detect such combinations. However, they prove that a quadratic scheme, such as the local energy scheme, is sufficient for the detection of all such features.
Figures 8 and 9 illustrate the local energy map and the detected features for the mandrill image.