IV. Dedicated Hardware for Morphological Image
Processing
Apart from the
general-purpose image processing hardware, which is capable of performing some
morphological image processing tasks, other hardware structures particularly
suited for morphological image processing have been reported. A direct
implementation of formulae (1) and (2) would be to perform the
addition/subtraction through look-ahead carry adders/subtractors and the
maximum/minimum selection by a tree of compare and swap elements (CS) (Pitas and Venetsanopoulos
1990). A CS consists of a comparator and a multiplexer. The
logic for the look-ahead carry adder/subtractor, the comparator and the
multiplexer can be found in any digital design textbook, e.g. (Mano
1991). A block diagram of the direct implementation is
shown in Fig. 15. However, as it is shown bellow due to the existence of either
faster or less consuming techniques, no devices have been reported according to
the direct technique. The problem that has to be solved for the implementation
of the dilation and erosion operations is located in coming up with a technique
for fast tracking of the max and min values.
Figure 15. Direct implementation of a dilation/erosion computation module.
The techniques for the implementation of the specialized hardware structures for morphological processing can be classified into the following categories:
· Digital implementations[1]:
i. The threshold decomposition technique (Andreadis et al. 1996; Lin and Wong 1992; Pu and Shih 1995; Shih and Mitchell 1989).
ii. A bit serial technique based on modifications of the majority gate algorithm (Chen and Yang 1995; Diamantaras and Kung 1997; Gasteratos et al. 1996; Gasteratos et al. 1997a; Gasteratos et al. 1998b; Ko et al. 1995; Luck and Chakrabaty 1995).
iii. Other systolic array implementations (Abbott et al. 1988; Diamantaras et al. 1990; Gasteratos and Andreadis 2000; Kojima and Miyakawa 1996).
iv. Asynchronous implementation (Robin et al. 1996).
· Analogue implementations (Siskos et al. 1998; Urahama and Nagao 1995; Vlassis et al. 1998).
· Optical implementations (Botha and Casasent 1989; Casasent 1994).
Fitchet al. introduced the threshold
decomposition technique as a methodology for the implementation of median and
other rank order filters (Fitch
et al. 1984; Fitch et al. 1985). This methodology has led to the development of the
theory of the stack filters (Wendt
et al. 1986). A stack filter is any filter, which can be defined
using the threshold decomposition and a positive Boolean function (PBF).
According to this approach a multilevel signal which obtain values into the set
{0, 1, … M-1} is decomposed into M-1 binary signals. Each of these signals is
filtered through a PBF. The same PBF is applied into every binary signal. The
output of the stack filter is composed by adding all the binary results. A PBF
is a Boolean function, which is implemented as a logical sum of logical
products. In Fig. 16 the threshold decomposition technique is illustrated. The
one-dimensional signal x is decomposed into three binary sequences x1,
x2 and x3. Each of these sequences is filtered by a binary max
filter with a 3x1 window size, i.e. a three input OR gate. Thus the binary
outputs y1, y2 and y3 are obtained respectively according
to the following formula: yi(t)=xi(t-1)+ xi(t)+xi(t+1);
(i=1..3). These outputs are summed by columns and the final result y is
obtained. This is the sequence x filtered by a max filter using a 3x1
window size. From the above discussed figure it is clear that the
implementation of morphological operations with flat structuring elements is
straightforward. The max or min filter is implemented via threshold
decomposition and finally the offset of the structuring element is added.
However, when the structuring element is not flat, threshold decomposition on both
the image and the structuring element should be utilized. This technique has
been applied in both classical morphology (Shih and Mitchell 1989) and soft morphology (Pu and Shih 1995).
Figure 16. Illustration of the threshold decomposition technique on a one-dimensional signal.
An ASIC
implementing gray-scale dilation, according to this technique is reported
in (Andreadis et al.
1996). Its circuitry is shown in Fig. 17. It is clearly
identified that threshold decomposition on both gray-scale image and
structuring element is used. It is also identified that the resolution of the
image and structuring element pixel is restricted to 4 bits. This is the main
drawback of this technique; that hardware complexity increases according to a
factor O(22b+2b), where b is
the resolution of the pixels. Therefore, its application on high resolution
numbers is impractical.
Figure 17.Block diagram of the circuitry of the threshold
decomposition technique, for gray-scale dilation.
Lin and
Wong (1992) have proposed a hybrid technique in order to overcome
the problem of hardware complexity. According to this technique threshold
decomposition is not applied on both the structuring element and the image, but
the structuring element pixels are first added/subtracted to the corresponding
image pixels and the threshold decomposition is applied to these results
afterwards. In this case the overall system becomes slower, since the few gates
structure on each stage is missed (Shih
and Mitchell 1989). However, the hardware complexity is reduced to less
than the half. A cumulative graphical comparison of the threshold
decomposition, the hybrid and the direct implementations is provided in Fig.
18. It is worth noting that the axis that presents the number of standard
cells, used in each case, is logarithmic. The threshold decomposition increases
exponentially according both the pixel resolution and the structuring element
size. The hybrid technique increases exponentially according to the pixel
resolution, but linearly according to the structuring element size. The direct
technique grows linearly both with the pixel resolution and the structuring
element size. The standard cells needed for to implement a structure with a 3x3
structuring element and 4-bit pixel resolution are 2.5 times more according to
threshold decomposition than according to the hybrid technique. But, for the
same structuring element size and with 8-bit pixel resolution the number of
standard cells is about 52.5 times bigger with threshold decomposition than
with the hybrid technique.
Figure 18.Hardware requirements for gray-scale dilation according to image resolution and to the size of the structuring element, using the threshold decomposition technique.
A bit-serial algorithm for the calculation of max and min operations has been presented in (Chen 1989; Fitch 1987). Luck and Chakrabaty (1995) have implemented this algorithm in VLSI, in order to perform the operations of dilation and erosion. Diamantaras and Kung (1997) have proposed an 1-D systolic array architecture based on this algorithm, for the implementation of dilation and erosion, as well as opening and closing operations. Moreover, in (Diamantaras and Kung 1997) it is proposed a VLSI design for the basic component of this array, utilizing 32 processors in parallel for real time video processing. Chen and Yang (1995) have present a systolic array for the implementation of morphological operations based on that algorithm and Gasteratos et al. (1996) have improved the algorithm and have implemented it in VLSI (Gasteratos et al. 1997a). According to that algorithm the different bits of n numbers are processed by dedicated PEs in different stages of a systolic array. Suppose that there are n numbers whose the max (or the min) is required. In the first stage of the process the most significant bits (MSBs) of these numbers are processed. The numbers whose MSBs are "1" (or "0") are candidates to be the max (or the min). Thus, the MSB of the required max (or the min) number is "1" (or "0"). All the other numbers are rejected and are not taken into consideration in the next stages of the process. The flag signal f classifies the numbers into two categories. More specifically, if f ="1" then the number is a candidate in the next stage of the process, otherwise the number is excluded from the process. If all of the MSBs are "0" (or "1") then all of them are candidates to be the max (or the min) and the MSB of the result is "0" (or "1"). The procedure is repeated in the next stage for the lower order bits of the numbers that remain candidates. An illustration of this procedure for five four-bit numbers is given in Table 2; xi (i=1..5) are the five numbers, bj,iis the jth bit of the binary representation of xi; fj,i is the flag signal for the ith number at the jth stage. The processing element (PE) implementing one stage of the process is shown in Fig. 19.
Table 2. Illustrative example of the algorithm for the max tracing.
i |
xi |
f0, i |
b1, i |
f1, i |
b2, i |
f2, i |
b3, i |
f3, i |
b4, i |
f4, i |
1 |
3 |
1 |
0 |
0 |
0 |
0 |
1 |
0 |
1 |
0 |
2 |
5 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
1 |
0 |
3 |
11 |
1 |
1 |
1 |
0 |
0 |
1 |
0 |
1 |
0 |
4 |
12 |
1 |
1 |
1 |
1 |
1 |
0 |
1 |
0 |
1 |
5 |
8 |
1 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
m = |
12 |
m1 = |
1 |
m2 = |
1 |
m3 = |
0 |
m4 = |
0 |
|
this PE can produce the jth bit of the max or
the min of the n input numbers. A systolic array paradigm implementing
this procedure, with structuring elements having nine pixels of eight-bit
resolution, is shown in Fig. 20. The same bit-serial algorithm but slightly
changed can be used for the implementation of opening and closing in real
time (Ko et al.
1995). Another modification of the same algorithm has been
reported for the realization of median filter (Chen 1989) and for any rank order filter (Gasteratos et al.
1997b). In soft morphological operation not only the max or
min value may need, but other order statistics as well. Therefore, the
architecture presented in (Gasteratos
et al. 1997b) has also been used for the implementation of soft
morphological operations (Gasteratos
et al. 1998b).
Figure 19.The processing element implementing one stage
of the bit-serial process of the max/min tracing.
Figure 20.A systolic array paradigm implementation using the PE of Fig. 19.
Table 3 presents a comparative study of the hardware requirements for the threshold decomposition, the direct and the majority gate techniques. In this case the realization of the standard basic morphological operations is considered. The image data window in both the techniques is a 9-pixel window, with 4-bit pixel resolution. From this table it is obvious that, even for this low-bit resolution case, the majority gate technique is more advantageous in terms of silicon area and the direct technique is comparable (the total amount of standard cells for each case is: threshold decomposition: 59634; direct: 662; majority gate: 505). Moreover, as it was stated above, the threshold decomposition technique is exponentially expanded according to the pixel value resolution, whilst the direct expands according to O(b[b-1]/2) and the majority gate technique expands linearly according to O(b). Therefore, the higher the resolution the more attractive the latter technique. On the other hand, due to its simple gate structure, the threshold decomposition technique is the fastest. For example let us consider the case of Table 3 implemented with an ES2 0.7um standard cell technology. The approximate frequency of execution for the threshold decomposition is 140 MHz and for the majority gate is 90 MHz. However in the majority gate technique if the bits bi,j and mi,j are not complemented in each PE, but in the initial and the final stage, respectively, then the PE, which causes the bottleneck, is simplified. In this case the circuit can attain an approximate frequency of execution up to 130MHz, which is comparable to that of the threshold decomposition. For the direct implementation, with the same technology, the approximate frequency of execution is also 130 MHz.
Table 3.An illustrative example for the hardware requirements of the threshold decomposition and majority gate techniques, for a 9-pixel window and 4-bit pixel resolution.
Threshold
Decomposition
|
2 decoders of1
gray-level image window to 16 binary image windows |
2029 binary
morphological operators |
30 max/min selection |
1 adder of 30 1-bit
inputs (12 half adders + 6 2-bit adders + 3 3-bit adders + 2 4-bit adder + 1
5-bit adder) |
||||
Dilation |
Erosion |
Dilation |
Erosion |
|||||
2 x 9 x (11 AND2 + 5
AND3 + 1 AND4 + 6 OR2 + 4 OR3 + 1 OR4) |
2029 x (9 OR2 + 1 OR9) |
2029 x (9 NOT + 9 AND2
+ 1 AND9) |
2 OR2 + 2 OR3 + 2 OR4 +
2 OR5 + 2 OR6 + 2 OR7 + 2 OR8 + 2 OR9 + 2 OR10 + 2 OR11 + 2 OR12 + 2 OR13 + 2
OR14 + 2 OR15 + 1 OR16 |
2 AND2 + 2 AND3 + 2
AND4 + 2 AND5 + 2 AND6 + 2 AND7 + 2 AND8 + 2 AND9 + 2 AND10 + 2 AND11 + 2 AND12
+ 2 AND13 + 2 AND14 + 2 AND15 + 1 AND16 |
80 AND2 + 80 XOR2 + 22
AND3 + 10 AND4 + 4 AND5 + 1 AND6 + 12 OR2 + 12 OR3 + 6 OR4 + 3 OR5 + 1 OR6 |
|||
Direct |
9 4-bit
adders/subtractors |
8 CS elements |
||||||
9 x (12 XOR2 + 8 AND2 +
3 AND3 + 2 AND4 + 1 AND5 + 1 OR2 + 1 OR3 + 1 OR4 + 1 OR5) |
8 4-bit comparators |
8 4-bit multiplexers |
||||||
8 x (8 NOT + 10 AND2 +
4 NOR2 + 2 OR4 + 2 AND3 + 3 AND4) |
8 x 4 x (4 AND2 + 1 OR4) |
|||||||
Majority Gate |
9 4-bit
adders/subtractors |
5 PEs |
||||||
9 x (12 XOR2 + 8 AND2 +
3 AND3 + 2 AND4 + 1 AND5 + 1 OR2 + 1 OR3 + 1 OR4 + 1 OR5) |
5 x (28 XOR2 + 18 AND2
+ 1 OR9) |
|||||||
|
|
|
|
|
|
|
|
|
Figure 21.Pipeline architecture for the performing of dilation and erosion
An asynchronous VLSI implementation of an array processor for gray-scale morphological image processing has been presented in (Robin et al. 1996). In this array the PE consist of an ALU, which implements the arithmetic and the max/min operations and multiplexers that controls the data from the neighbor PEs. Each PE communicates with its neighbors in asynchronous fashion.
Analogue implementations of morphological operations are not actually reported in the literature. However implementation of max/min (Siskos et al. 1998) and weighted rank order (Urahama and Nagao 1995; Vlassis et al. 1998) filters have been proposed. Max/min filtering of an image can be consider as dilating/eroding the image with a set, i.e. a flat structuring element. Similarly weighted rank order filters can be used for the implementation of soft morphological operations with a set. Analogue implementations are very attractive, since they offer high speed and silicon compactness. Moreover, since they do not demand A/D and D/A conversion, they can be attached directly to analogue devices, such as sensors and cameras, and they can built up smart instruments. It is obvious that a main disadvantage ofthese structures is that they are not able to operate with a gray-scale structuring element. Also they are not capable to perform straight on 2-D images, but they can apply only on rows and columns independently.
Optical implementations of binary morphological transforms have been presented in (Botha and Casasent 1989; Casasent 1994). This implementation is achieved through an optical matched spatial filter correlator architecture for symbolic substitution. The opening and closing transforms are implemented by adding a feedback path from the output to the input of the correlator.
Duncan R (1990) A Survey of Parallel Computer Architectures, IEEE Computer, 23: 5-16.
Flynn M J (1966) Very High Speed Computing Systems, Proceedings of the IEEE, 56 (12): 1901-1909.
Haralick R M and Shapiro L G (1992): Computer and Robot Vision, Addison-Wesley, New York.
Klein J C and Serra J (1977) The Texture Analyzer, Journal of Microscopy, 95: 349-356.
Mano M M (1991): Digital Design, Prentice-Hall, Englewood Cliffs, NJ.
Petkov N (1993): Systolic Parallel Processing, North-Holland, Amsterdam, The Netherlands.
Rosenfeld A (1983) Parallel Image Processing Using Cellular Arrays, IEEE Computer, 16: 14-20.
Serra J (1982): Image Analysis and Mathematical Morphology, Academic Press, London.
Zhuang X and Haralick R M (1986) Morphological Structuring Element Decomposition,
Computer Vision, Graphics and Image Processing, 35 (3): 370-382.