Specialized Hardware Structures for Morphological Image Processing

IV. Dedicated Hardware for Morphological Image Processing

Apart from the general-purpose image processing hardware, which is capable of performing some morphological image processing tasks, other hardware structures particularly suited for morphological image processing have been reported. A direct implementation of formulae (1) and (2) would be to perform the addition/subtraction through look-ahead carry adders/subtractors and the maximum/minimum selection by a tree of compare and swap elements (CS) (Pitas and Venetsanopoulos 1990). A CS consists of a comparator and a multiplexer. The logic for the look-ahead carry adder/subtractor, the comparator and the multiplexer can be found in any digital design textbook, e.g. (Mano 1991). A block diagram of the direct implementation is shown in Fig. 15. However, as it is shown bellow due to the existence of either faster or less consuming techniques, no devices have been reported according to the direct technique. The problem that has to be solved for the implementation of the dilation and erosion operations is located in coming up with a technique for fast tracking of the max and min values.

Figure 15. Direct implementation of a dilation/erosion computation module.

The techniques for the implementation of the specialized hardware structures for morphological processing can be classified into the following categories:

· Digital implementations[1]:

i. The threshold decomposition technique (Andreadis et al. 1996; Lin and Wong 1992; Pu and Shih 1995; Shih and Mitchell 1989).

ii. A bit serial technique based on modifications of the majority gate algorithm (Chen and Yang 1995; Diamantaras and Kung 1997; Gasteratos et al. 1996; Gasteratos et al. 1997a; Gasteratos et al. 1998b; Ko et al. 1995; Luck and Chakrabaty 1995).

iii. Other systolic array implementations (Abbott et al. 1988; Diamantaras et al. 1990; Gasteratos and Andreadis 2000; Kojima and Miyakawa 1996).

iv. Asynchronous implementation (Robin et al. 1996).

· Analogue implementations (Siskos et al. 1998; Urahama and Nagao 1995; Vlassis et al. 1998).

· Optical implementations (Botha and Casasent 1989; Casasent 1994).

Fitchet al. introduced the threshold decomposition technique as a methodology for the implementation of median and other rank order filters (Fitch et al. 1984; Fitch et al. 1985). This methodology has led to the development of the theory of the stack filters (Wendt et al. 1986). A stack filter is any filter, which can be defined using the threshold decomposition and a positive Boolean function (PBF). According to this approach a multilevel signal which obtain values into the set {0, 1, … M-1} is decomposed into M-1 binary signals. Each of these signals is filtered through a PBF. The same PBF is applied into every binary signal. The output of the stack filter is composed by adding all the binary results. A PBF is a Boolean function, which is implemented as a logical sum of logical products. In Fig. 16 the threshold decomposition technique is illustrated. The one-dimensional signal x is decomposed into three binary sequences x1, x2 and x3. Each of these sequences is filtered by a binary max filter with a 3x1 window size, i.e. a three input OR gate. Thus the binary outputs y1, y2 and y3 are obtained respectively according to the following formula: yi(t)=xi(t-1)+ xi(t)+xi(t+1); (i=1..3). These outputs are summed by columns and the final result y is obtained. This is the sequence x filtered by a max filter using a 3x1 window size. From the above discussed figure it is clear that the implementation of morphological operations with flat structuring elements is straightforward. The max or min filter is implemented via threshold decomposition and finally the offset of the structuring element is added. However, when the structuring element is not flat, threshold decomposition on both the image and the structuring element should be utilized. This technique has been applied in both classical morphology (Shih and Mitchell 1989) and soft morphology (Pu and Shih 1995).

Figure 16. Illustration of the threshold decomposition technique on a one-dimensional signal.

An ASIC implementing gray-scale dilation, according to this technique is reported in (Andreadis et al. 1996). Its circuitry is shown in Fig. 17. It is clearly identified that threshold decomposition on both gray-scale image and structuring element is used. It is also identified that the resolution of the image and structuring element pixel is restricted to 4 bits. This is the main drawback of this technique; that hardware complexity increases according to a factor O(2^2b+2^b), where b is the resolution of the pixels. Therefore, its application on high resolution numbers is impractical.

Figure 17.Block diagram of the circuitry of the threshold decomposition technique, for gray-scale dilation.

Lin and Wong (1992) have proposed a hybrid technique in order to overcome the problem of hardware complexity. According to this technique threshold decomposition is not applied on both the structuring element and the image, but the structuring element pixels are first added/subtracted to the corresponding image pixels and the threshold decomposition is applied to these results afterwards. In this case the overall system becomes slower, since the few gates structure on each stage is missed (Shih and Mitchell 1989). However, the hardware complexity is reduced to less than the half. A cumulative graphical comparison of the threshold decomposition, the hybrid and the direct implementations is provided in Fig. 18. It is worth noting that the axis that presents the number of standard cells, used in each case, is logarithmic. The threshold decomposition increases exponentially according both the pixel resolution and the structuring element size. The hybrid technique increases exponentially according to the pixel resolution, but linearly according to the structuring element size. The direct technique grows linearly both with the pixel resolution and the structuring element size. The standard cells needed for to implement a structure with a 3x3 structuring element and 4-bit pixel resolution are 2.5 times more according to threshold decomposition than according to the hybrid technique. But, for the same structuring element size and with 8-bit pixel resolution the number of standard cells is about 52.5 times bigger with threshold decomposition than with the hybrid technique.

Figure 18.Hardware requirements for gray-scale dilation according to image resolution and to the size of the structuring element, using the threshold decomposition technique.

A bit-serial algorithm for the calculation of max and min operations has been presented in (Chen 1989; Fitch 1987). Luck and Chakrabaty (1995) have implemented this algorithm in VLSI, in order to perform the operations of dilation and erosion. Diamantaras and Kung (1997) have proposed an 1-D systolic array architecture based on this algorithm, for the implementation of dilation and erosion, as well as opening and closing operations. Moreover, in (Diamantaras and Kung 1997) it is proposed a VLSI design for the basic component of this array, utilizing 32 processors in parallel for real time video processing. Chen and Yang (1995) have present a systolic array for the implementation of morphological operations based on that algorithm and Gasteratos et al. (1996) have improved the algorithm and have implemented it in VLSI (Gasteratos et al. 1997a). According to that algorithm the different bits of n numbers are processed by dedicated PEs in different stages of a systolic array. Suppose that there are n numbers whose the max (or the min) is required. In the first stage of the process the most significant bits (MSBs) of these numbers are processed. The numbers whose MSBs are "1" (or "0") are candidates to be the max (or the min). Thus, the MSB of the required max (or the min) number is "1" (or "0"). All the other numbers are rejected and are not taken into consideration in the next stages of the process. The flag signal f classifies the numbers into two categories. More specifically, if f ="1" then the number is a candidate in the next stage of the process, otherwise the number is excluded from the process. If all of the MSBs are "0" (or "1") then all of them are candidates to be the max (or the min) and the MSB of the result is "0" (or "1"). The procedure is repeated in the next stage for the lower order bits of the numbers that remain candidates. An illustration of this procedure for five four-bit numbers is given in Table 2; x_i (i=1..5) are the five numbers, b_j,iis the jth bit of the binary representation of x_i; f_j,i is the flag signal for the ith number at the jth stage. The processing element (PE) implementing one stage of the process is shown in Fig. 19.

Table 2. Illustrative example of the algorithm for the max tracing.

i	x_i	f_{0, i}	b_{1, i}	f_{1, i}	b_{2, i}	f_{2, i}	b_{3, i}	f_{3, i}	b_{4, i}	f_{4, i}
1	3	1	0	0	0	0	1	0	1	0
2	5	1	0	0	1	0	0	0	1	0
3	11	1	1	1	0	0	1	0	1	0
4	12	1	1	1	1	1	0	1	0	1
5	8	1	1	1	0	0	0	0	0	0
m =	12	m₁=	1	m₂=	1	m₃=	0	m₄=	0

Depending on its input MAX/ this PE can produce the jth bit of the max or the min of the n input numbers. A systolic array paradigm implementing this procedure, with structuring elements having nine pixels of eight-bit resolution, is shown in Fig. 20. The same bit-serial algorithm but slightly changed can be used for the implementation of opening and closing in real time (Ko et al. 1995). Another modification of the same algorithm has been reported for the realization of median filter (Chen 1989) and for any rank order filter (Gasteratos et al. 1997b). In soft morphological operation not only the max or min value may need, but other order statistics as well. Therefore, the architecture presented in (Gasteratos et al. 1997b) has also been used for the implementation of soft morphological operations (Gasteratos et al. 1998b).

Figure 19.The processing element implementing one stage of the bit-serial process of the max/min tracing.

Figure 20.A systolic array paradigm implementation using the PE of Fig. 19.

Table 3 presents a comparative study of the hardware requirements for the threshold decomposition, the direct and the majority gate techniques. In this case the realization of the standard basic morphological operations is considered. The image data window in both the techniques is a 9-pixel window, with 4-bit pixel resolution. From this table it is obvious that, even for this low-bit resolution case, the majority gate technique is more advantageous in terms of silicon area and the direct technique is comparable (the total amount of standard cells for each case is: threshold decomposition: 59634; direct: 662; majority gate: 505). Moreover, as it was stated above, the threshold decomposition technique is exponentially expanded according to the pixel value resolution, whilst the direct expands according to O(b[b-1]/2) and the majority gate technique expands linearly according to O(b). Therefore, the higher the resolution the more attractive the latter technique. On the other hand, due to its simple gate structure, the threshold decomposition technique is the fastest. For example let us consider the case of Table 3 implemented with an ES2 0.7um standard cell technology. The approximate frequency of execution for the threshold decomposition is 140 MHz and for the majority gate is 90 MHz. However in the majority gate technique if the bits b_i,jand m_i,j are not complemented in each PE, but in the initial and the final stage, respectively, then the PE, which causes the bottleneck, is simplified. In this case the circuit can attain an approximate frequency of execution up to 130MHz, which is comparable to that of the threshold decomposition. For the direct implementation, with the same technology, the approximate frequency of execution is also 130 MHz.

Table 3.An illustrative example for the hardware requirements of the threshold decomposition and majority gate techniques, for a 9-pixel window and 4-bit pixel resolution.

Threshold Decomposition	2 decoders of1 gray-level image window to 16 binary image windows	2029 binary morphological operators			30 max/min selection			1 adder of 30 1-bit inputs (12 half adders + 6 2-bit adders + 3 3-bit adders + 2 4-bit adder + 1 5-bit adder)
		Dilation	Erosion		Dilation	Erosion
	2 x 9 x (11 AND2 + 5 AND3 + 1 AND4 + 6 OR2 + 4 OR3 + 1 OR4)	2029 x (9 OR2 + 1 OR9)	2029 x (9 NOT + 9 AND2 + 1 AND9)		2 OR2 + 2 OR3 + 2 OR4 + 2 OR5 + 2 OR6 + 2 OR7 + 2 OR8 + 2 OR9 + 2 OR10 + 2 OR11 + 2 OR12 + 2 OR13 + 2 OR14 + 2 OR15 + 1 OR16	2 AND2 + 2 AND3 + 2 AND4 + 2 AND5 + 2 AND6 + 2 AND7 + 2 AND8 + 2 AND9 + 2 AND10 + 2 AND11 + 2 AND12 + 2 AND13 + 2 AND14 + 2 AND15 + 1 AND16		80 AND2 + 80 XOR2 + 22 AND3 + 10 AND4 + 4 AND5 + 1 AND6 + 12 OR2 + 12 OR3 + 6 OR4 + 3 OR5 + 1 OR6
Direct	9 4-bit adders/subtractors			8 CS elements
	9 x (12 XOR2 + 8 AND2 + 3 AND3 + 2 AND4 + 1 AND5 + 1 OR2 + 1 OR3 + 1 OR4 + 1 OR5)			8 4-bit comparators			8 4-bit multiplexers
				8 x (8 NOT + 10 AND2 + 4 NOR2 + 2 OR4 + 2 AND3 + 3 AND4)			8 x 4 x (4 AND2 + 1 OR4)
Majority Gate	9 4-bit adders/subtractors							5 PEs
	9 x (12 XOR2 + 8 AND2 + 3 AND3 + 2 AND4 + 1 AND5 + 1 OR2 + 1 OR3 + 1 OR4 + 1 OR5)							5 x (28 XOR2 + 18 AND2 + 1 OR9)

Pipeline systolic array paradigms for both binary and gray-scale morphology have been presented in (Abbott et al. 1988). The pipeline architecture is suitable for structuring elements obeying the chain rule, since in each stage of the pipeline machine the operation with a small structuring element can be performed. However, in the case of gray-scale image processing the authors does not propose a special network for the max/min implementation. A block diagram, for the computation of dilation and erosion operations, is shown in Fig. 21. An ASIC capable of computing rank order filters, weighted rank order filters, standard erosion and dilation, soft erosion and dilation, order statistic soft erosion and dilation, fuzzy erosion and dilation and fuzzy soft erosion and dilation, is presented in (Gasteratos and Andreadis 2000). Its architecture is based on local histogram and a successive approximation technique. This hardware structure is rather slow, however it is the only one reported in the literature performing all these operations on a single chip. Other systolic arrays for 2-D binary and 1-D gray-scale morphological image processing have been presented in (Diamantaras et al. 1990) and (Kojima and Miyakawa 1996) respectively.

Figure 21.Pipeline architecture for the performing of dilation and erosion

An asynchronous VLSI implementation of an array processor for gray-scale morphological image processing has been presented in (Robin et al. 1996). In this array the PE consist of an ALU, which implements the arithmetic and the max/min operations and multiplexers that controls the data from the neighbor PEs. Each PE communicates with its neighbors in asynchronous fashion.

Analogue implementations of morphological operations are not actually reported in the literature. However implementation of max/min (Siskos et al. 1998) and weighted rank order (Urahama and Nagao 1995; Vlassis et al. 1998) filters have been proposed. Max/min filtering of an image can be consider as dilating/eroding the image with a set, i.e. a flat structuring element. Similarly weighted rank order filters can be used for the implementation of soft morphological operations with a set. Analogue implementations are very attractive, since they offer high speed and silicon compactness. Moreover, since they do not demand A/D and D/A conversion, they can be attached directly to analogue devices, such as sensors and cameras, and they can built up smart instruments. It is obvious that a main disadvantage ofthese structures is that they are not able to operate with a gray-scale structuring element. Also they are not capable to perform straight on 2-D images, but they can apply only on rows and columns independently.

Optical implementations of binary morphological transforms have been presented in (Botha and Casasent 1989; Casasent 1994). This implementation is achieved through an optical matched spatial filter correlator architecture for symbolic substitution. The opening and closing transforms are implemented by adding a feedback path from the output to the input of the correlator.

References

Abbott L, Haralick R M and Zhuang X (1988) Pipeline Architectures for Morphologic Image Analysis, Machine Vision and Applications, 1 (1): 23-40.

Andreadis I, Gasteratos A and Tsalides P (1996) An ASIC for Fast Grey-Scale Dilation, Microprocessors and Microsystems, 20 (2): 89-95.

Bloch I and Maitre H (1995) Fuzzy Mathematical Morphologies: A Comparative Study, Pattern Recognition, 28 (9): 1341-1387.

Botha E C and Casasent D P (1989) Applications of Optical Morphological Transformations, Optical Engineering, 28 (5): 501-505.

Broggi A, Conte G, Gregoretti F, Sansoe C, Passerone R and Reyneri L M (1998) Design and Implementation of the PAPRICA Parallel Architecture, The Journal of VLSI Signal Processing, 19 (1): 5-18.

Broggi A, Conte G, Gregoretti F, Sansoe C and Reyneri L M (1997) The Evolution of the PAPRICA System, Computer-Aided Engineering Journal, 4 (2): 114-136.

Camps O I, Kanungo T and Haralick R M (1996) Grey-Scale Structuring Element Decomposition, IEEE Transactions on Image Processing, 5 (1): 111-120.

Casasent D (1994) General-Purpose Optical Pattern Recognition Image Processors, Proceedings of the IEEE, 82 (11): 1724-1734.

Chen C and Yang D L (1995) Realisation of Morphological Operations, IEE Proceedings: Circuits Devices and Systems, 142 (6): 364-368.

Chen K (1989) Bit-Serial Realizations of a Class of Nonlinear Filters Based on Positive Boolean Functions, IEEE Transactions on Circuits and Systems, 36: 785-794.

Diamantaras I and Kung S Y (1997) A Linear Systolic Array for Real-Time Morphological Image Processing, Journal of VLSI Signal Processing, 17 (1): 43-55.