III. General Purpose Image Processing Architectures and Machines

The computer architectures has been classified by Flynn (1966) into four major categories: 

       SISD (Single Instruction stream Single Data stream): This is the serial computer architecture.

       MISD (Multiple Instruction stream Single Data stream): This involves several processors, which execute different instructions on the same datum. This category is hypothetical and it is concerned as impractical.

       SIMD (Single Instruction stream Multiple Data stream): Several processors execute the same instruction on different data.

       MIMD (Multiple Instruction stream Multiple Data stream): Several processors execute different instructions on different data.

Figure 4. Decomposition of a 9 9-pixel structuring element into 9 3 3-pixel structuring elements.

In image processing, due to the vast amount of data, parallelism is utilized. The parallel architectures can be grouped into two major categories, synchronous and MIMD (Duncan 1990). However this grouping is not unique and overlapping between the categories may be found in the literature (see e.g. (Jonker 1992; Petkov 1993)). According to (Duncan 1990) the categorization is as follows: 


       Vector processors


       Systolic architectures


       Distributed memory

       Shared memory

Vector processors utilize multiple processing elements coupled on a multiple bus. These elements are capable of operating both logical and arithmetic operations on either vector or scalar numbers. The Delft Image Processor (Gerritsen and Aardema 1981; Jonker 1992) (DIP), shown in Fig. 5, can be characterized as a vector processor. This processor is capable of generating image and neighborhood points, it can handle floating point numbers and it is able to perform recursive operations.

Figure 5. The Delft Image Processor (DIP).

Figure 6. SIMD architecture.

In SIMD architecture a control unit is used, that provides a single instruction to all the processing elements concurrently, as shown in Fig. 6. The processing elements (PEs) execute the instruction on data, which is stored in local memory. SIMD is the suitable architecture for fast low-level image processing (Gerritsen and Aardema 1981; Jonker 1992; Reeves 1984), where mathematical morphology belongs. Examples of SIMD machines are the CLIP4 (Duff 1982; Preston et al. 1979) the CLIP7A (Fountain et al. 1988) the AIS-5000 parallel processor (Schmitt and Wilson 1988) and the PAPRICA system (Broggi et al. 1998; Broggi et al. 1997). The processing element of CLIP4 and its network topology are shown in Figs. 7a and 7b, respectively. As it is shown in this figure each processing element communicates with its eight nearest neighbors. For cellular logic operations the neighborhood can be accessed in parallel, as it happens in all the architectures based on cellular automata. However it is possible to perform recursive neighborhood operations, as well, via feedback obtained by the carry flip-flop. The CLIP7 chip is a successor of CLIP4. This is supplied with more memory per processor, improved image I/O speeds, increased local control and multi-bit structure. 



Figure 7.The CLIP4 image processor: (a) block diagram of the processing element and (b) network topology.

The AIS-5000 system consists of an SIMD array parallel processor, a general-purpose microprocessor and high-speed I/O hardware for the parallel processor. The parallel array can contain up to 1024 programmable PEs with local memory and connections with two neighbors PEs, as shown in Fig. 8a. The PEs are programmable and can perform several operations, such as Boolean, arithmetic and neighborhood operations. The implementation of such a neighborhood operation, named erode_nbr, is shown in Fig. 8b. The PE is programmed via the truth table, which is given in hexadecimal number. An example of a truth table, the corresponding hexadecimal number and the structuring element are provided in Fig. 8c. 




Figure 8. (a) The AIS-5000 parallel array topology and (b) implementation of binary erosion on the AIS-5000 system (c) a structuring element and the corresponding truth table for binary erosion.

The PAPRICA system has been designed and realized as a massively parallel morphological coprocessor. It based on a hierarchical morphological computational model. PAPRICA was designed to be attached on general-purpose machines. It comprises five major functional parts (Fig. 9a). These are the frame grabber (grabs 512 512 images with 8 bit/pixel resolution at video rate), the program memory, the image memory, the processor array and the control unit. The main part ofPAPRICA system is its processor array in which fits exactly the input image. This array consists of two different type internal registers: the morphological registers (MOR) and the logical register (LOR), as shown in Fig. 9b (PAPRICA-3 version with 256 PEs). The MOR consist of 5-bit shift-registers, which allow to perform morphological operations on an enhanced 3 3, cross-like, neighborhood as the one shown in Fig. 9c. The LOR are used to perform temporary storage, as well as for temporary storage. 




Figure 9. (a) Block diagram of the PAPRICA system, (b) the PE topology of PAPRICA-3 and (c) the enhanced 3 3neighborhood.

Systolic architectures or systolic arrays are the most widely used architectures for low-level image processing. A systolic array is designed to embody a certain algorithm in hardware. The PEs of a systolic architecture are rigid, i.e. they are not programmable. The data is pipelined in rhythmic fashion from the memory to the network of the PEs and it returns back to the memory. The network topology of a systolic array, denoting the data flow path is shown in Fig. 10. The synchronization of data is achieved through a global clock, which controls all the PEs. Implementations of systolic arrays for morphological image processing are analytically described in the next section.

Figure 10.Systolic array network topology and data flow path.

The MIMD architectures are discriminated into distributed-memory architectures (ring, mesh, pyramid, hypercube) and shared-memory architectures (Hwang and Briggs 1987). The distributed memory architectures consist of several processors interconnected in various ways. Each processor communicates with its local memory as well as with the adjacent processors. Fig. 11a depicts the ring topology architecture. This topology is appropriate for a limited number of nodes and it is suitable for algorithms, which do not demand vast data exchange. A more sophisticated topology is the mesh topology (Rosenfeld 1983), shown in Fig. 11b. This can be applied into algorithms involved with matrixes, such as 2-D image processing. However, many low and intermediate level image processing algorithms require comparison of pixels which are distant located; in this topology the communication between two nodes not belonging in the same neighbor is slow. An extension of the mesh topology is the pyramid topology. This consists of overlapping meshes with successively lower resolution, as shown in Fig. 11c. 


(a)                                (b)                                                        (c)

Figure 11.Distributed-memory network topologies: (a) ring, (b) mesh and (c) pyramid.

The pyramid topology is suitable for the implementation of low and intermediate level image processing; especially those that are dominated by data transaction between spatial distance pixels. Another more complicated topology is the hypercube topology (Frieder 1990). According to this architecture the nodes are located on the corners of an n dimension cube. A machine of dimension d, built on this architecture, has 2nodes, each being connected to its d neighbor nodes. Figs. 12a, 12b, 12c and 12d depict the topology of a one-dimensional (1-D), a two-dimensional (2-D), a three-dimensional (3-D) and a four-dimensional (4-D) hypercube respectively. A paradigm of a hypercube structure `(enhanced hypercube) for machine vision applications is Proteus (Haralick et al. 1995). The topology of Proteus architecture is shown in Fig. 13. It is a hierarchical re-configurable interconnection network with its highest level being a circuit switched enhanced hypercube. The Proteus system is capable of performing morphological image processing tasks such as dilation, erosion, closing and opening. It accepts the data via multiple links at a rate up to 640MBps.

Figure 12. Hypercube network topologies: (a) 1-D, (b) 2-D, (c) 3-D and (d) 4-D.

Figure 13. The Proteus enhanced hypercube topology.

In shared memory architectures the PEs are controlled by a common memory. Generally speaking there are two ways of interfacing the PEs with memory. The simple one consists of a fast bus on which the PEs and the memory are tied Fig. 14a. Such machines typically consist of up to 30 PEs. The more sophisticated way comprises a multi-layer interface network, which is responsible to interconnect the PEs with the pieces of memory Fig. 14b. The main advantage of the machines built on with shared memory architecture is that they are easily implemented. These machines though are rather used in high-level, than in low-level image processing. Although such machines are applicable to parallel image processing, where each PE is devoted to a certain particle of the image, they are not as sufficient as SIMD. Due to the synchronization that should be applied on the PEs, the system becomes time-consuming. However, many structures adopt cash memory in each PE, in order to enhance their speed. 



Figure 14.Shared memory architectures: (a) fast bus and (b) multi-layer interface network.


Abbott L, Haralick R M and Zhuang X (1988) Pipeline Architectures for Morphologic Image Analysis, Machine Vision and Applications, 1 (1): 23-40.


Andreadis I, Gasteratos A and Tsalides P (1996) An ASIC for Fast Grey-Scale Dilation, Microprocessors and Microsystems, 20 (2): 89-95.


Bloch I and Maitre H (1995) Fuzzy Mathematical Morphologies: A Comparative Study, Pattern Recognition, 28 (9): 1341-1387.


Botha E C and Casasent D P (1989) Applications of Optical Morphological Transformations, Optical Engineering, 28 (5): 501-505.


Broggi A, Conte G, Gregoretti F, Sansoe C, Passerone R and Reyneri L M (1998) Design and Implementation of the PAPRICA Parallel Architecture, The Journal of VLSI Signal Processing, 19 (1): 5-18.


Broggi A, Conte G, Gregoretti F, Sansoe C and Reyneri L M (1997) The Evolution of the PAPRICA System, Computer-Aided Engineering Journal, 4 (2): 114-136.


Camps O I, Kanungo T and Haralick R M (1996) Grey-Scale Structuring Element Decomposition, IEEE Transactions on Image Processing, 5 (1): 111-120.


Casasent D (1994) General-Purpose Optical Pattern Recognition Image Processors, Proceedings of the IEEE, 82 (11): 1724-1734.


Chen C and Yang D L (1995) Realisation of Morphological Operations, IEE Proceedings: Circuits Devices and Systems, 142 (6): 364-368.


Chen K (1989) Bit-Serial Realizations of a Class of Nonlinear Filters Based on Positive Boolean Functions, IEEE Transactions on Circuits and Systems, 36: 785-794.


Diamantaras I and Kung S Y (1997) A Linear Systolic Array for Real-Time Morphological Image Processing, Journal of VLSI Signal Processing, 17 (1): 43-55.


Diamantaras I, Zimerman K H and Kung S Y (1990) "Integrated Fast Implementation of Mathematical Morphology Operations in Image Processing", IEEE International Symposium on Circuits and Systems, New Orleans, pp 1442-1445.


Dougherty E R and Astola J (1994): Introduction to Non-linear Image Processing, SPIE, Bellingham, Washington.


Duff M J B (1982): CLIP4, in Fu K S and Ichikawa T, eds., Special Computer Architectures for Pattern Processing, CRC Press, Boca Raton, Florida.


Duff M J B (1988) "Some Constrains on the Limitations of Image Processing Computer Architectures", IARP Workshop on CV - Special Hardware and Industrial Application, Tokyo, Japan, pp 1-5.


Duncan R (1990) A Survey of Parallel Computer Architectures, IEEE Computer, 23: 5-16.


Fitch J P (1987) Software and VLSI Algorithm for Generalized Ranked Order Filtering, IEEE Transactions on Circuits and Systems, CAS-34 (5): 553-559.


Fitch J P, Coyle E J and Gallagher Jr. N C (1984) Median Filtering by Threshold Decomposition, IEEE Transactions on Acoustics Speech and Signal Processing, ASSP-32 (6): 1183-1188.


Fitch J P, Coyle E J and Gallagher Jr. N C (1985) Threshold Decomposition of Multidimensional Ranked Order Operations, IEEE Transactions on Circuits and Systems, CS-32 (5): 445 450.


Flynn M J (1966) Very High Speed Computing Systems, Proceedings of the IEEE, 56 (12): 1901-1909.


Fountain T J, Matthews K N and Duff M J B (1988) The CLIP7A Image Processor, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10 (3): 310-319.


Frieder O (1990) Multiprocessor Algorithms for Relational-Database Operator on Hypercube Systems, IEEE Computer, 23: 13-28.


Gader P D (1991) Separable Decompositions and Approximations of Greyscale Morphological Templates, CVGIP: Image Understanding, 53 (3): 288-296.


Gasteratos A and Andreadis I (1999): Soft Mathematical Morphology: Extensions, Algorithms and Implementations, in Hawkes P W, ed., Advances in Imaging and Electron Physics, 110, Academic Press, pp. 63-99.


Gasteratos A and Andreadis I (2000) Non-Linear Image Processing in Hardware, Pattern Recognition, Special Issue on Mathematical Morphology, 33 (6): 1013-1021.


Gasteratos A, Andreadis I and Tsalides P (1996) Improvement of the Majority Gate Algorithm for Grey-Scale Dilation/Erosion, Electronics Letters, 32 (9): 806-807.


Gasteratos A, Andreadis I and Tsalides P (1997a) Extension And Very Large Scale Integration Implementation Of The Majority-Gate Algorithm For Gray-Scale Morphological Operations, Optical Engineering, 36 (3): 857-861.


Gasteratos A, Andreadis I and Tsalides P (1997b) Realization of Rank-Order Filters Based on Majority Gate, Pattern Recognition, 30 (9): 1571-1576.


Gasteratos A, Andreadis I and Tsalides P (1998a) Fuzzy Soft Mathematical Morphology, IEE Proceedings: Vision Image and Signal Processing, 145 (1): 40-49.


Gasteratos A, Andreadis I and Tsalides P (1998b) Realisation of Soft Morphological Filters, IEE Proceedings - Circuits Devices and Systems, 145 (3): 201-206.


Gasteratos A, Andreadis I and Tsalides P (1998c) "Soft Morphological Structuring Element Decomposition", International Symposium on Mathematical Morphology '98, Amsterdam, The Netherlands, pp 407-413.


Gerritsen F and Aardema L G (1981) Design and Use of DIP-1: A Fast Flexible and Dynamicaly Microprogrammable Image Processor, Pattern Recognition, 14 (1): 1-6.


Giardina C R and Dougherty E R (1988): Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey.


Haralick R M and Shapiro L G (1992): Computer and Robot Vision, Addison-Wesley, New York.


Haralick R M, Somani A K, Wittenbrink C, Johnson R, Cooper K, Shapiro L G, Phillips I T, Hwang J-N, Cheung W, Yao Y H, ChenC-H, Yang L, Daugherty B, Loberski B, Loving K, Miller T, Parkins L and Soos S (1995) Proteus: a reconfigurable computational network for computer vision, Machine Vision and Applications, 8 (2): 85-100.


Haralick R M, Sternberg S R and Zhuang X (1987) Image Analysis Using Mathematical Morphology, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (4): 532-550.


Heijmans H J A M and Roerdink J B T M, eds., (1998): Mathematical Morphology and its Applications to Image and Signal Processing, Kluwer Academic Publishers, Dordrecht-Boston-London.


Hwang K and Briggs F A (1987): Computer Architecture and Parallel Processing, McGraw-Hill, Singapore.


Jonker P P (1992): Morphological Image Processing : Architecture and VLSI Design, Kluwer Academic Publishers, Amsterdam, The Netherlands.


Klein J C and Serra J (1977) The Texture Analyzer, Journal of Microscopy, 95: 349-356.


Ko S J, Morales A and Lee K H (1995) A Fast Implementation Algorithm and a Bit Serial Realization Method for Greyscale Morphological Opening and Closing, IEEE Transactions on Signal Processing, 43 (12): 3058-3061.


Kojima S and Miyakawa T (1996) One-Dimensional Processing Architecture for Gray-Scale Morphology, Systems and Computers in Japan, 27 (12): 1-9.


Koskinen L, Astola J and Neuvo Y (1991) "Soft Morphological Filters", Proceedings of SPIE Symposium on Image Algebra and Morphological Image Processing, 262-270.


Kuosmanen P and Astola J (1995) Soft Morphological Filtering, Journal of Mathematical Imaging and Vision, 5 (3): 231-262.


Lin R and Wong E K (1992) Logic Gate Implementation for Gray-Scale Morphology, Pattern Recognition Letters, 13: 481-487.


Luck L and Chakrabaty C (1995) A Digit-Serial Architecture For Gray-Scale Morphological Filtering, IEEE Transactions on Image Processing, 4 (3): 387-391.


Mano M M (1991): Digital Design, Prentice-Hall, Englewood Cliffs, NJ.


Maragos P, Schafer R W and Butt M A, eds., (1996): Mathematical Morphology and its Applications to Image and Signal Processing, Kluwer Academic Publishers, Dordrecht-Boston-London.


Park H and Chin R T (1994) Optimal Decomposition of Convex Morphological Structuring Elements for 4-Connected Parallel Processors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16 (3): 304-313.


Park H and Chin R T (1995) Decomposition Of Arbitrarily Shaped Morphological Structuring Elements, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (1): 2-5.


Petkov N (1993): Systolic Parallel Processing, North-Holland, Amsterdam, The Netherlands.


Pitas I and Venetsanopoulos A N (1990): Nonlinear Digital Filters: Principles and Applications, Kluwer Academic Publishers, Boston, Massachusetts, U.S.A.


Preston K J, Duff M J B, Levialdi S, Norgen P and Toriwaki J I (1979) Basics of Cellular Logic With Some Applications in Medical Image Processing, Proceedings of the IEEE, 67: 826-857.


Pu C C and Shih F Y (1995) Threshold Decomposition of Grey-Scale Soft Morphology into Binary Soft Morphology, CVGIP-Graphical Models and Image Processing, 57 (6): 522-526.


Reeves A P (1984) Parallel Computer Architectures for Image Processing, Computer Vision, Graphics and Image Processing, 25 (1): 68-88.


Robin F, Renaudin M, Privat G and v.d. Bossche N (1996) Functionally Asynchronous Array Processor for Morphological Filtering of Greyscale Images, IEE Proceedings: Computers and Digital Techniques, 143 (5): 273-281.


Rosenfeld A (1983) Parallel Image Processing Using Cellular Arrays, IEEE Computer, 16: 14-20.


Schmitt L A and Wilson S S (1988) The AIS-5000 Parallel Processor, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10 (3): 320-330.


Serra J (1982): Image Analysis and Mathematical Morphology, Academic Press, London.


Serra J (1986) Introduction to Mathematical Morphology, Computer Vision, Graphics, and Image Processing, 35 (3): 283-305.


Serra J, ed., (1989): Image Analysis and Mathematical Morphology: Theoretical Advances, Academic Press, London.


Serra J and Soille P, eds., (1994): Mathematical Morphology and its Applications to Image Processing, Kluwer Academic Publishers, Dordrecht-Boston-London.


Shih F Y and Mitchell O R (1989) Threshold Decomposition of Gray-Scale Morphology into Binary Morphology, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1): 31-42.


Shih F Y and Mitchell O R (1991) Decomposition Of Grey-Scale Morphological Structuring Elements, Pattern Recognition, 24 (3): 195-203.


Shinha D and Dougherty E R (1992) Fuzzy Mathematical Morphology, Journal of Visual Communication and Image Representation, 3 (3): 286-302.


Singh B V and Siddiqi M U (1996) A Simplified Algorithm for Approximate Separable Decomposition of Morphological Templates, Pattern Recognition, 29 (9): 1519-1522.


Siskos S, Vlassis S and Pitas I (1998) Analog Implementation of Fast Min-Max Filtering, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 45 (7): 913-918.


Soille P (1999): Morphological Image Analysis: Principles and Applications, Springer-Verlag, Berlin, Germany.


Urahama K and Nagao T (1995) Direct Analog Rank Filtering, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 42 (7): 385-388.


Vlassis S, Siskos S and Pitas I (1998) "Analog Implementation of an Order Statistics Filter", 9th Mediterranean Electrotechnical Conference, MELECON 98, 649 -653.


Wendt P D, Coyle E J and Gallagher Jr. N C (1986) Stack Filters, IEEE Transactions on Acoustics Speech and Signal Processing, ASSP-34: 898-911.


Xu J (1996) Morphological Decomposition of 2-D Binary Shapes into Simpler Shape Parts, Pattern Recognition Letters, 17 (7):759-769.


Zhuang X and Haralick R M (1986) Morphological Structuring Element Decomposition, Computer Vision, Graphics and Image Processing, 35 (3): 370-382.