Message-passing Systems

Message-passing MIMD systems were developed as a means of overcoming two fundamental problems associated with systems in which a number of parallel processors are connected by means of a common shared store. Firstly, there is the difficulty of providing adequate memory bandwidth to support large numbers of processors, all of which could in principle be contending for the same memory module. Secondly, where processors attempt to coordinate their activities through synchronisation variables held in common memory, performance can suffer both from processors idling in tight busy-wait loops and from the saturation of vulnerable links in the processor-memory network.

Sophisticated solutions to these problems have been tried, for example the fetch-and-add operation in the NYU Ultracomputer [1] and the combining switches in the IBM RP3 design [2]. However, these techniques inevitably introduced additional hardware complexity and expense.

A radically different approach to MIMD processing is needed if these fundamental problems are to be avoided, and perhaps the most natural alternative is simply to design systems in which processors do not share variables. An immediate consequence of enforcing such a rule is that there is no requirement for generally accessible shared memory, and the attendant difficulties are therefore avoided. However, if the facility for sharing variables is removed, some other mechanism for passing values between processes must be provided. The key to this alternative communication mechanism is the message-passing paradigm, used by multiprogramming operating systems on single processors long before its application in multiprocessor systems.

In a message-passing architecture processors communicate by sending and receiving messages. The processors in such systems normally operate asynchronously, and so the transfer of information requires the sending and receiving processes to synchronise. As a rule, for two processes to communicate one must perform a Send_Message operation and the other must perform a Receive_Message operation. If the actual times at which these operations are initiated are t_S and t_R respectively then, if t_S < t_R, the sending process must wait for the receiving process to catch up, and if t_S > t_R the receiving process must wait. |t_S - t_R| is referred to as the wait-time associated with a communication event, and its value clearly depends on the temporal behaviour of the application and the speed with which messages are transferred from process to process.
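The synchronisation described above can be sketched in Python as an unbuffered "rendezvous" channel, in which whichever side arrives first blocks until its partner performs the matching operation; the elapsed blocking time plays the role of the wait-time. This is a hypothetical illustration (the class and method names are assumptions of the sketch), not any particular system's API.

```python
import threading
import time

class RendezvousChannel:
    """Minimal synchronous (unbuffered) channel: whichever side arrives
    first blocks until its partner performs the matching operation, so
    the message transfer itself is the synchronisation point."""
    def __init__(self):
        self._cond = threading.Condition()
        self._slot = None
        self._has_value = False

    def send(self, message):
        with self._cond:
            while self._has_value:          # wait for any previous message
                self._cond.wait()           # to be consumed first
            self._slot = message
            self._has_value = True
            self._cond.notify_all()
            while self._has_value:          # block until the receiver has
                self._cond.wait()           # actually taken the message

    def receive(self):
        with self._cond:
            while not self._has_value:      # block until a sender arrives
                self._cond.wait()
            message = self._slot
            self._has_value = False
            self._cond.notify_all()         # release the waiting sender
            return message

chan = RendezvousChannel()
received = []

def receiver():
    time.sleep(0.1)                         # receiver is late, so the
    received.append(chan.receive())         # sender incurs the wait-time

t = threading.Thread(target=receiver)
t.start()
t_send = time.monotonic()
chan.send("hello")                          # blocks until the receive
wait_time = time.monotonic() - t_send       # roughly |t_S - t_R|
t.join()
```

Here the sender arrives first, so `send` blocks for roughly the 0.1 s by which the receiver is late.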

In a shared memory multiprocessor the link between two cooperating processes is effectively the address of the shared variable(s) through which they communicate. In a message-passing system the link between cooperating processes exists in the form of a naming convention within the Send_Message and Receive_Message operations, and here two alternatives are possible. An obvious naming convention would be for each message-passing operation to name explicitly the partner process (and/or the processor on which it resides) for that operation. For example, assuming that processes P1 and P2 exist, a message could be sent from P1 to P2 by the execution of the following code.

Process.1                          Process.2
    :                                  :
    Send(P2,message)                   Receive(P1,message)
    :                                  :
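As a concrete (and purely illustrative) sketch of this direct-naming convention, the following Python fragment gives each process a mailbox keyed by its name; Send names the destination process and Receive names the expected source. The names `mailboxes`, `send` and `receive` are assumptions of this sketch, not part of any real system.

```python
import threading
import queue

# One mailbox per named process.
mailboxes = {"P1": queue.Queue(), "P2": queue.Queue()}

def send(dest, message, *, self_name):
    # The sender explicitly names the destination process.
    mailboxes[dest].put((self_name, message))

def receive(src, *, self_name):
    # The receiver names the expected source; for simplicity this
    # sketch assumes messages arrive in the order they are wanted.
    sender, message = mailboxes[self_name].get()
    assert sender == src, f"expected a message from {src}"
    return message

results = []

def p2():
    results.append(receive("P1", self_name="P2"))

t = threading.Thread(target=p2)
t.start()
send("P2", "message", self_name="P1")
t.join()
```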

An alternative naming convention can be implemented by directing messages through named channels. In this case, for two processes to communicate, they must both quote the same channel identifier in their respective message-passing operations as follows.

Process.1                          Process.2
    :                                  :
    Send(chan_X,message)               Receive(chan_X,message)
    :                                  :
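Channel naming can be sketched in the same illustrative Python style. The point of the sketch is that the only name the two processes share is the channel identifier; neither needs to know the other's identity. Note that `queue.Queue` gives buffered (asynchronous) delivery rather than the strict rendezvous described earlier, which is an assumption made here for brevity.

```python
import threading
import queue

# Hypothetical channel table: a channel is just a named queue.
channels = {"chan_X": queue.Queue()}

def send(chan, message):
    channels[chan].put(message)

def receive(chan):
    return channels[chan].get()     # blocks until a message arrives

results = []
t = threading.Thread(target=lambda: results.append(receive("chan_X")))
t.start()
send("chan_X", "message")           # both sides quote only "chan_X"
t.join()
```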

When contemplating message-passing systems from a theoretical viewpoint it is usually sufficient to consider processes, channels and communication operations as existing without reference to any specific implementation restrictions. In practice, however, this is an over-simplification. The general structure of a message-passing multiprocessor system is depicted in the figure, from which it can be seen that there are two primary components: the processing elements (PEs) and the message transfer system (MTS).

Consider a system in which there are n processors, and m application processes. There are likely to be many circumstances under which m > n, and so each processor has to provide a large (but necessarily finite) number of virtual processors to which these processes can be mapped directly. By multiprogramming a number of virtual processors on each physical processor, the wait-time experienced by each virtual processor during inter-process communication can be overlapped with other useful processing on that physical processor. Whilst these techniques have been used in single-processor systems for many years, it is important to address the implications of multiprogramming for the message transfer system. For example, with both naming strategies mentioned above it is possible for communication events to occur between processes (effectively virtual processors) which reside on the same or different physical processors, and the MTS protocol must deal with both of these situations.
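The benefit of multiprogramming virtual processors can be illustrated with a small Python sketch, in which threads stand in for virtual processors sharing one physical processor (all names here are hypothetical): while one virtual processor is blocked in a Receive, another overlaps the wait-time with useful computation.

```python
import threading
import queue
import time

chan = queue.Queue()
log = []

def vp_communicator():
    # This virtual processor blocks waiting for a message...
    log.append("vp1: waiting")
    msg = chan.get()
    log.append(f"vp1: got {msg}")

def vp_worker():
    # ...while another virtual processor on the same physical
    # processor overlaps that wait-time with useful computation.
    time.sleep(0.05)                # stands in for real work
    log.append("vp2: computed")

t1 = threading.Thread(target=vp_communicator)
t2 = threading.Thread(target=vp_worker)
t1.start(); t2.start()
time.sleep(0.1)
chan.put("data")                    # the message finally arrives
t1.join(); t2.join()
```

The log shows that vp2 completes its computation before vp1's message arrives, i.e. vp1's wait-time was not wasted by the physical processor.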