Traditional computer vision systems attempt to recognise static objects and their dynamic behaviour using bottom-up, data-driven processing with minimal use of prior knowledge. However, such systems are bound to fail in complex domains, because the information in the images alone is insufficient for a detailed interpretation or understanding of the objects and events. Knowledge-based vision research relies primarily on scene context to overcome this kind of uncertainty. For example, Strat and Fischler [60,61] combine many simple vision procedures that analyse colour, stereo, and range images with relevant contextual knowledge to achieve reliable recognition. There are many other types of contextual knowledge, such as functional context, where attributes such as shape are used to infer the functional role of an object and to direct the visual processing. Another type of context, particularly relevant to multimodal and multimedia systems, is linguistic context [55,62]. In addition, task context is an important source of control for visual processing [13,20]. The role of context is therefore central to visual interpretation and understanding, and representing context in a way that improves the effectiveness and efficiency of visual reasoning is a key issue in the field.
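One common way to realise this idea, sketched below purely for illustration (it is not the mechanism of any of the cited systems), is to treat scene context as a prior that re-weights ambiguous bottom-up detector evidence via Bayes' rule. The labels, probabilities, and function name here are invented for the example.

```python
# Illustrative sketch: fusing bottom-up likelihoods with a scene-context
# prior. P(label | image, scene) is proportional to
# P(image | label) * P(label | scene).

def contextual_rerank(likelihoods, scene_prior):
    """Re-weight bottom-up detector scores by a scene-context prior
    and renormalise to a posterior distribution over labels."""
    posterior = {label: score * scene_prior.get(label, 0.0)
                 for label, score in likelihoods.items()}
    z = sum(posterior.values())
    return {label: p / z for label, p in posterior.items()} if z else posterior

# Bottom-up evidence alone cannot distinguish the two labels ...
likelihoods = {"boat": 0.5, "car": 0.5}
# ... but a scene-level prior (say, a "water" scene) resolves the ambiguity.
scene_prior = {"boat": 0.9, "car": 0.1}

print(contextual_rerank(likelihoods, scene_prior))
```

The same pattern extends to the other context types mentioned above: a functional, linguistic, or task-derived prior would simply replace `scene_prior` in the fusion step.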