This paper appears in the Proceedings of SimCat97, the Interdisciplinary Workshop on Similarity and Categorisation, M. Ramscar and U. Hahn (eds.) Dept. of Artificial Intelligence, Univ. of Edinburgh. ISBN 0 907330 27 4.

Similarity and Depiction

John Lee
Human Communication Research Centre
and EdCAAD, Dept. of Architecture
University of Edinburgh
2 Buccleuch Place
Edinburgh EH8 9LW

J.Lee@ed.ac.uk

Contents

  1. Introduction
  2. Similarity and Depiction
  3. Natural generativity
  4. Structure mapping and depiction
  5. Similarity again
  6. Conclusion
  7. Acknowledgements
  8. References

Abstract

This paper examines the question of how pictures are to be individuated from other kinds of representations. This is discussed in relation to the classic treatment by Goodman, and a further development by Schier. Neither of these is found satisfactory. It is argued that the issue should be seen as interest-relative, and that in formal terms a version of a theory based on structure mapping is the most promising candidate.

Introduction

One area where similarity has always been an important issue is in pictorial representation. Pictures, diagrams and the like have an established place in communication and inference, and are growing in importance as multimedia and other generally graphical techniques are ever more widely adopted. Pictures and diagrams are interesting, however, in that the way they work is relatively poorly understood. A number of theories of depiction and graphical representation have been developed, none of which has yet emerged as clearly satisfactory. But some of the more recent offerings show a clear convergence with theories in other areas such as analogical inference. This suggests that some discussion of pictorial and diagrammatic representations would be of interest to the present workshop, fostering an interdisciplinary exchange and casting a potentially revealing sidelight on some of the other theoretical issues likely to be discussed. The paper also offers a case study of a situation where similarity is key in an unusual way to an issue of categorisation, viz. whether it is a proper semantic basis for distinguishing pictures as a specific type of representation, which remains a controversial issue among various theories in this area.

Similarity and Depiction

Until quite recently, the view went largely unchallenged that the basis of depiction was similarity. It seemed obvious: a picture depicts what it depicts because it looks like it; it resembles it; it is similar to it in point of visual appearance. Indeed the notion of resemblance lay at the heart of the traditional view of representation in general. For most philosophers, mental activity consisted in the entertaining of "ideas", and ideas represented aspects of the world in virtue of their resemblance or similarity to them. These ideas were often quite explicitly discussed in terms of images and pictures, and assumed to derive their semantics -- their intentionality, or ability to refer to things beyond themselves -- in much the same way. When, originally about 80 years ago, Wittgenstein proposed in his Tractatus (1963) a "picture theory of meaning" for language, the difficulties it contained were raised as technical issues in logic, the question of what it was for a picture to have meaning in the first place being not generally seen as problematic.

More recently, however, the basis of these ideas has been brought into question. Wittgenstein himself, in his later work, argued that a representation cannot be seen as standing in some fixed and abstract relation to what it represents; the relation depends on the activities of agents who create and use various things representationally. The agents (typically people, of course) are organised such that their use of representations follows certain rules or conventions, and it is only within such a framework that the notion of representation can be seen to make sense. This idea has a quite natural appeal where language is being discussed; language is easily seen as rule-based and conventional. However, it is more controversial as applied to graphical, especially pictorial representation. To some degree, it is easy to see that certain aspects of particular types of depiction, e.g. the rules of perspective, may have a conventional element; but still there is a very strong disposition to suppose that the fundamental reason why the Mona Lisa is a picture of a woman is simply that it looks like one.

Probably the most influential arguments against resemblance or similarity as the basis of depiction are those presented by Nelson Goodman, in Languages of Art (1969). Goodman observed that no degree of similarity is either necessary or sufficient for depiction, or for representation in general[1]. There are important differences between the properties that these relations exhibit -- e.g. similarity is reflexive and symmetric, whereas representation is neither. Cars coming off an assembly line are all similar to each other, but do not represent each other. If a picture of me is a depiction because of similarity, then surely I must be a depiction of the picture? Moreover, pictures all resemble other pictures more than they resemble anything else. So evidently there must be some additional factor, e.g. conventions about when things are used as representations.

Much of Goodman's discussion emphasises the curious position occupied by the issues of pictorial representation, somewhere between the theory of art and formal semantics. Differences can be pointed out between various properties of these kinds of representations, as compared with linguistic descriptions for example, and the problem is to interpret these differences in terms of an underlying theory of meaning that can apply across representations of a wide range of different kinds. Developing such a theory would have implications in diverse fields such as cognitive science and multimedia computing.

Goodman's line of thought suggests this kind of unifying strategy, in that he treats pictures as a symbol system, among perhaps many other kinds of symbol system, including natural languages. Pictures have a structure, in virtue of which they are related to something that they depict, and this relation Goodman claims to be largely a matter of convention. However, there are still important differences in the kinds of relationship one can consider between the structure and the represented domain. In particular, Goodman distinguishes notational and non-notational symbol systems. This is not an easy distinction to characterise. Rather than do so in strictly formal terms, it's useful to point out some of the correlative differences in features or uses of the representations, since this will help us to see on what kinds of grounds representations might be separately categorised. One such issue that Goodman discusses is forgery: a painting, he says, can be forged, whereas a literary text cannot. A forgery of the Mona Lisa should look as much like the original as possible; but even a letter-for-letter exact copy of Finnegans Wake is no forgery, being simply another copy of the book. Paintings, prints, sculptures and the like are, according to Goodman, autographic, whereas texts, musical works and such are allographic. In the first kind of case, the history or aetiology of the object has to be taken into account in identifying it, whereas in the second what matters is that the reproduction conforms to the original.

As part of the groundwork for developing a theory of depiction that will be considered in the next section below, Flint Schier (1986) argues that Goodman has overlooked two important points here. To begin with, it's logically possible that a text identical to a given novel, say, could come into existence without having any causal connection with the novel (it might be typed by monkeys, or whatever); and it would then be at best questionable to say that it was a copy of that novel, showing that novels are to some degree identified by aetiology. This helps us to see that forgery involves a false claim: a copy that is intentionally very like a given Vermeer only becomes a forgery when it is claimed to be authored by Vermeer, and on the other hand it would indeed be incoherent to treat as false a claim that a meticulous copy of Finnegans Wake was a work by Joyce.

Even as elaborated by Schier, this distinction seems to me to be on shaky ground. We speak naturally of a given volume being a copy of a novel; somewhere, perhaps, there survives the "autograph edition", much as the autograph original of the Mona Lisa hangs in the Louvre, and if one forged the novel one would have to be forging the autograph just as when forging a painting. The technology of image reproduction being what it now is, it seems fairly plausible to suggest that a copy of the Mona Lisa is actually in a very parallel situation with a copy of Finnegans Wake. I can claim falsely to be the author of the original of either; a true claim to authorship will involve aetiology in either case. In fact, of course, forgery relatively rarely involves the exact copying of an existing work (especially one that's well known) -- and it's as easy to claim falsely that a text is an undiscovered novel by Joyce as that a painting is an undiscovered work by Da Vinci.

Schier argues further that notational works can be plagiarised, whereas autographic works cannot, because plagiarism involves copying something and falsely claiming that the result is your own work. If you copy a painting, he says, the result just is your own work. This however seems also dubious. The result is an autograph edition of a painting by you, but it's not clear why this exonerates it from the charge of plagiarism any more than copying out a novel in a pleasingly calligraphic hand would do. Like forgery, plagiarism is not usually exact copying: the stealing of an idea and its uncreative and unacknowledged re-use is something that can happen in any medium.

The point of this discussion, which the reader may by now be tempted to regard as a digression, is twofold. It emphasises on the one hand how far the basis of distinctions has now removed itself from the original question of similarity, and on the other hand it shows the difficulty, once similarity is dropped, of deriving a convincing ground for distinction. For, surely, some ground for distinction is still needed. Despite all the above, we intuitively do want to say that the relation between a picture and its object is not just the same as that between a descriptive paragraph and its object. If the distinction cannot be safely drawn along the autographic/allographic line, where can it be drawn?

Natural generativity

One of Goodman's central objectives is to abolish the distinction between iconic and non-iconic representation, whereas Schier (op. cit.) is determined to rehabilitate it. Goodman holds that there is no principle on which to base the distinction: both iconic and non-iconic representations are essentially conventional, if possibly non-notational. Schier therefore introduces a principle, one that brings similarity back into the centre of the discussion. This principle he calls natural generativity. According to Schier, people have a recognitional capacity that relates to physical objects and extends to certain kinds of representations. An individual object can naturally be recognised if it disappears and then reappears; also objects can be recognised as being of certain kinds. Similarly, if an object can be recognised, then certain representations can naturally be recognised as being representations of that object. It's admitted that this may have to be taught -- there are empirical studies showing e.g. that photographs may not be recognised as pictures in some cultures -- but then, Schier says, if someone is taught that one or two photographs are pictures of given objects, then he will thereafter recognise a photograph (taken in normal conditions, etc.) of any object that he can recognise in itself. So there is an element of convention, but interpretation of the representation depends crucially on ability to recognise the object: this is natural generativity, and any representation which depends on it is, in Schier's view, an iconic, and indeed a pictorial representation. It seems obvious that this recognitional ability must be based, if among other things, on similarity between representation and represented.

Schier's objective here is to capture the notion of a pictorial representation, and this includes, for instance, ruling out various kinds of analogical representations such as graphs and many cases of diagrams. Goodman (op. cit.) had proposed that this could be done on syntactic grounds. He suggested that iconic symbol systems would have to be syntactically "dense"[2], which means that any difference in the properties of the representation could be relevant to its classification as this or that symbol (unlike the case with different realisations of the character "d", for instance, in differing typefaces or sizes). He also suggested that they would have to be relatively "replete", where a representation is more replete than another if more of its properties are relevant to its classification (and hence interpretation); and the more replete an iconic representation, the more pictorial it would be. Schier counters that one could have e.g. graphs where colour is relevant, and these are not more pictorial, albeit more replete, than those where colour is not. Also, scant line drawings, he says, may be just as pictorial as complicated paintings, though clearly less replete. Indeed, in Goodman's own comparison of a Hokusai drawing of Fujiyama with an electrocardiogram, all the qualities of the drawing, including minute characteristics of the line and even the quality of the paper, are said to be potentially relevant (presumably to the identity of the drawing conceived as a symbol), though it seems clear that they need not all be pictorially relevant. There is also no apparent basis here e.g. for excluding abstract paintings as pictorially representational, even in Goodman's sense, given that they may be highly replete; but Schier would certainly want to exclude them.

So Schier is adopting a fairly strict notion of what counts as depiction, which has much to do with some perceptual similarity between the representation and the object; and this is what natural generativity is supposed to capture. But does even this criterion really select the class of representations that Schier wants? It seems to me that one can imagine cases like the following. A mathematician is brought up using only algebraic methods, through which he becomes familiar with various functions such as y = x^2. Being shown an appropriately curved graph, he sees, perhaps quite spontaneously (or else with some minimal explanation), that it indeed represents this function. Moreover, on seeing further curves he can identify the functions they represent, and also draw graphs for yet other functions, and so on. This seems to be a case of natural generativity, which therefore should indicate that the curves depict the functions -- they represent them pictorially. To do this, the mathematician is at least implicitly grasping the conventions that underlie Cartesian geometry, since even if the curves may have no specific scale their relationship to the axes etc. must be appropriately understood. However, it's not clear that this is radically different from understanding the conventions of perspective, or other less controversially pictorial classes of representations. Schier might argue that generativity based on skills in mathematics is not exactly what he had intended by natural; but it is conceded that some learning may be required (e.g. by a person unfamiliar with photographs), and there's no apparent principle for defining a limit to this. Natural generativity as a principle is subject to a host of empirical questions about its extension by education, and perhaps its restriction by illness or other effects. It therefore appears that, if perhaps Schier should decide to accept the graph as a depiction of a function, we can go on to consider other cases, clearly diagrammatic in his terms, that would simply need a little more learning.

What this suggests is that Schier's criterion for pictoricity will if pushed actually collapse into something that identifies all analogical representations. That's to say, it will identify all representations whose relation to their object is based on some mapping that can be described on the basis of a small set of examples, and then directly extended to the other cases where it applies. Whether this extension is "natural" cannot be clearly enough defined to individuate a distinct class of representations. But now we notice that a type of relation having just the mentioned property is that which has become generally known as a structure mapping.

Structure mapping and depiction

A notion popularised by Gentner (1983) in developing an account of analogical representations used in reasoning, "structure mappings" are in fact also ubiquitous in accounts of metaphor, graphical representation and even representation in general. Wittgenstein's Tractarian account of semantics for language, mentioned earlier, while often referred to as a "picture theory" is actually a quite explicit attempt to describe a relation between the logical structure of language and the "logical structure of the world". The parallel between this and the way that Wittgenstein thought pictures get their meaning is clearly shown in Fig. 1, one of Glen Baxter's immortally surreal cartoons, where Tex's utterance in the caption is in fact a quotation[3] from the Tractatus (2.17). An added twist is, of course, wickedly imparted to this cartoon by Tex's remark being associated with a clearly abstract painting, of a kind that many (at least Schier) would deny to be pictorial at all.

Figure 1

In recent accounts of graphical semantics, such as discussed for example by Wang, Lee and Zeevat (1995), Gurr (1997), or Shimojima (1996), the structural mapping is usually defined quite explicitly, and may be as direct as a "signature morphism" between algebraic descriptions of the structures, or it may fail in various ways to be a true homomorphism. Goodman's notions of density and repleteness are interesting here. Introduced as syntactic notions, and discussed in terms of the identity criteria for symbols, it is perhaps more revealing to look at what this means for semantics. To distinguish two things as distinct symbols is, at least, to indicate that they may bear distinct interpretations. Density thus implies that any alteration to the representation is potentially significant; any of its parts and properties may be interpreted. Repleteness is in some sense a measure of the extent to which they actually are interpreted. This in itself leaves obscure how far it constrains the relationship between structure in the representation and structure in the object, which can be preserved to varying degrees -- parts may map to parts, properties to properties, relations to relations, properties of relations to properties of relations, etc. The details of these issues are often critical when representations are used for reasoning, as recognised by Gentner et al.'s introduction of a "systematicity principle" in analogical mappings, and various properties identified for graphical mappings, such as Wang et al.'s "conservativity".
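The idea that a structural mapping may or may not be a true homomorphism can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: structures are modelled as sets of relational facts, and the names `image`, `preserved`, `is_homomorphism`, `obj_map` and `rel_map` are hypothetical, introduced here and not taken from any of the cited accounts. The `rel_map` dictionary is a toy stand-in for a signature morphism, mapping relation symbols of the representation to relation symbols of the represented domain.

```python
# A toy model: a structure is a set of facts (relation_name, (arg1, arg2, ...)).

def image(fact, obj_map, rel_map):
    """Translate a fact about the representation into a claim about the domain."""
    rel, args = fact
    return (rel_map[rel], tuple(obj_map[a] for a in args))

def preserved(source, target, obj_map, rel_map):
    """Facts of the representation whose images actually hold in the domain."""
    return {f for f in source if image(f, obj_map, rel_map) in target}

def is_homomorphism(source, target, obj_map, rel_map):
    """True when every fact of the representation maps to a fact of the domain."""
    return preserved(source, target, obj_map, rel_map) == set(source)

# Hypothetical example: three marks on a timeline diagram standing for meals.
diagram = {
    ("left_of", ("a", "b")),
    ("left_of", ("b", "c")),
    ("left_of", ("a", "c")),
}
world = {
    ("before", ("breakfast", "lunch")),
    ("before", ("lunch", "dinner")),
    ("before", ("breakfast", "dinner")),
}
obj_map = {"a": "breakfast", "b": "lunch", "c": "dinner"}
rel_map = {"left_of": "before"}

# A diagram with one extra, misleading fact fails to be a homomorphism,
# although three of its four facts are still preserved.
misleading = diagram | {("left_of", ("c", "a"))}

print(is_homomorphism(diagram, world, obj_map, rel_map))
print(is_homomorphism(misleading, world, obj_map, rel_map))
print(len(preserved(misleading, world, obj_map, rel_map)), "of", len(misleading))
```

The size of the preserved set relative to the whole representation gives one crude way of talking about structure being "preserved to varying degrees"; it is not meant to capture systematicity or conservativity as defined by the authors cited above.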

Critical for reasoning, perhaps -- but what about depiction? Do these reflections help us derive a response to Schier's demand for a principle that distinguishes pictures from other kinds of analogical representations? Or is it, as Goodman suggests, essentially a matter of degree?

Similarity again

It appears that in either case, some concept that is basically one of similarity will have a role to play. We naturally feel that if two things share all their properties, then they are identical; the less they share, the more different they become. For at least some range of possibilities they remain similar, even if it's not very clear at what point it might be necessary to say they are similar no more. But this is less straightforward than it seems. Properties are commonly grouped into classes, so that we can call things similar in some respects, but different in others. Any two things will always be similar in some respect, and therefore (as indeed Goodman himself argued), the notion of similarity as such does no work.
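The dependence of similarity on the respects chosen can be sketched in a few lines. This is a deliberately crude illustration, not a proposed measure: things are modelled as dictionaries of properties, and the function `similarity` and the three example objects are hypothetical, invented here for the purpose.

```python
def similarity(a, b, respects):
    """Fraction of the selected respects in which a and b agree."""
    respects = list(respects)
    if not respects:
        raise ValueError("at least one respect must be selected")
    agree = sum(1 for r in respects if r in a and r in b and a[r] == b[r])
    return agree / len(respects)

# Hypothetical property descriptions of two paintings and one person.
portrait = {"flat": True, "material": "paint", "outline": "face"}
landscape = {"flat": True, "material": "paint", "outline": "hills"}
sitter = {"flat": False, "material": "flesh", "outline": "face"}

all_respects = ["flat", "material", "outline"]

# Taking every listed property into account, the portrait is more like
# another painting than like its sitter.
print(similarity(portrait, landscape, all_respects))
print(similarity(portrait, sitter, all_respects))

# Restricting attention to outline alone reverses the ordering.
print(similarity(portrait, sitter, ["outline"]))
print(similarity(portrait, landscape, ["outline"]))
```

The ordering of the comparisons changes with the selection of respects, which is the sense in which an unqualified notion of similarity does no work: the selection has to be supplied from somewhere else.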

However, in the case where a formal semantics is being developed, it becomes explicit that one begins with an abstraction. Whatever is to be represented must be conceived of as a domain of objects, properties and relations, which inevitably implies some form of selection from the indefinitely large range of such characterisations that could be chosen. Given this, it makes sense to speak, as Gurr does (op. cit.), of the structural mapping as being potentially an isomorphism. This only makes sense under selection (or, what amounts to the same thing, partitioning of the properties etc. of the represented domain into equivalence classes), which gives us enough purchase to consider how many or what classes of properties are represented. We can distinguish, e.g., between representations of the topology of some domain, and ones which also include metric properties, perhaps in different ways. A purely topological representation of the Earth, for example, could be a sphere with several circles and toruses, etc. on its surface: this represents the number and connectivity of the regions that can be distinguished, but of course it is of little practical use. A fully geometric representation would be like the familiar globe. But of course, as is well known, if the geometry of the sphere is mapped to a plane surface some properties will have to be neglected, and one can choose which. Fig. 2 shows the effects of neglecting distances, on the one hand, and areas on the other. Both of these choices are useful, for different purposes.
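The point that a planar map must neglect some properties, and that one can choose which, can be checked numerically. The sketch below compares two standard cylindrical projections of a unit sphere: the Lambert cylindrical equal-area projection, which keeps areas, and the Mercator projection, which does not. These two are chosen here purely for illustration and are an assumption; they are not necessarily the projections shown in Fig. 2. The function names are likewise hypothetical.

```python
import math

def sphere_cell_area(lon1, lon2, lat1, lat2):
    """Area of a longitude/latitude cell on the unit sphere (radians)."""
    return (lon2 - lon1) * (math.sin(lat2) - math.sin(lat1))

def lambert_y(lat):
    """Vertical coordinate in the Lambert cylindrical equal-area projection."""
    return math.sin(lat)

def mercator_y(lat):
    """Vertical coordinate in the Mercator projection."""
    return math.log(math.tan(math.pi / 4 + lat / 2))

def planar_cell_area(lon1, lon2, lat1, lat2, y_of_lat):
    """Area of the cell's image; both projections use x = longitude."""
    return (lon2 - lon1) * (y_of_lat(lat2) - y_of_lat(lat1))

def area_ratio(lat_deg, y_of_lat, width_deg=10.0):
    """Mapped area divided by true area for a cell starting at lat_deg."""
    lat1 = math.radians(lat_deg)
    lat2 = math.radians(lat_deg + width_deg)
    lon1, lon2 = 0.0, math.radians(width_deg)
    mapped = planar_cell_area(lon1, lon2, lat1, lat2, y_of_lat)
    return mapped / sphere_cell_area(lon1, lon2, lat1, lat2)

for lat in (0, 30, 60):
    print(lat, area_ratio(lat, lambert_y), area_ratio(lat, mercator_y))
```

Under the equal-area projection the ratio stays at one for every cell, while under Mercator it grows with latitude. What the equal-area map gives up in exchange (shapes and distances) is not measured by this sketch.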

Figure 2

Now, there is a natural inclination to say that the globe is more like the Earth than is either of the planar maps; and that these are more like it than the topological map. This is because we start with an abstracted notion of the Earth which treats certain of its geometric properties as relevant (though an indefinite number of its properties, e.g. its weight, and perhaps metric properties of relief features such as mountains, are ignored). We might be inclined to say that the globe models the Earth, and that the two planar maps depict its surface, reflecting the usual restriction of depiction to 2D representations. If the maps depict the Earth, this is because they represent (under structure mapping) enough of the information about the Earth to satisfy our present interests. The topological globe, we might say, models the topological structure of the Earth, or a planar projection of such a globe might depict that structure: we are not happy to say simply that it depicts the Earth, because in general we are interested in more properties of the planet than these. But in other circumstances we will be quite happy to say that a featureless disc depicts the Earth, e.g. if we are concentrating on its relations with other planets.

Gentner and various colleagues have devoted considerable effort to elaborating the nature of structure mapping as a cognitive process, and in particular they have exposed the importance of looking at the depth of the structure being mapped. Markman and Gentner (1993), for example, show that where people are asked about similarities between pictures their responses will often derive from a relational structure much deeper than surface similarity. Scenes in which different objects are similarly related may be rated more similar than scenes that involve the same objects but do not otherwise share structure. This is highly dependent on the instructions given to the subjects, in that if they are simply asked which picture "goes with" another, they will typically respond on the basis of which objects are involved. Object mappings (or similar "surface similarities") seem nonetheless to be important in depiction: to borrow one of Markman and Gentner's examples, we will not say that a drawing of a man delivering groceries to a woman can function as a depiction of a woman feeding a squirrel, even though it can be mapped to the latter or used as an analogy for it. It is perhaps stretching the point to suggest that we might view both as ways of depicting the abstract relation of providing food -- even if Markman and Gentner's subjects appear naturally to generate this interpretation, it seems just to be a description that allows us to see the two as similar. But what are we calling similar? It is hard here to disentangle talking about similarity between pairs of pictures, pairs of depicted situations, and pictures and situations.

These particular experiments do not suffice to establish that judgements are conditioned by interest. They seem perhaps consistent with that view, though Markman and Gentner, working inevitably with rather simple stimuli, tend to write as though relational structure is somehow determinate. With Indurkhya (1992), we wish here to emphasise the essential creativity involved in the establishment of the structure that is going to be mapped, which is individual and conditioned by many factors specific to the perceiver in a given situation, interests being simply among the less subtle of these.

Figure 3

Such considerations seem to apply also to cases we have mentioned above. A scant line drawing (such as the Haro cartoon in Fig. 3) can be a depiction as much as a full-colour photograph can, if we are able to find a structural description of the drawing and the object that allows a mapping to the object that satisfies our present interests; otherwise we'll be likely to say it depicts the outline of the object, etc. Here, indeed, it appears that depiction, similarity and recognisability all come together as interest-relative notions: an image will only be recognised as an X, or held similar to an X, by one who is conceiving of an X in the right sort of way and at somewhere near the appropriate level of abstraction (or who is able to adopt that point of view for the purpose).

Conclusion

This has been barely a sketch of an argument, but its conclusion is that the question of whether something is a pictorial representation should be seen as an interest-relative issue, not susceptible of definitive answer. In this respect, it indeed behaves like the question of whether the representation is similar to the object, though the latter notion has no power to explain the former. Recognisability, even in the guise of natural generativity, also seems only contingently to fall into line with these other notions.

Figure 4

Consider, as a final example, Fig. 4. This is an anamorphic "picture", which appears with its proper geometry only under reflection in a cylinder. Is the image on the flat surface a picture? In this case, it is probably recognisable, but suppose it were further distorted? It may well be that our intuitions about whether the image is a picture coincide at least roughly with our ability to recognise what it depicts, but this seems a poor basis on which to make a sharp distinction of kind between two images, one we can just recognise and one we just cannot, when we can clearly recognise both under reflection in suitably shaped mirrors. Recognisability is at best a fair guide to when it is feasible in practice to use an image as a pictorial representation.

Acknowledgements

The Human Communication Research Centre is an interdisciplinary research centre funded by the UK Economic and Social Research Council. The author is indebted to Mike Ramscar for many discussions on similarity and analogy, and to an anonymous referee for useful suggestions.

References

Gentner, D. (1983) Structure-mapping: a theoretical framework for analogy. Cognitive Science, 7: 155-170.

Goodman, N. (1969) Languages of Art. Oxford University Press.

Gurr, C. (1997) On the Isomorphism, or Lack of it, of Representations. In Theory of Visual Languages, K. Marriott and B. Meyer eds. Springer-Verlag.

Indurkhya, B. (1992) Metaphor and Cognition. Kluwer Academic Publishers.

Markman, A.B. and Gentner, D. (1993) Structural alignment during similarity comparisons. Cognitive Psychology, 25: 431-467.

Schier, F. (1986) Deeper into Pictures. Cambridge University Press.

Shimojima, A. (1996) On the efficacy of representation. PhD dissertation. Indiana University.

Wang, D., Lee, J. and Zeevat, H. (1995) Reasoning with diagrammatic representations. In Diagrammatic Reasoning: Cognitive and Computational Perspectives. J. Glasgow, N. H. Narayanan and B. Chandrasekaran eds. MIT Press.

Wittgenstein, L. (1963) Tractatus Logico-Philosophicus. Trans. D.F. Pears and B.F. McGuinness.