Manny Rayner, David Carter, Pierrette Bouillon, Vassilis Digalakis and Mats Wiren, editors. The Spoken Language Translator. Cambridge University Press. 2000. ISBN 0-521-77077-7. Price $70 (hardback). xviii+337 pages.

With the turning of the century, it has become increasingly clear that core language technologies, such as statistical parsing or tagging, have largely reached a plateau in performance. For example, new papers on parsing tend to deal with peripheral concerns, such as porting to new domains, or else highlight new machine learning techniques. When improvements in parsing performance are reported, they tend to be very modest indeed. On the basis of such observations, one could conclude that many language engineering tasks are solved problems. However, as anyone who has actually used these 'converged' technologies will say, there is still room for improvement.

Now, there are a number of possible reasons for this (apparent) convergence (and hence impasse). With only so much expensively annotated training material, there is a ceiling on what existing statistical systems can achieve, and we have by and large reached that limit. Another possible reason for the slow-down in performance improvement is that our understanding of how language works is still very rudimentary, and what little we do know about language is further restricted by the need to fit models to the available data.

It therefore seems timely that we revisit those largely rule-based systems that are currently unfashionable and see whether we can salvage anything from them. In particular, clever use of the knowledge present within those systems, augmented with probabilistic components, seems to be our main hope of pushing progress in language technology forward. In order to make this next step, we need to understand what people used to do in the days before statistics took over.

Enter this book, which can be seen as a very accessible account of a mainly rule-based system for translating spoken language. The book is divided into four parts: the natural language processing aspects of the system, a discussion of the linguistics of the task, the speech processing techniques used and finally evaluation. In total, there are 21 chapters. With so many chapters, I will tackle the book in terms of its four parts instead.

The introduction to the book summarises the system (which largely consists of technology developed for the earlier Core Language Engine coupled with the SRI DECIPHER speech recognition system). A nice aspect of this chapter is the numerous examples used to illustrate the translation process. The authors seem to be aware that, when they were developing the system, events were overtaking them elsewhere: they include an explicit defence of using rule-based grammars. The evaluation of their system presented here, and expanded in the final part, is sober and well-balanced.

The first part of the book first explains how Quasi Logical Forms (QLFs) serve as the transfer formalism in their machine translation process. Numerous examples aid the reader in understanding what is happening. There is then a description of their application of Explanation-based Learning (EBL) to grammar specialisation. EBL is a version of speed-up learning, where the idea is to transform a theory into another version in which some of the intermediate steps are collapsed into macros. Within the context of grammars, it can speed up parsing, as fewer rules need to be applied. Since their grammar-based system, just like any other system, has to deal with ambiguities, a chapter discusses how they select amongst alternative parses. Their approach is somewhat ad hoc, being based around 'discriminants' (predicates for making choices at points of ambiguity).
With hindsight, a log-linear approach, which is now the standard way of modelling parse selection within feature-based grammars, would have been more appropriate. Creating annotated training material or adding new linguistic rules is always a pressing concern, and so there is a discussion of tools for helping create training material for the discriminants and for adding new lexical items to the system. The first part concludes with material on morphology and on creating training material for the task.

Part two, which deals with the linguistic resources used, focuses largely on the coverage of the manually written English, French and Swedish grammars. It also considers relationships between these grammars. Since the grammars do have to cover real-world language, they deal with aspects of language not normally found in linguistic textbooks, such as dates. These chapters only sketch the grammars, and the interested reader wanting more details will need to revisit earlier descriptions of the grammar used in the Core Language Engine. One missing part here is an estimate of how long it took to develop the English grammar. Whilst developing grammars for related languages may be quick given the existence of such a previously constructed grammar (as the authors argue in a chapter on bootstrapping one grammar from another), this ignores the start-up cost.

Part three, which deals with the speech recognition aspects of the project, starts with a review of the basic technologies used (HMMs). It discusses the important problem of making the system operate in near real-time, and then goes on to consider language modelling within the context of MT. One interesting section discusses the possibility of having a manually written grammar generate strings with which to train a statistical language model. There is also a brief mention of using a language model for treating speech containing code-switching (utterances containing a mixture of languages).
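As a rough illustration of the grammar-to-language-model idea mentioned above, the following Python sketch samples sentences from a tiny hand-written context-free grammar and estimates an add-one smoothed bigram model from them. The grammar, the function names and the choice of a bigram model are all assumptions made for illustration; none of them is taken from the book.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy grammar for a travel-enquiry domain (not from the book).
# Keys are non-terminals; anything that is not a key is treated as a word.
GRAMMAR = {
    "S": [["show", "me", "NP"], ["i", "want", "NP"]],
    "NP": [["flights", "PP"], ["the", "cheapest", "flight", "PP"]],
    "PP": [["to", "CITY"], ["from", "CITY", "to", "CITY"]],
    "CITY": [["boston"], ["denver"], ["stockholm"]],
}


def generate(symbol, rng):
    """Expand a symbol into a list of words, choosing rules uniformly at random."""
    if symbol not in GRAMMAR:
        return [symbol]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(child, rng))
    return words


def train_bigram(sentences):
    """Collect bigram counts over sentences padded with <s> and </s>."""
    bigrams = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return bigrams, vocab


def prob(bigrams, vocab, prev, cur):
    """Add-one smoothed estimate of P(cur | prev)."""
    counts = bigrams[prev]
    return (counts[cur] + 1) / (sum(counts.values()) + len(vocab))


rng = random.Random(0)  # fixed seed so that the sampled corpus is reproducible
corpus = [generate("S", rng) for _ in range(500)]
bigrams, vocab = train_bigram(corpus)
```

With such a model, a word sequence the grammar licenses (for example "flights to") should receive a higher probability than one it never produces (for example "flights boston"), which receives only the smoothing mass. Sampling rules uniformly is the simplest possible choice; a real system would presumably want the generated corpus to reflect how often constructions actually occur.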
As with the previous part, there is a discussion of the porting problem (this time, that of porting a speech recogniser to a new language). The problem of adapting to new dialects is considered. This part ends with a general chapter considering various odds-and-ends of their system.

The final part tackles system evaluation. Then as now, this is a vexed problem. For example, it is hard to have an objective definition of what it means for a translation to be poor. Here, humans are used to make judgements regarding the degree of acceptability of the translation. A large number of tables summarise various lesion studies (results after crippling the full system in various ways). A minor quibble is that because the tables are spread over a number of pages, it requires some work to make comparisons. Because the authors carry out full-system evaluation, they are able to observe whether component-level errors multiply, or else whether errors in one module can be compensated for by some other module. This therefore tests the hypothesis that the overall error rate of the system can be computed on the basis of the error rates of the individual components. Their findings suggest that, at times, errors made by one component can be cancelled by another component. This means that the error rate of the overall system would be overestimated if it were computed by simply compounding the individual error rates.

The book ends with reflective comments on the success (or otherwise) of their system. Refreshingly, they do not claim stellar performance, nor that automatic speech recognition has been solved by their work. Instead, they suggest that their book should be seen as being a well-documented account of a large speech and language project. This is indeed the case.

In summary, the book is very well written and structured. There are many lessons here for subsequent generations of speech and language researchers.
Whilst the technology described may no longer be state-of-the-art (for example, statistical machine translation has probably replaced transfer-based translation), many of the problems discussed, such as porting to a new domain or creating new annotated training material, will always be present. The book itself is not quite self-contained, and would best be read in conjunction with the earlier Core Language Engine text. That aside, it would be a good primer for anyone wishing to develop a serious speech or language processing system.

Miles Osborne
University of Edinburgh
Institute for Communicating and Collaborative Systems
2 Buccleuch Place
Edinburgh EH8 9LW
UK
miles@inf.ed.ac.uk