I'm interested in computational systems that learn from data how to solve real problems. A major problem is human communication: although technology has broken down communication barriers of all kinds all over the world, the biggest barrier remains the fact that humans speak many different languages. My research centers on algorithms and models for statistical natural language processing systems, and I draw on diverse ideas from (mostly) computer science to build these systems.
Professionally, I am a postdoctoral research fellow working with Philipp Koehn in the machine translation research group. I earned my Ph.D. in computer science at the University of Maryland, where I worked with Philip Resnik. I've had the good fortune to collaborate with many excellent researchers, including Abhishek Arun, Michael Auli, Phil Blunsom, David Chiang, Chris Dyer, Barry Haddow, Hieu Hoang, Rebecca Hwa, Nitin Madnani, Christof Monz, Mike Nossal, and Michael Subotin.
Publications
-
A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation.
Hieu Hoang, Philipp Koehn, and Adam Lopez.
In
Proceedings of IWSLT,
December 2009.
[abstract]
Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar. Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models in a single package. This extension substantially lowers the barrier to entry for machine translation research across multiple models.
-
Monte Carlo inference and maximization for phrase-based translation.
Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn.
In
Proceedings of CoNLL,
June 2009.
[abstract]
Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum expected risk training and decoding.
-
Translation as Weighted Deduction.
In
Proceedings of EACL,
March 2009.
[abstract]
[errata]
[slides]
We present a unified view of many translation algorithms that synthesizes work on deductive parsing, semiring parsing, and efficient approximate search algorithms. This gives rise to clean analyses and compact descriptions that can serve as the basis for modular implementations. We illustrate this with several examples, showing how to mechanically develop search spaces using non-local features, novel models, and a variety of disparate phrase-based strategies. Although the framework is drawn from parsing and applied to translation, it is applicable to many dynamic programming problems arising in natural language processing and other areas.
This draft corrects errors that appeared in the goal item of logic Monotone-Generate (Section 5; in particular, the goal item should have no words left to generate) and in the deductive rules of Monotone-Generate + Ngram (Figure 2.2; the indexes of the n-gram context were incorrect, and the consequent of the second rule should start with i rather than i+1).
Thanks to Shay Cohen for pointing these out.
-
A Systematic Analysis of Translation Model Search Spaces.
Michael Auli, Adam Lopez, Hieu Hoang, and Philipp Koehn.
In
Proceedings of the Fourth Workshop on Statistical Machine Translation,
March 2009.
[abstract]
Translation systems are complex, and most metrics do little to pinpoint causes of error or isolate system differences. We use a simple technique to discover induction errors, which occur when good translations are absent from model search spaces. Our results show that a common pruning heuristic drastically increases induction error, and also strongly suggest that the search spaces of phrase-based and hierarchical phrase-based models are highly overlapping despite the well known structural differences.
-
Tera-Scale Translation Models via Pattern Matching.
In
Proceedings of COLING,
pages 505–512,
August 2008.
[abstract]
[slides]
Translation model size is growing at a pace that outstrips improvements in computing power, and this hinders research on many interesting models. We show how an algorithmic scaling technique can be used to easily handle very large models. Using this technique, we explore several large model variants and show an improvement of 1.4 BLEU on the NIST 2006 Chinese-English task. This opens the door for work on a variety of models that are much less constrained by computational limitations.
-
Statistical Machine Translation.
In
ACM Computing Surveys
40(3),
Article 8,
pages 1–49,
August 2008.
[abstract]
[errata]
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and new ideas are constantly introduced. This survey presents a tutorial overview of the state-of-the-art. We describe the context of the current research and then move to a formal problem description and an overview of the main subproblems: translation modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and a discussion of future directions.
The reference for Banerjee and Lavie (2005) on p. 39 is missing. It should be:
- S. Banerjee and A. Lavie. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL 2005 Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization, 2005.
-
Machine Translation by Pattern Matching.
Dissertation, University of Maryland.
March 2008.
[abstract]
[slides]
[LaTeX source]
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement.
Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call "translation by pattern matching", which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively.
The main open problem we address is much harder. Many translation models are based on the translation of discontiguous substrings. The best pattern matching algorithm for these models is much too slow, taking several minutes per sentence. We develop new algorithms that reduce empirical computation time by two orders of magnitude for these models, making translation by pattern matching widely applicable. We use these algorithms to build a model that is two orders of magnitude larger than the current state of the art and substantially outperforms a strong competitor in Chinese-English translation. We show that a conventional representation of this model would be impractical. Our experiments shed light on some interesting properties of the underlying model. The dissertation also includes the most comprehensive contemporary survey of statistical machine translation.
-
Hierarchical Phrase-Based Translation with Suffix Arrays.
In
Proceedings of EMNLP-CoNLL,
pages 976–985,
June 2007.
[abstract]
[slides]
A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.
-
Word-Based Alignment, Phrase-Based Translation: What's the Link?
With Philip Resnik.
In
Proceedings of AMTA,
pages 90–99,
August 2006.
[abstract]
[slides]
State-of-the-art statistical machine translation is based on alignments between phrases—sequences of words in the source and target sentences. The learning step in these systems often relies on alignments between words. It is often assumed that the quality of this word alignment is critical for translation. However, recent results suggest that the relationship between alignment quality and translation quality is weaker than previously thought. We investigate this question directly, comparing the impact of high-quality alignments with a carefully constructed set of degraded alignments. In order to tease apart various interactions, we report experiments investigating the impact of alignments on different aspects of the system. Our results confirm a weak correlation, but they also illustrate that more data and better feature engineering may be more beneficial than better alignment.
-
Pattern Visualization for Machine Translation Output.
With Philip Resnik.
In
Proceedings of HLT/EMNLP Demonstrations,
pages 12–13,
October 2005.
[abstract]
[slides]
We describe a method for identifying systematic patterns in translation data using part-of-speech tag sequences. We incorporate this analysis into a diagnostic tool intended for developers of machine translation systems, and demonstrate how our application can be used by developers to explore patterns in machine translation output.
-
The Hiero Machine Translation System: Extensions, Evaluation, and Analysis.
David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, and Michael Subotin.
In
Proceedings of HLT/EMNLP,
pages 779–786,
October 2005.
[abstract]
[slides]
Hierarchical organization is a well known property of language, and yet the notion of hierarchical structure has been largely absent from the best performing machine translation systems in recent community-wide evaluations. In this paper, we discuss a new hierarchical phrase-based statistical machine translation system (Chiang, 2005), presenting recent extensions to the original proposal, new evaluation results in a community-wide evaluation, and a novel technique for fine-grained comparative analysis of MT systems.
-
Improved HMM Alignment Models for Languages with Scarce Resources.
With Philip Resnik.
In
Proceedings of the ACL 2005 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond,
pages 83–86,
June 2005.
[abstract]
[slides]
[code]
We introduce improvements to statistical word alignment based on the Hidden Markov Model. One improvement incorporates syntactic knowledge. Results on the workshop data show that alignment performance exceeds that of a state-of-the-art system based on more complex models, resulting in over a 5.5% absolute reduction in error on Romanian-English.
-
Word-Level Alignment for Multilingual Resource Acquisition.
With Michael Nossal, Rebecca Hwa, and Philip Resnik.
In
Proceedings of the LREC Workshop on Linguistic Knowledge Acquisition and Representation—Bootstrapping Annotated Language Data,
pages 34–42,
June 2002.
[abstract]
[slides]
We present a simple, one-pass word alignment algorithm for parallel text. Our algorithm utilizes synchronous parsing and takes advantage of existing syntactic annotations. In our experiments the performance of this model is comparable to more complicated iterative methods. We discuss the challenges and potential benefits of using this model to train syntactic parsers for new languages.
Talks and Tutorials
Over the last few years I have come around to the view that slides are visual aids for talks, so I make no representation that they stand on their own without an accompanying narration (especially the more recent ones). Even so, I occasionally get requests for them, so I've put them here and you're free to do what you like with them. I'd appreciate an acknowledgement if you use them in your work.
- Semiring Parsing without Parsing. Talk at Cambridge and Oxford Universities, November 2009.
- Translation Model Search Spaces. Talk at Saarland University, July 2009. Also given at Dublin City University.
- Machine Translation: Models, Search, and Evaluation. Tutorial at the JHU Summer School on Human Language Technology, June 2009. I've given various versions of this talk over several years. It's aimed at people with no prior knowledge of statistical translation, but hopefully fun for everyone. I'd be happy to present for your class.
- Syntax-based Machine Translation. Tutorial at the Second Machine Translation Marathon, May 2008.
- Translation by Pattern Matching. Talk at the Second Machine Translation Marathon, May 2008. I've given various versions of this talk at Amsterdam, Carnegie Mellon, Edinburgh, Microsoft Research, MITRE, and Pittsburgh.
- Inside the Hiero Decoder. Tutorial given at the University of Maryland, September 2006.