<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
  <title>Adam Lopez</title>
  <link href="http://alopez.github.com/atom.xml" rel="self"/>
  <link href="http://alopez.github.com"/>
  <updated>2009-11-01T13:39:48+00:00</updated>
  <id>http://alopez.github.com/</id>
  <author>
    <name>Adam Lopez</name>
    <email>alopez@inf.ed.ac.uk</email>
  </author>
 
  
  
  <entry>
    <title>Paper: A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation</title>
    <link href="/2009/12/01/iwslt-unified.html"/>
    <updated>2009-12-01T00:00:00+00:00</updated>
    <id>http://alopez.github.com/2009/12/01/iwslt-unified</id>
    <content type="html">Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar.  Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models in a single package.  This extension substantially lowers the barrier to entry for machine translation research across multiple models.</content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Translation Model Search Spaces</title>
    <link href="/2009/07/09/translation-model-search-spaces.html"/>
    <updated>2009-07-09T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2009/07/09/translation-model-search-spaces</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Machine Translation&#58; Models, Search, and Evaluation</title>
    <link href="/2009/06/11/clsp-intro-to-machine-translation.html"/>
    <updated>2009-06-11T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2009/06/11/clsp-intro-to-machine-translation</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Monte Carlo inference and maximization for phrase-based translation</title>
    <link href="/2009/06/01/conll-monte-carlo-inference-and-maximization.html"/>
    <updated>2009-06-01T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2009/06/01/conll-monte-carlo-inference-and-maximization</id>
    <content type="html">Recent advances in statistical machine translation have used beam search for  approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution  defined by a translation model.  We define a novel Gibbs sampler for sampling  translations given a source sentence and show that it effectively explores this  posterior distribution.  In doing so we overcome the limitations of heuristic  beam search and obtain theoretically sound solutions to inference problems such  as finding the maximum probability translation and minimum expected risk training  and decoding.</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Translation as Weighted Deduction</title>
    <link href="/2009/03/30/eacl-2009-translation-as-weighted-deduction.html"/>
    <updated>2009-03-30T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2009/03/30/eacl-2009-translation-as-weighted-deduction</id>
    <content type="html">We present a unified view of many translation algorithms that synthesizes  work on deductive parsing, semiring parsing, and efficient approximate search  algorithms.  This gives rise to clean analyses and compact descriptions that  can serve as the basis for modular implementations.  We illustrate this with  several examples, showing how to mechanically develop search spaces using  non-local features, novel models, and a variety of disparate phrase-based  strategies.  Although the framework is drawn from parsing and applied to  translation, it is applicable to many dynamic programming problems arising  in natural language processing and other areas.
</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: A Systematic Analysis of Translation Model Search Spaces</title>
    <link href="/2009/03/28/wmt-2009-translation-model-search-spaces.html"/>
    <updated>2009-03-28T00:00:00+00:00</updated>
    <id>http://alopez.github.com/2009/03/28/wmt-2009-translation-model-search-spaces</id>
    <content type="html">Translation systems are complex, and most metrics do little to pinpoint causes of error or isolate system differences.  We use a simple technique to discover induction errors, which occur when good translations are absent from model search spaces.  Our results show that a common pruning heuristic drastically increases induction error, and also strongly suggest that the search spaces of phrase-based and hierarchical phrase-based models are highly overlapping despite the well-known structural differences.</content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Introduction to Statistical Machine Translation</title>
    <link href="/2009/01/26/talk-introduction-to-statistical-mt.html"/>
    <updated>2009-01-26T00:00:00+00:00</updated>
    <id>http://alopez.github.com/2009/01/26/talk-introduction-to-statistical-mt</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Tera-Scale Translation Models via Pattern Matching</title>
    <link href="/2008/08/18/coling-tera-scale-translation-models-via-pattern-matching.html"/>
    <updated>2008-08-18T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2008/08/18/coling-tera-scale-translation-models-via-pattern-matching</id>
    <content type="html">Translation model size is growing at a pace that outstrips improvements in computing power, and this hinders research on many interesting models.  We show how an algorithmic scaling technique can be used to easily handle very large models.  Using this technique, we explore several large model variants and show an improvement of 1.4 BLEU on the NIST 2006 Chinese-English task.  This opens the door for work on a variety of models that are much less constrained by computational limitations.</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Statistical Machine Translation</title>
    <link href="/2008/08/01/acm-computing-surveys-statistical-machine-translation.html"/>
    <updated>2008-08-01T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2008/08/01/acm-computing-surveys-statistical-machine-translation</id>
    <content type="html">Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem.  By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and new ideas are constantly introduced. This survey presents a tutorial overview of the state of the art.  We describe the context of the current research and then move to a formal problem description and an overview of the main subproblems: translation modeling, parameter estimation, and decoding.  Along the way, we present a taxonomy of some different approaches within these areas.  We conclude with an overview of evaluation and a discussion of future directions.</content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Translation by Pattern Matching</title>
    <link href="/2008/05/12/mtm-translation-by-pattern-matching.html"/>
    <updated>2008-05-12T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2008/05/12/mtm-translation-by-pattern-matching</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Syntax-based Machine Translation</title>
    <link href="/2008/05/12/mtm-syntax-based-machine-translation.html"/>
    <updated>2008-05-12T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2008/05/12/mtm-syntax-based-machine-translation</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Machine Translation by Pattern Matching</title>
    <link href="/2008/03/25/dissertation-machine-translation-by-pattern-matching.html"/>
    <updated>2008-03-25T00:00:00+00:00</updated>
    <id>http://alopez.github.com/2008/03/25/dissertation-machine-translation-by-pattern-matching</id>
    <content type="html">&lt;p&gt;The best systems for machine translation of natural language are based on statistical models learned from data.  Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory.  Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits.  With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement.&lt;/p&gt;
&lt;p&gt;Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call "translation by pattern matching", which we bring to fruition in this dissertation.  The training data itself serves as a proxy to the model; rules and parameters are computed on demand.  It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text.  They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems.  Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model?  We show how to answer this question affirmatively.&lt;/p&gt;
&lt;p&gt;The main open problem we address is much harder.  Many translation models are based on the translation of discontiguous substrings.  The best pattern matching algorithm for these models is much too slow, taking several minutes per sentence.  We develop new algorithms that reduce empirical computation time by two orders of magnitude for these models, making translation by pattern matching widely applicable.  We use these algorithms to build a model that is two orders of magnitude larger than the current state of the art and substantially outperforms a strong competitor in Chinese-English translation.  We show that a conventional representation of this model would be impractical.  Our experiments shed light on some interesting properties of the underlying model.  The dissertation also includes the most comprehensive contemporary survey of statistical machine translation.&lt;/p&gt;
</content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Hierarchical Phrase-based Translation with Suffix Arrays</title>
    <link href="/2007/07/03/edinburgh-hierarchical-phrase-based-translation-with-suffix-arrays.html"/>
    <updated>2007-07-03T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2007/07/03/edinburgh-hierarchical-phrase-based-translation-with-suffix-arrays</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Hierarchical Phrase-Based Translation with Suffix Arrays</title>
    <link href="/2007/06/30/emnlp-conll-hierarchical-phrase-based-translation-with-suffix-arrays.html"/>
    <updated>2007-06-30T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2007/06/30/emnlp-conll-hierarchical-phrase-based-translation-with-suffix-arrays</id>
    <content type="html">A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. &lt;i&gt;Hierarchical&lt;/i&gt; phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.
</content>
 </entry>
 
 
  
  <entry>
    <title>Talk: How Does Machine Translation Work?</title>
    <link href="/2007/04/05/lecture-how-does-machine-translation-work.html"/>
    <updated>2007-04-05T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2007/04/05/lecture-how-does-machine-translation-work</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Translation by the Numbers</title>
    <link href="/2006/11/07/talk-translation-by-the-numbers.html"/>
    <updated>2006-11-07T00:00:00+00:00</updated>
    <id>http://alopez.github.com/2006/11/07/talk-translation-by-the-numbers</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Inside the Hiero Decoder</title>
    <link href="/2006/09/18/tutorial-inside-the-hiero-decoder.html"/>
    <updated>2006-09-18T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2006/09/18/tutorial-inside-the-hiero-decoder</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Word-Based Alignment, Phrase-Based Translation&#58; What's the Link?</title>
    <link href="/2006/08/08/amta-word-based-alignment-phrase-based-translation-whats-the-link.html"/>
    <updated>2006-08-08T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2006/08/08/amta-word-based-alignment-phrase-based-translation-whats-the-link</id>
    <content type="html">State-of-the-art statistical machine translation is based on alignments between &lt;i&gt;phrases&lt;/i&gt;&amp;mdash;sequences of words in the source and target sentences.  The learning step in these systems often relies on alignments between &lt;i&gt;words&lt;/i&gt;.  It is often assumed that the quality of this word alignment is critical for translation. However, recent results suggest that the relationship between alignment quality and translation quality is weaker than previously thought.  We investigate this question directly, comparing the impact of high-quality alignments with a carefully constructed set of degraded alignments.  In order to tease apart various interactions, we report experiments investigating the impact of alignments on different aspects of the system.  Our results confirm a weak correlation, but they also illustrate that more data and better feature engineering may be more beneficial than better alignment.</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: The Hiero Machine Translation System&#58; Extensions, Evaluation, and Analysis</title>
    <link href="/2005/10/06/hlt-emnlp-the-hiero-machine-translation-system-extensions-evaluation-and-analysis.html"/>
    <updated>2005-10-06T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2005/10/06/hlt-emnlp-the-hiero-machine-translation-system-extensions-evaluation-and-analysis</id>
    <content type="html">Hierarchical organization is a well-known property of language, and yet the notion of hierarchical structure has been largely absent from the best-performing machine translation systems in recent community-wide evaluations.  In this paper, we discuss a new hierarchical phrase-based statistical machine translation system (Chiang, 2005), presenting recent extensions to the original proposal, new evaluation results in a community-wide evaluation, and a novel technique for fine-grained comparative analysis of MT systems.</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Pattern Visualization for Machine Translation Output</title>
    <link href="/2005/10/06/hlt-emnlp-pattern-visualization-for-machine-translation-output.html"/>
    <updated>2005-10-06T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2005/10/06/hlt-emnlp-pattern-visualization-for-machine-translation-output</id>
    <content type="html">We describe a method for identifying systematic patterns in translation  data using part-of-speech tag sequences. We incorporate this analysis  into a diagnostic tool intended for developers of machine translation  systems, and demonstrate how our application can be used by developers to  explore patterns in machine translation output. </content>
 </entry>
 
 
  
  <entry>
    <title>Talk: Statistical Machine Translation</title>
    <link href="/2005/09/10/talk-statistical-machine-translation.html"/>
    <updated>2005-09-10T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2005/09/10/talk-statistical-machine-translation</id>
    <content type="html"></content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Improved HMM Alignment Models for Languages with Scarce Resources</title>
    <link href="/2005/06/29/improved-hmm-alignment-models-for-languages-with-scarce-resources.html"/>
    <updated>2005-06-29T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2005/06/29/improved-hmm-alignment-models-for-languages-with-scarce-resources</id>
    <content type="html">We introduce improvements to statistical word alignment based on the Hidden Markov Model. One improvement incorporates syntactic knowledge. Results on the workshop data show that alignment performance exceeds that of a state-of-the-art system based on more complex models, resulting in over a 5.5% absolute reduction in error on Romanian-English.</content>
 </entry>
 
 
  
  <entry>
    <title>Paper: Word-Level Alignment for Multilingual Resource Acquisition</title>
    <link href="/2002/06/01/word-level-alignment-for-multilingual-resource-acquisition.html"/>
    <updated>2002-06-01T00:00:00+01:00</updated>
    <id>http://alopez.github.com/2002/06/01/word-level-alignment-for-multilingual-resource-acquisition</id>
    <content type="html">We present a simple, one-pass word alignment algorithm for parallel text. Our algorithm utilizes synchronous parsing and takes advantage of existing syntactic annotations. In our experiments the performance of this model is comparable to more complicated iterative methods.  We discuss the challenges and potential benefits of using this model to train syntactic parsers for new languages.</content>
 </entry>
 
 
 
</feed>
