Abhishek Arun
- Mailing address
- Institute of Communicative and Collaborative Systems
School of Informatics
University of Edinburgh
10 Crichton Street
Edinburgh EH8 9AB
- Phone
- (O) +44-131-650-4415
- ab dot arun at gmail dot com
- Research Interests
- Statistical Machine Translation
- Statistical Parsing
- Machine Learning
- Markov chain Monte Carlo
I'm a PhD candidate in the School of Informatics at the University of Edinburgh, working on Statistical Machine Translation under the supervision of Philipp Koehn and Miles Osborne. I am interested in rich translation models and in the tasks of search and parameter estimation in such models. Since the probability distributions underlying these models are very complex, most current systems resort to approximations to keep inference tractable. These approximations, though they tend to work well in practice, are often theoretically unsatisfactory. In my thesis, I propose the use of Markov chain Monte Carlo techniques for theoretically sound approximations within these models and show that better-quality translations can be obtained as a result. The code I have implemented during my research is available as part of our popular in-house open source decoder Moses.
Before joining the StatMT group, I did some work with Frank Keller on crosslinguistic probabilistic parsing. For the purpose of this research, I developed a French language package extension for Dan Bikel's truly excellent multilingual statistical parser. The French Treebank we used is the Corpus Le Monde developed by Anne Abeillé et al. at the Université de Paris VII. A license can be obtained by emailing Dr Abeillé.
News: I successfully defended my PhD thesis in 2010 and have since joined Microsoft's Search Technology Centre in London to work on Bing.
Publications
Machine Translation
- SampleRank training for phrase-based translation. Barry Haddow, Abhishek Arun, and Philipp Koehn. 6th Workshop on Statistical Machine Translation, 2011. [PDF]
- A Unified Approach to Minimum Risk Training and Decoding. Abhishek Arun, Barry Haddow and Philipp Koehn. 5th Workshop on Statistical Machine Translation, 2010. [PDF]
- Monte Carlo techniques for phrase-based translation. Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Special Issue of the Machine Translation journal, 2010.
- Monte Carlo inference and maximization for phrase-based translation. Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Proceedings of CoNLL, June 2009. [PDF]
- Towards better Machine Translation Quality for the German to English Language Pairs. Philipp Koehn, Abhishek Arun and Hieu Hoang, 2008. Proc 3rd Workshop on SMT [PDF]
- A Distortion Model for Arabic to English maximum entropy word alignment. Abhishek Arun and Abraham Ittycheriah, 2008. IBM Technical Report RC24584 [PDF]
- Online Learning Methods For Discriminative Training of Phrase Based Statistical Machine Translation. Abhishek Arun and Philipp Koehn, 2007. Proc MT Summit XI [PDF]
- Edinburgh System Description for the 2006 TC-STAR Spoken Language Translation Evaluation. Abhishek Arun, Amittai Axelrod, Alexandra Birch, Chris Callison-Burch, Hieu Hoang, Philipp Koehn, Miles Osborne, David Talbot. 2006. Proc. of TC-STAR Workshop on Speech-to-Speech Translation. [PDF]
- Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French. Abhishek Arun and Frank Keller, 2005. Proc. ACL [PDF] Slides from my ACL talk with updated results.
- Statistical Parsing of the French Treebank. Abhishek Arun, 2005. Master's thesis, Univ of Edinburgh [PDF]
Talks
- Discriminative Training for machine translation. First MT Marathon, Univ of Edinburgh, 2007 [PDF]
Assorted Experience
- ACL 2009, EMNLP 2009 Reviewer for MT track
- Research Intern, Natural Language Technologies Group, IBM T J Watson Research Center, Yorktown, New York, Summer 2007
- Graduate Teaching Assistant, Introduction to Computational Linguistics, Spring 2007
- Co-organiser, Stats NLP Reading group, Univ of Edinburgh, 2006-2008
- Co-admin of Moses - Open Source phrase based SMT Decoder