Abhishek Arun
- Mailing address
- Institute of Communicative and Collaborative Systems
School of Informatics
University of Edinburgh
10 Crichton Street
Edinburgh EH8 9AB
- Phone
- (O) +44-131-650-4415
- (first initial.last name) @sms.ed.ac.uk
- Research Interests
- Statistical Machine Translation
- Statistical Parsing
- Machine Learning
- Markov chain Monte Carlo
I'm a PhD candidate in the School of Informatics at the University of Edinburgh, working on Statistical Machine Translation under the supervision of Philipp Koehn and Miles Osborne. I am interested in rich translation models and in the tasks of search and parameter estimation in such models. The probability distributions underlying these models are very complex; consequently, most current systems resort to approximations to keep inference tractable. These approximations, though they tend to work well in practice, are often theoretically unsatisfactory. In my thesis, I propose using Markov chain Monte Carlo techniques to perform theoretically sound approximate inference within these complex translation models, leading to better-quality translations. The code I have implemented during my research is available as part of our popular in-house open source decoder, Moses.
Before joining the StatMT group, I did some work with Frank Keller on crosslinguistic probabilistic parsing. For the purpose of this research, I developed a French language package extension for Dan Bikel's truly excellent multilingual statistical parser. The French Treebank we used is the Corpus Le Monde, developed by Anne Abeillé et al. at the Université de Paris VII. A license can be obtained by emailing Dr Abeillé.
Publications
Machine Translation
- Monte Carlo inference and maximization for phrase-based translation. Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Proceedings of CoNLL, June 2009. [PDF]
- Towards better Machine Translation Quality for the German to English Language Pairs. Philipp Koehn, Abhishek Arun and Hieu Hoang, 2008. Proc 3rd Workshop on SMT [PDF]
- A Distortion Model for Arabic to English maximum entropy word alignment. Abhishek Arun and Abraham Ittycheriah, 2008. IBM Technical Report RC24584 [PDF]
- Online Learning Methods For Discriminative Training of Phrase Based Statistical Machine Translation. Abhishek Arun and Philipp Koehn, 2007. Proc MT Summit XI [PDF]
- Edinburgh System Description for the 2006 TC-STAR Spoken Language Translation Evaluation. Abhishek Arun, Amittai Axelrod, Alexandra Birch, Chris Callison-Burch, Hieu Hoang, Philipp Koehn, Miles Osborne, David Talbot. 2006. Proc. of TC-STAR Workshop on Speech-to-Speech Translation. [PDF]
- Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French. Abhishek Arun and Frank Keller, 2005. Proc. ACL [PDF] Slides from my ACL talk with updated results.
- Statistical Parsing of the French Treebank. Abhishek Arun, 2005. Master's thesis, Univ of Edinburgh [PDF]
Talks
- Discriminative Training for machine translation. First MT Marathon, Univ of Edinburgh, 2007 [PDF]
Assorted Experience
- Reviewer for the MT track, ACL 2009 and EMNLP 2009
- Research Intern, Natural Language Technologies Group, IBM T J Watson Research Center, Yorktown, New York, Summer 2007
- Graduate Teaching Assistant, Introduction to Computational Linguistics, Spring 2007
- Co-organiser, Stats NLP Reading group, Univ of Edinburgh, 2006-2008
- Co-admin of Moses, the open source phrase-based SMT decoder