Abhishek Arun
- Mailing address: Institute of Communicative and Collaborative Systems, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB
- Phone: (O) +44-131-650-4419
- E-mail: (first initial.last name) @sms.ed.ac.uk
Research Interests
- Statistical Machine Translation
- Statistical Parsing
- Machine Learning
I'm a third-year PhD candidate in the School of Informatics at the University of Edinburgh, working on Statistical Machine Translation under the supervision of Philipp Koehn and Miles Osborne. I am interested in richer translation models and in parameter estimation for such models.
Before joining the StatMT group, I worked with Frank Keller on crosslinguistic probabilistic parsing. For this research, I developed a French language package extension for Dan Bikel's truly excellent multilingual statistical parser. The French Treebank we used is the Corpus Le Monde, developed by Anne Abeillé et al. at the Université de Paris VII. A license can be obtained by emailing Dr Abeillé.
Publications
Machine Translation
- Monte Carlo inference and maximization for phrase-based translation. Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Proceedings of CoNLL, June 2009. [PDF]
- Towards better Machine Translation Quality for the German to English Language Pairs. Philipp Koehn, Abhishek Arun and Hieu Hoang, 2008. Proc 3rd Workshop on SMT [PDF]
- A Distortion Model for Arabic to English maximum entropy word alignment. Abhishek Arun and Abraham Ittycheriah, 2008. IBM Technical Report RC24584 [PDF]
- Online Learning Methods For Discriminative Training of Phrase Based Statistical Machine Translation. Abhishek Arun and Philipp Koehn, 2007. Proc MT Summit XI [PDF]
- Edinburgh System Description for the 2006 TC-STAR Spoken Language Translation Evaluation. Abhishek Arun, Amittai Axelrod, Alexandra Birch, Chris Callison-Burch, Hieu Hoang, Philipp Koehn, Miles Osborne, David Talbot. 2006. Proc. of TC-STAR Workshop on Speech-to-Speech Translation. [PDF]
Statistical parsing
- Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French. Abhishek Arun and Frank Keller, 2005. Proc. ACL [PDF]. Slides from my ACL talk with updated results.
- Statistical Parsing of the French Treebank. Abhishek Arun, 2005. Master's thesis, Univ of Edinburgh [PDF]
Talks
- Discriminative Training for Machine Translation. First MT Marathon, Univ of Edinburgh, 2007 [PDF]
Assorted Experience
- Reviewer for the MT track, ACL 2009 and EMNLP 2009
- Research Intern, Natural Language Technologies Group, IBM T J Watson Research Center, Yorktown, New York, Summer 2007
- Graduate Teaching Assistant, Introduction to Computational Linguistics, Spring 2007
- Co-organiser, Stats NLP Reading group, Univ of Edinburgh, 2006-2008
Some links