Currently, I'm most interested in (statistical) machine translation (SMT), social media, and doing things with massive streams of language (e.g. randomised algorithms, modelling the stock market, real-time search).
Did you know that, when using blog posts to predict stock prices, Google is a better predictor of Yahoo!'s stock price than Yahoo! itself? Or that tweets are more useful than blogs for modelling people's collective belief in an imminent swine flu epidemic? Or that on Twitter, people really like to talk about deaths?
Related to these broad areas is the question of how to train and apply large models. For example, our machine translation systems need to run on a cluster of machines. Throwing more machines at such models is great, but it is clear that the most interesting models and datasets will make computational demands that far outstrip whatever resources we have available. Imagine training on all the data that appears on the Web, each and every day. Scaling our machine learning methods will become crucial. Randomised and streaming algorithms will prove essential here, and our work using Bloom filters for language models is a start in this direction, as is our work on locality-sensitive hashing. Infrastructure to support large-scale experiments is also vital. I have been installing and playing around with Hadoop for quite a while now as a fun way to do this.
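To give a flavour of the Bloom filter idea mentioned above, here is a minimal sketch in Python. It is an illustration only, not the implementation from our papers: the class name `BloomFilter`, the helper `ngrams`, the use of salted SHA-256 as the hash family, and the sizes chosen are all assumptions made for the example. The filter stores a set of n-grams in a fixed-size bit array, so memory does not grow with the length of each n-gram; the price is that a lookup can occasionally report an n-gram that was never added (a false positive), although it never misses one that was.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: false positives possible, false negatives not."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function
        # with an index (an assumption made for simplicity, not for speed).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # The item is reported present only if every one of its bits is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Store the bigrams of a toy sentence.
bf = BloomFilter(num_bits=10_000, num_hashes=4)
for gram in ngrams("the cat sat on the mat".split(), 2):
    bf.add(gram)

print("the cat" in bf)  # an n-gram that was added is always found
```

A real language model would also need to attach counts or probabilities to each n-gram, which this sketch leaves out; it only shows the membership test that makes the space savings possible.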
My list of papers is divided by topic, and these topics roughly correspond to my research interests. I'm always happy to take on one or two new PhD students each year.