Potential PhD Projects

This is a list of projects for which I would be interested in supervising PhD students. For general advice about how to apply and what background I expect, please look at my general page for prospective PhD students.

Statistical Machine Learning and Natural Language Processing of Programming Language Text

We are seeking to award a Microsoft PhD Scholarship on the topic of "Statistical Machine Learning and Natural Language Processing of Programming Language Text." This is a fully funded three-year PhD scholarship. The project will be supervised by Dr Charles Sutton of the School of Informatics at the University of Edinburgh.

The goal of this project is to apply the advanced statistical techniques from natural language processing to a completely different and new textual domain: programming language text. Think about how you program when you are using a new library or new environment for the first time. You "program by search engine", i.e., you search for examples of people who have used the same library, and you copy chunks of code from them. The goal of this project is to systemize this process, and apply it at a large scale.

We have collected a corpus of 1.5 billion lines of source code from 8000 software projects, and we want to find syntactic patterns that recur across projects. These can then be presented to a programmer as she is writing code, providing an autocomplete functionality that can suggest entire function bodies. Statistical techniques involved include language modeling, data mining, and Bayesian nonparametrics. This also raises some deep and interesting questions in software engineering, for example: why do syntactic patterns occur in professionally written software when they could be refactored away?
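As a rough illustration of what "syntactic patterns that recur across projects" could mean at the simplest level, the sketch below counts token n-grams and keeps those that appear in more than one project. It is a toy under stated assumptions, not the method used in this project: the corpus is assumed to be Python source, the project names and snippets are made up, and the function names `token_ngrams` and `recurring_patterns` are hypothetical. Exact n-gram matching stands in for the language modeling and Bayesian nonparametric techniques mentioned above.

```python
import io
import tokenize
from collections import defaultdict

# Token types that carry layout rather than syntax; we ignore them.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_ngrams(source, n=4):
    """Yield every run of n consecutive token strings in Python source."""
    readline = io.StringIO(source).readline
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in _SKIP
    ]
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def recurring_patterns(projects, n=4, min_projects=2):
    """Map each n-gram to the projects containing it, keeping only
    n-grams found in at least min_projects distinct projects."""
    seen_in = defaultdict(set)
    for name, source in projects.items():
        for gram in token_ngrams(source, n):
            seen_in[gram].add(name)
    return {g: names for g, names in seen_in.items() if len(names) >= min_projects}


# Hypothetical two-project corpus (illustrative only).
projects = {
    "proj_a": "with open(path) as f:\n    data = f.read()\n",
    "proj_b": "with open(name) as f:\n    text = f.read()\n",
}
shared = recurring_patterns(projects)
for gram in sorted(shared):
    print(" ".join(gram))
```

Because identifiers are compared literally, the two snippets share only the fragments that do not mention `path`/`name` or `data`/`text`. Abstracting over identifiers so that the whole idiom is recognised as one pattern is exactly the kind of problem a statistical approach would need to address.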

The project is suitable for a student with a top MSc or first-class bachelor's degree in computer science, statistics, physics, or a related numerate discipline. Previous coursework or experience in statistics, machine learning, or statistical natural language processing is desirable, although we do not expect students to have all three of these. Because of the scale of the data set involved, a strong programming background will be very useful for this project.

This is an opportunity to join a world-leading research group in machine learning. The Research Programme in Machine Learning is hosted by the Institute for Adaptive and Neural Computation (ANC), a research group of the School of Informatics, University of Edinburgh. According to the 2008 Research Assessment Exercise (RAE), the School of Informatics delivers more world-leading (4*) research than any other RAE institution in the computer science category, and also more research rated internationally excellent or world-leading (3* and 4*). ANC is a world leader in machine learning, with six academic teaching staff specialising in developing machine learning methods (Chris Bishop, Chris Williams, Amos Storkey, Charles Sutton, Guido Sanguinetti and Iain Murray).

The Microsoft scholarship consists of an annual bursary for up to three years. During the course of their PhD, the Scholar will be invited to Microsoft Research in Cambridge for an annual PhD Summer School that includes a series of talks of academic interest and poster sessions, providing an opportunity to present their work to Microsoft researchers and a number of Cambridge academics.

For full consideration, please apply by 13 January. However, we encourage students to apply before 16 December 2011, which is the main application deadline for the School of Informatics. All applications that arrive by 13 January will receive full consideration for this studentship, but students who apply before 16 December will also receive full consideration for other potential funding sources in the School of Informatics. This is especially important for overseas applicants.

This is a fully funded studentship for UK and EU students. We welcome overseas applicants, and can provide overseas students with funding equivalent to EU fees and maintenance. The remaining fees component will need to come from another source. Overseas applicants are advised to apply before the standard Informatics deadlines and to apply for other scholarships. See http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees and http://www.ed.ac.uk/schools-departments/informatics/postgraduate/apply/keydatesresearchappns for further information.

Learning the Structures of Models of Computer System Performance

Modern computer systems have become more complex than ever before, with distributed systems becoming a mainstream computing tool. Low latency is a crucial design goal for these systems, because users will not adopt an interactive Web service that is slow. Understanding the performance of a distributed system is extremely difficult because of the many interactions between components.

In this project, we will address this problem by attempting to learn the structure of models that describe the performance of these systems. The goal of this project is to automatically determine the structure of such models for warehouse-scale and cloud applications. Possible structures include networks of nonparametric regression models, networks of queues, or more complex performance models such as stochastic process algebras. The idea is that the learned structure will be useful for visualization, i.e., that it will provide a compact, interpretable description of the system's performance, so that performance bugs in the system will be visually apparent as bottlenecks in the learned queueing network. Essentially, the learned model will serve as a summary of the large amount of performance data used to generate it. Structure learning is a notoriously complex problem in machine learning, so this new application may serve as a challenge problem for this area.
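To make the notion of a "bottleneck in a queueing network" concrete, here is a minimal sketch that summarises a chain of components, each treated as an M/M/1 queue. It does no structure learning: the network structure, the component names, and all rates are made-up assumptions, and the function names `mm1_response_time` and `summarize_tandem` are hypothetical. It only shows the kind of compact summary a learned queueing model might provide, using the standard M/M/1 results that utilization is arrival rate over service rate and mean response time is 1 / (service rate - arrival rate).

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 queue (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def summarize_tandem(arrival_rate, service_rates):
    """Summarise a chain of M/M/1 queues that all see the same arrival rate.

    Returns per-component statistics and the name of the component
    with the highest utilization (the bottleneck).
    """
    stats = {
        name: {
            "utilization": arrival_rate / mu,
            "mean_response_time": mm1_response_time(arrival_rate, mu),
        }
        for name, mu in service_rates.items()
    }
    bottleneck = max(stats, key=lambda name: stats[name]["utilization"])
    return stats, bottleneck


# Hypothetical three-tier service; rates are in requests per second.
stats, bottleneck = summarize_tandem(
    arrival_rate=80.0,
    service_rates={"frontend": 200.0, "app": 100.0, "database": 400.0},
)
for name, s in stats.items():
    print(f"{name}: utilization={s['utilization']:.2f}, "
          f"mean response time={s['mean_response_time'] * 1000:.1f} ms")
print("bottleneck:", bottleneck)
```

In this made-up example the `app` tier runs at 80% utilization and dominates the end-to-end latency, which is the sort of feature that should stand out visually in a learned model.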

We advise students to apply before the 16 December 2011 deadline.

There is not currently any dedicated funding for this topic. However, the School of Informatics offers a variety of scholarships; please see http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees.
