Kristian Woodsend

Contact:

Informatics Forum 4.27
10 Crichton Street
Edinburgh, EH8 9AB
United Kingdom

Email:

k.woodsend at ed.ac.uk

I am currently a Research Associate at ILCC, in the School of Informatics, University of Edinburgh.

Research

I am working with Dr Mirella Lapata on natural language summarization using integer linear programming (ILP). Our aim is to develop novel models for text generation tasks such as producing summaries, highlights, captions or simplifications of existing articles.

We have a particular focus on using ILP as the decision-making mechanism: ILPs are efficient at exploring the whole solution space and finding the global optimum. The challenge is to incorporate knowledge about natural language concisely into the models. To do this, we use a mixture of machine learning techniques and probabilistic grammar structures to generate candidate outputs, and the ILP to make the optimal choice among them.
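As a toy illustration of the "generate candidates, then let the ILP choose" idea (not the model from our papers), the sketch below selects sentences for a summary. It is written as a 0/1 integer program: maximise the total score of the chosen sentences subject to a length budget. The scores, lengths, budget and the function name `select_sentences` are all made up for illustration, and the program is solved by exhaustive enumeration rather than by a real ILP solver, which is only practical for a handful of candidates.

```python
from itertools import product


def select_sentences(scores, lengths, budget):
    """Solve a tiny 0/1 integer program by brute force.

    maximise    sum(scores[i] * x[i])
    subject to  sum(lengths[i] * x[i]) <= budget,  x[i] in {0, 1}
    """
    best_choice = tuple(0 for _ in scores)  # selecting nothing is always feasible
    best_value = 0.0
    # Enumerate every binary assignment; a real solver would prune this search.
    for x in product((0, 1), repeat=len(scores)):
        total_length = sum(l * xi for l, xi in zip(lengths, x))
        if total_length > budget:
            continue  # violates the length constraint
        value = sum(s * xi for s, xi in zip(scores, x))
        if value > best_value:
            best_choice, best_value = x, value
    return best_choice, best_value


# Hypothetical candidate sentences: an importance score and a word count each.
scores = [3.0, 2.0, 4.5, 1.0]
lengths = [12, 7, 15, 5]

choice, value = select_sentences(scores, lengths, budget=25)
print(choice, value)  # (0, 1, 1, 0) 6.5
```

Here the second and third candidates are chosen: together they fit within the 25-word budget and score higher than any other feasible combination. In a realistic setting the candidates would come from the learned components, and the enumeration would be replaced by an off-the-shelf ILP solver.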

My PhD was in large-scale numerical optimization, under the supervision of Prof Jacek Gondzio at ERGO. I researched methods for training support vector machines (SVMs) using the interior point method of continuous optimization. You can download the software, which is free for academic use. It is particularly efficient on multicore parallel computing platforms, and it performed well in the Pascal Large Scale Learning Challenge (2008).

Online demos

I have put some demonstrations of our NLP research on this website:
  1. Multiple aspect approach to summarization
  2. Sentence simplification, learning from Simple Wikipedia
Let me know how they work for you!

Data sets

Here are data sets and other material we have used in papers:
  1. CNN highlights dataset used in Woodsend and Lapata (2010, ACL); contains alignments of CNN highlights with document sentences.
  2. Simple English Wikipedia revisions dataset used in Woodsend and Lapata (2011, EMNLP); this contains diff-ed revisions of Simple English Wikipedia that were marked by the editors as simplifications.

Selected papers

  1. Kristian Woodsend and Mirella Lapata. 2012. Multiple Aspect Summarization Using Integer Linear Programming. EMNLP 2012, Jeju, Korea.

  2. Kristian Woodsend and Mirella Lapata. 2011. Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming. EMNLP 2011, Edinburgh, UK.

  3. Kristian Woodsend and Mirella Lapata. 2011. WikiSimple: Automatic Simplification of Wikipedia Articles. AAAI 2011, San Francisco, USA.

  4. Kristian Woodsend and Mirella Lapata. 2010. Automatic Generation of Story Highlights. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 565–574. Uppsala, Sweden.

  5. Kristian Woodsend, Yansong Feng and Mirella Lapata. 2010. Title Generation with Quasi-Synchronous Grammar. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 513–523. Cambridge, MA.

  6. Kristian Woodsend. 2009. Using Interior Point Methods for Large-scale Support Vector Machine training. PhD thesis, University of Edinburgh.

  7. Kristian Woodsend and Jacek Gondzio. 2009. Hybrid MPI/OpenMP parallel linear support vector machine training. Journal of Machine Learning Research, 10:1937–1953.

  8. Kristian Woodsend and Jacek Gondzio. 2009. Exploiting separability in large-scale linear support vector machine training. To appear in Computational Optimization and Applications.

  9. Marco Colombo, Andreas Grothey, Jonathan Hogg, Kristian Woodsend, and Jacek Gondzio. 2009. A structure-conveying modelling language for mathematical and stochastic programming. Mathematical Programming Computation, 1(4):223–247.

  10. Kristian Woodsend and Jacek Gondzio. 2009. High-performance parallel support vector machine training. In R. Ciegis, D. Henty, B. Kagstrom, and J. Zilinskas, editors, Parallel Scientific Computing and Optimization: Advances and Applications, volume 27 of Springer Optimization and Its Applications, pages 83–92. Springer-Verlag, Berlin.

  11. Andreas Grothey, Jonathan Hogg, Kristian Woodsend, Marco Colombo, and Jacek Gondzio. 2009. A structure-conveying parallelisable modelling language for mathematical programming. In R. Ciegis, D. Henty, B. Kagstrom, and J. Zilinskas, editors, Parallel Scientific Computing and Optimization: Advances and Applications, volume 27 of Springer Optimization and Its Applications, pages 147–158. Springer-Verlag, Berlin.