Dr. Victor Lavrenko

   Lecturer (Assistant Professor)
   University of Edinburgh
   School of Informatics
   10 Crichton Street
   Edinburgh, EH8 9AB, UK

   tel: +44.131.651.5612
    ° in print, Springer, 2009
    ° Table of Contents
    ° Sample: Chapter 2
    ° Read online (SpringerLink)
    ° Preview (Google Books)
    ° Preview (Amazon)
    ° Review 1 (Robert Luk)
    ° Review 2 (Jianhan Zhu)
I'm a Lecturer in Informatics at the University of Edinburgh (that's British-speak for assistant professor of computer science). I work on developing better algorithms for search engines, with a particular focus on interaction, multimedia and scalability. I also supervise a number of PhD and MSc students and teach courses on search engines and applied machine learning. I am the author and maintainer of yari:mtx -- a Swiss Army knife for large datasets.

Prior to Edinburgh, I directed AEnalytics -- a consulting firm in St. Petersburg, Russia. We built one of the world's first news-based arbitrage systems for Credit Suisse. The system monitored half a million news stories per day, looked for events that might affect stock prices, and continuously rebalanced the portfolio. It was based on my earlier work on AEnalyst.

Before that, I was a PhD student and later a post-doc at UMass Amherst, working with Bruce Croft, James Allan, and many others. I worked on a number of projects, but my primary contributions came in three areas: (1) I invented relevance models: a pretty good formula for query expansion, which to this day wins TREC competitions; (2) I developed a family of highly cited methods for predicting tags for images; and (3) I built the UMass Topic Detection and Tracking system, which was among the best in DARPA's TDT competitions.

Generative Approaches to Modeling Relevance.
° A Generative Theory of Relevance (Springer 2009)
° Relevance Models in Information Retrieval (book chapter)
° Optimal Mixture Models in IR (ECIR 2002) best student paper
° Relevance-Based Language Models (SIGIR 2001)
° Relevance Feedback and Personalization: A Language Modeling Perspective
° Localized Smoothing for Multinomial Language Models (tech. report)
° Formal Multiple-Bernoulli Models for Language Modeling (SIGIR 2004)

Activity Modeling.
I have collaborated with Anton Leuski on applying relevance modeling to the task of modeling human actions in a social environment. Our goal is to analyze communications between the participants and pinpoint messages relevant to certain collaborative activities. One example of such an activity is a group of players in an online role-playing game organizing a raid on a hostile castle. Our approach involves constructing a joint distribution over message content and the relevant actions taken by the sender and recipient after communicating.
° Tracking Dragon-Hunters with Language Models (CIKM 2006)
° Role Detection in Virtual Worlds (Carsten Eickhoff's MSc thesis)

Handwriting Recognition and Retrieval. In joint work with Toni Rath and R. Manmatha, I adapted the relevance modeling framework to the problem of searching collections of highly degraded handwritten documents. Our approach relied on a joint model of word shape and word meaning and was the first successful solution to this challenging problem.
° Holistic Word Recognition for Handwritten Historical Documents (DIAL 2004)
° A Search Engine for Historical Manuscript Images (SIGIR 2004)
° Retrieving Historical Manuscripts using Shape (CIIR tech. report)

Automatic Image and Video Annotation. I have collaborated with R. Manmatha, Jiwoon Jeon, and Shaolei Feng to extend the relevance modeling framework to include real-valued variables, such as feature functions used in computer vision. Our research resulted in a highly accurate method for automatically assigning keywords to unlabeled photographs and video segments. The algorithm currently represents the best-performing method for content-based search of unlabeled images.
° Automatic Image Tagging (Sean Moran's MSc thesis)
° Multiple Bernoulli Relevance Models for Image and Video Annotation (CVPR 2004)
° Statistical Models for Automatic Video Annotation and Retrieval (ICASSP 2004)
° A Model for Learning the Semantics of Pictures (NIPS 2003)
° Image Annotation and Retrieval using Cross-Media Relevance Models (SIGIR 2003)

AEnalyst (from e-Analyst) is a market-forecasting technology that combines advances in the fields of Information Retrieval and Time Series Analysis. AEnalyst uses piecewise regression to identify trends in stock prices and employs language modeling techniques to associate trends with the content of news stories.
° Mining of Concurrent Text and Time Series (KDD 2000)
° Mining of Concurrent Text and Time Series (full version)
° Language Models for Financial News Recommendation (CIKM 2000)
° Electronic Analyst of Stock Behavior (CIIR tech. report)
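
The two ingredients described above can be illustrated with a small Python sketch. This is not the AEnalyst code: the top-down splitting rule, the slope threshold, the additive smoothing, and all function names and toy data here are assumptions chosen only to show the general idea of fitting piecewise linear trends to a price series and scoring a news story under per-trend unigram language models.

```python
import math
from collections import Counter

def fit_line(ys, lo, hi):
    """Least-squares line over ys[lo:hi]; returns (slope, intercept, squared error)."""
    xs = range(lo, hi)
    n = hi - lo
    mean_x = sum(xs) / n
    mean_y = sum(ys[lo:hi]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ys[x] - mean_y) for x in xs)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    err = sum((ys[x] - (slope * x + intercept)) ** 2 for x in xs)
    return slope, intercept, err

def segment(ys, lo, hi, max_err):
    """Top-down piecewise regression: split until every segment fits within max_err."""
    slope, _, err = fit_line(ys, lo, hi)
    if err <= max_err or hi - lo < 4:
        return [(lo, hi, slope)]
    # Pick the split point that minimizes the summed error of the two halves.
    best = min(range(lo + 2, hi - 1),
               key=lambda k: fit_line(ys, lo, k)[2] + fit_line(ys, k, hi)[2])
    return segment(ys, lo, best, max_err) + segment(ys, best, hi, max_err)

def trend_label(slope, eps=0.1):
    """Map a segment slope to a coarse trend label (eps is an assumed threshold)."""
    return "up" if slope > eps else "down" if slope < -eps else "flat"

def train_models(labeled_stories):
    """Collect unigram counts per trend label from (label, text) pairs."""
    models = {}
    for label, text in labeled_stories:
        models.setdefault(label, Counter()).update(text.lower().split())
    return models

def log_likelihood(text, counts, vocab_size, mu=1.0):
    """Additively smoothed unigram log-likelihood of text under one trend model."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + mu) / (total + mu * vocab_size))
               for w in text.lower().split())

# Toy data (hypothetical): a price series that rises and then falls.
prices = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
segments = segment(prices, 0, len(prices), max_err=0.5)
labels = [trend_label(slope) for _, _, slope in segments]

# Toy stories paired with the trend they preceded (hypothetical).
stories = [("up", "profits beat forecast"),
           ("down", "profits miss forecast lawsuit")]
models = train_models(stories)
vocab_size = len({w for c in models.values() for w in c})
predicted = max(models,
                key=lambda t: log_likelihood("lawsuit filed", models[t], vocab_size))
```

In this sketch each story would be paired with the trend of the segment that follows it, and a new story is assigned to the trend whose language model gives it the highest likelihood.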

Cross-language Information Retrieval. Together with Martin Choquette and Bruce Croft, I have extended relevance modeling to cross-language retrieval, where an English query is used to find relevant documents in Chinese. The algorithm relies on a parallel corpus to estimate a joint distribution of English and Chinese word sets, which is used to model the user's information need. The algorithm is significantly more accurate than approaches based on a dictionary or on machine translation.
° Cross-Lingual Relevance Models (SIGIR 2002)
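
A rough Python sketch of the idea described above follows. It is a simplified illustration rather than the published algorithm: it assumes a uniform prior over sentence-aligned pairs, simple additive smoothing, and a tiny made-up parallel corpus, and the function names are hypothetical. Each aligned pair contributes its Chinese word distribution, weighted by how well its English side explains the query.

```python
from collections import Counter

def smoothed(counts, word, vocab_size, mu=0.1):
    """Additively smoothed unigram probability of word given a bag of counts."""
    total = sum(counts.values())
    return (counts[word] + mu) / (total + mu * vocab_size)

def cross_lingual_model(query, pairs):
    """Estimate P(chinese word | english query) from aligned (english, chinese) pairs.

    Each pair is weighted by the likelihood of the English query under its
    English side (uniform prior over pairs assumed), and contributes its
    smoothed Chinese word distribution with that weight.
    """
    en_vocab = {w for en, _ in pairs for w in en} | set(query)
    zh_vocab = {w for _, zh in pairs for w in zh}
    scores = Counter()
    for en, zh in pairs:
        en_counts, zh_counts = Counter(en), Counter(zh)
        query_lik = 1.0
        for q in query:
            query_lik *= smoothed(en_counts, q, len(en_vocab))
        for c in zh_vocab:
            scores[c] += query_lik * smoothed(zh_counts, c, len(zh_vocab))
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Toy parallel corpus (hypothetical): tokenized English and Chinese sides.
pairs = [
    (["bank", "interest", "rate"], ["银行", "利率"]),
    (["football", "match"], ["足球", "比赛"]),
]
model = cross_lingual_model(["bank", "rate"], pairs)
top_word = max(model, key=model.get)
```

The resulting distribution over Chinese words can then be compared against Chinese document language models to rank documents, without translating the query word by word.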

Bounds in Stochastic Processes.
° First Story Detection in TDT Is Hard (CIKM 2000)
° A Mathematical Model of Vocabulary Growth (tech. report)
° Comparing Effectiveness in TDT and IR
° Detections, Bounds, and Timelines: UMass and TDT-3

Topic Detection and Tracking.
° A Month to Topic Detection and Tracking in Hindi (TOIS 2003)
° Explorations within Topic Tracking and Detection (Kluwer 2002)
° Language-specific Models in Multilingual Topic Tracking (SIGIR 2004)
° Relevance Models for Topic Detection and Tracking (HLT 2002)
° Monitoring the News: a TDT demonstration system (HLT 2001)
° On-line New Event Detection and Tracking (SIGIR 1998)
° UMass TDT 2003 Research Summary
° UMass at TDT 2002
° UMass at TDT 2000
° UMASS Approaches to Detection and Tracking at TDT2
° Topic-Based Novelty Detection
° Event Tracking