Evaluation Methods for Topic Models

Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov and David Mimno.

A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and empirical likelihood method. In this paper, we demonstrate experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and propose two alternative methods that are both accurate and efficient.
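For readers unfamiliar with the harmonic mean method named above, here is a minimal sketch of the general estimator. It is not taken from the released Matlab or Java code. It assumes you already have the log-likelihoods log p(w | z_s) of a held-out document under S posterior samples z_s; the function name and its input are hypothetical.

```python
import math

def log_harmonic_mean_estimate(log_likelihoods):
    """Harmonic mean estimate of log p(w), given log p(w | z_s) for each sample z_s.

    Assumption: the samples z_s were drawn from the posterior p(z | w).
    """
    if not log_likelihoods:
        raise ValueError("need at least one sample")
    # p(w) ~= 1 / ((1/S) * sum_s 1 / p(w | z_s)), so in log space:
    # log p(w) ~= log S - logsumexp(-log p(w | z_s)).
    negated = [-ll for ll in log_likelihoods]
    largest = max(negated)  # subtract the max before exponentiating, for stability
    log_sum = largest + math.log(sum(math.exp(x - largest) for x in negated))
    return math.log(len(negated)) - log_sum
```

The sum is dominated by the samples with the smallest likelihood, so in practice the estimate can vary widely between runs.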

Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.

You can also download some associated Matlab code. Some Java code associated with this paper is now part of MALLET.

The above code release includes fixes for bugs in the Chib-style implementation reported by Matthew Willson. The buggy original version used for the paper is available for reference. That original version is still a consistent estimator for long Markov chains, but it is not expected to work as well as the fixed version. See the README in the new release for details. As a result of this mistake, we probably under-reported the performance of the Chib-style method.

We have now also released most of the data used in the paper. You can download everything with a command like:

wget -erobots=off -r -np -p -k -nH --cut-dirs=4 https://homepages.inf.ed.ac.uk/imurray2/pub/09etm/data/