This experimental page presents an LDA topic analysis, into three topics, of around 9000 PDFs published on homepages. Each topic is a probability distribution over words. Each document is represented as a mixture of the three topics, giving barycentric coordinates that place it, as a dot, in the triangle below. The topics arise from the data, as a best effort to account for the variation in word frequencies between documents under this topic model.
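To make the barycentric placement concrete, here is a minimal sketch of how a three-topic mixture could be turned into a dot position. The corner positions, their order, and the function name are assumptions made for illustration; the page's actual layout code is not shown here.

```python
import math

# Assumed layout: an equilateral triangle with unit side.
# Which topic sits at which corner is a guess, not taken from the page.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def mixture_to_point(weights, corners=CORNERS):
    """Map a 3-topic mixture to a 2D point via barycentric coordinates."""
    if len(weights) != len(corners):
        raise ValueError("need one weight per corner")
    total = sum(weights)
    if total <= 0 or min(weights) < 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Normalise so the weights sum to 1, then take the weighted
    # average of the corner positions.
    x = sum(w / total * cx for w, (cx, _) in zip(weights, corners))
    y = sum(w / total * cy for w, (_, cy) in zip(weights, corners))
    return x, y
```

A document drawn entirely from one topic lands exactly on that topic's corner, and an even mixture of all three lands at the centre of the triangle.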

Enter space-separated lists of UUNs (or, more precisely, the basenames of your directories in /public/homepages/, e.g. rbf bundy wadler) in the text boxes to highlight documents published by these people or groups; each box has its own colour. Changes take effect on the onchange event, i.e. after you click outside the text area, so that it loses focus.
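The matching described above can be sketched as follows. This assumes each PDF is identified by a path under /public/homepages/ whose first component is the owner's directory; the function names and the example paths are hypothetical, and the page itself presumably does this in browser script rather than Python.

```python
from pathlib import PurePosixPath

HOMEPAGES_ROOT = PurePosixPath("/public/homepages")


def parse_uuns(box_text):
    """Split a text box's contents into a set of UUNs (whitespace-separated)."""
    return set(box_text.split())


def owner_of(pdf_path, root=HOMEPAGES_ROOT):
    """Return the basename of the directory directly under root, or None."""
    try:
        relative = PurePosixPath(pdf_path).relative_to(root)
    except ValueError:
        return None  # the path is not under the homepages root
    # parts[0] is the owner's directory; a file directly in root has no owner.
    return relative.parts[0] if len(relative.parts) > 1 else None


def highlighted(pdf_paths, box_text):
    """Return the paths whose owner appears in the text box."""
    uuns = parse_uuns(box_text)
    return [p for p in pdf_paths if owner_of(p) in uuns]
```

For example, with the box contents "rbf bundy wadler", a hypothetical path such as /public/homepages/rbf/papers/a.pdf would be highlighted, while a path under any other directory would not.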

Not everything published on homepages is our work, but it probably represents our interests. Not everyone publishes on homepages, so for some UUNs you will find no entries. I plan to extend coverage; suggestions of other data sources we should mine are welcome (HTML pages and the library repository are already on my list).

The pure topics are represented by the three corners of the triangle. I've named them systems, interaction, and theory, but these are just names; suggestions for alternatives are welcome. The modal words for each topic are grouped nearby, with font-size proportional to relative weight within that topic (note that this exaggerates differences; font-size should be proportional to sqrt(weight) - maybe later ...).
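The difference between the two font-size rules can be illustrated with a small sketch. The function name, the max_size parameter, and the example weights are made up for illustration. The likely motivation for the sqrt rule is that a word's printed area grows roughly with the square of its font size, so sqrt scaling keeps area proportional to weight.

```python
import math


def font_sizes(weights, max_size=32.0, use_sqrt=True):
    """Scale font sizes so the heaviest word gets max_size.

    weights maps each word to its weight within one topic.
    With use_sqrt=False the size is proportional to the weight itself,
    which is the rule the page currently describes.
    """
    scale = math.sqrt if use_sqrt else (lambda w: w)
    top = max(scale(w) for w in weights.values())
    return {word: max_size * scale(w) / top for word, w in weights.items()}
```

With hypothetical weights of 4 and 1, the linear rule gives sizes in the ratio 4:1, while the sqrt rule gives 2:1, so the lighter word stays more legible.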


agent based corpus data information language model number results set speech system systems text time user word words work


analysis based data figure learning model models network number performance process results set space state system systems time
informatics in perspective


case class data form function language logic model order proof query section semantics set system theorem type types xml

The LDA analysis was done using MALLET.