I am sometimes mistaken for Joan Carletta; if you're interested in VLSI or fast signal processing, try her.
AMI Meeting Corpus
The AMI Meeting Corpus is a data set containing 100 hours of annotated multimodal meeting recordings. It was released under a Creative Commons ShareAlike license in June 2006. It is aimed at a wide audience, from researchers in speech and video processing to organizational psychologists and linguists. There is an auxiliary data set, the AMIDA Meeting Corpus, which contains a smaller amount of similar data in which one person collaborates from a remote location with a face-to-face group. The projects that produced both corpora are described on the AMI Project website. In addition to numerous individual projects, the new European "network of excellence" in social signal processing plans to use the corpus for some of its work.
The NITE XML Toolkit
The NITE XML Toolkit (NXT) is open-source software that supports the development and analysis of multimodal language corpora. Using a data model that allows annotations to relate structurally and temporally, it provides library functions (in Java) for data handling, query (using a language designed to match the data model), and interface components. It comes with a number of configurable end-user interfaces for common tasks like dialogue act and named entity annotation. Although many of its features relate to signals, some people use it on text corpora to support unusual annotations or several kinds of annotation at once.
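As a rough illustration of what it can mean for annotations to relate both structurally and temporally, here is a minimal Python sketch. It is not NXT's API (the toolkit's libraries are in Java) and it does not use NXT's query language; the `Annotation` class, the `overlapping` helper, and the sample data are all hypothetical and chosen only for this example. The sketch assumes that an annotation without its own timing takes its extent from its children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical annotation node; not a class from NXT."""
    kind: str                      # e.g. "word", "dialogue_act"
    label: str
    start: Optional[float] = None  # seconds; None if not directly timed
    end: Optional[float] = None
    children: List["Annotation"] = field(default_factory=list)

    def span(self) -> Optional[Tuple[float, float]]:
        """Return (start, end), falling back to the extent of the children."""
        if self.start is not None and self.end is not None:
            return (self.start, self.end)
        spans = [s for s in (c.span() for c in self.children) if s is not None]
        if not spans:
            return None
        return (min(s[0] for s in spans), max(s[1] for s in spans))

    def descendants(self) -> Iterator["Annotation"]:
        """Yield every annotation below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


def overlapping(root: Annotation, kind: str,
                start: float, end: float) -> List[Annotation]:
    """Find annotations of `kind` at or under `root` overlapping [start, end)."""
    hits = []
    for node in [root, *root.descendants()]:
        span = node.span()
        if node.kind == kind and span and span[0] < end and start < span[1]:
            hits.append(node)
    return hits


# Two timed words grouped under an untimed dialogue act (made-up data).
words = [Annotation("word", "go", 0.0, 0.3),
         Annotation("word", "left", 0.3, 0.8)]
act = Annotation("dialogue_act", "instruct", children=words)
dialogue = Annotation("dialogue", "d1", children=[act])
```

In this sketch the dialogue act carries no timing of its own, so `act.span()` is computed from its words, and `overlapping(dialogue, "word", 0.25, 0.5)` combines a structural condition (being under `dialogue`) with a temporal one (overlapping the interval).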
HCRC Map Task Corpus
The HCRC Map Task Corpus is quite old now, but people still find it useful because it's one of the few dialogue corpora to have a wide range of annotations all available in one place. The website includes an NXT format release of all of the existing annotations and the audio, which was previously available only on CD.
Switchboard Corpus in NXT Format
To support a set of projects at Edinburgh that were all using the Switchboard Corpus, we've pulled together as many Switchboard annotations as we could find and put them into NXT format, as well as authoring a few of our own. The Linguistic Data Consortium is distributing them under a Creative Commons ShareAlike license, which means that anyone who has them is free to distribute them under the same terms. Local Informatics users can find them at /group/corpora/public/switchboard/nxt/. There is a website describing the annotations and how to use them, with contact details for anyone with questions. There is also a journal paper (2010) about the data release. Please get in touch if you have other annotations you want to contribute, especially if they were created under the license's ShareAlike condition. Where possible, we intend to add them to the current data set so that everything can be distributed together.
Small Group Discussion
I only started thinking about tools and corpora because I needed them for my own research on dialogue and small group discussion. For a gentle introduction to my work in these areas, see my paper for the Japanese Cognitive Science Society.