CSTR NAM TIMIT Plus corpus
The CSTR NAM TIMIT Plus corpus contains parallel whispered speech recorded simultaneously through two microphones: a non-audible murmur (NAM) microphone, which uses a urethane elastomer to make close contact with the skin, and an omni-directional headset-mounted condenser microphone (a DPA 4035).
The NAM microphone is a special body-conductive microphone (see Nakajima et al., ICASSP 2003 and Toda et al., ICASSP 2009). It can serve as the sensing device of a silent speech interface (SSI) system, in which an alternative signal is acquired without the user speaking in the normal way. It can detect extremely quiet speech (NAM) that even listeners close to the speaker can hardly hear. NAM speech tends to be unvoiced, like whispering. The best position for the NAM microphone is just behind the ear. It can detect various kinds of speech, including whispering and normal speech, conducted through the soft tissue of the head, and it is more robust to environmental noise than an ordinary microphone. Compared with other kinds of SSI systems, which may involve electrodes or other sensing devices, a NAM microphone-based SSI system is non-intrusive, cheap and convenient.
The corpus comprises 420 sentences selected from newspaper text, 460 sentences selected from the TIMIT texts, and 18 isolated words intended for an open-source voice command recogniser, ``kiku'', all uttered by a young female speaker. The 420 newspaper sentences were taken at random from Herald Glasgow, with permission from Herald & Times Group. The TIMIT texts are identical to those of the MOCHA-TIMIT corpus (http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html).
The newspaper recordings comprise two sections: one recorded in clean conditions and the other in pre-recorded cafeteria noise played over a loudspeaker at 65 dB(A) (resulting in an SNR of approximately 10 dB). The TIMIT and isolated-word recordings were made only in the noise condition. Both sections of the corpus were recorded in a soundproof hemi-anechoic chamber (noise floor around 25 dB(A)) at a 96 kHz sampling rate and 24-bit sample depth into a Pro Tools HD system.
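
The recording figures above can be illustrated with a short Python sketch. It shows one common way to estimate an SNR in dB (the ratio of RMS amplitudes of a speech segment and a noise-only segment) and the raw PCM data rate implied by 96 kHz, 24-bit audio. The function names are hypothetical, the signals are synthetic, and the RMS-ratio definition is an assumption: the corpus description does not say how the approximate 10 dB figure was measured, nor how the two microphone channels are stored on disk.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """SNR in dB, taken as the ratio of speech RMS to noise RMS.

    Assumption: 'speech' and 'noise' are separate segments measured
    at the same microphone; this is an illustrative definition only.
    """
    return 20.0 * math.log10(rms(speech) / rms(noise))


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Synthetic illustration: a "speech" segment that is the noise waveform
# scaled up by 10 dB in amplitude, so the estimated SNR is 10 dB.
noise = [0.1 * math.sin(0.37 * n) for n in range(9600)]
speech = [10 ** (10 / 20) * x for x in noise]

estimated_snr = snr_db(speech, noise)

# Data rate for one channel, and for two channels (assuming the NAM and
# headset signals were stored together as a two-channel stream).
mono_rate = pcm_bytes_per_second(96000, 24)
stereo_rate = pcm_bytes_per_second(96000, 24, channels=2)
```

At these settings each channel produces 288,000 bytes of raw audio per second, which is worth bearing in mind when downloading or resampling the recordings.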