Romanian speech synthesis (RSS) corpus

The Romanian speech synthesis (RSS) corpus is a free, large-scale Romanian speech corpus that includes about 3000 sentences uttered by a native female speaker. The RSS corpus was designed mainly for text-to-speech synthesis and was recorded in a hemi-anechoic chamber (anechoic walls and ceiling; floor partially anechoic) at the University of Edinburgh. We used three high-quality studio microphones: a Neumann U89i (large-diaphragm condenser), a Sennheiser MKH 800 (small-diaphragm condenser with very wide bandwidth) and a DPA 4035 (headset-mounted condenser). Although the current release includes only speech data recorded with the Sennheiser MKH 800, we may release speech data recorded with the other microphones in the future. All recordings were made at a sampling frequency of 96 kHz with 24 bits per sample, then downsampled to 48 kHz. For recording, downsampling and bit rate conversion, we used ProTools HD hardware and software. We conducted 8 sessions over the course of a month, recording about 500 sentences in each session. At the start of each session, the speaker listened to a previously recorded sample in order to attain a similar voice quality and intonation.
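For readers unfamiliar with the 96 kHz to 48 kHz step, the following is a minimal sketch of 2:1 downsampling: low-pass filter the signal below the new Nyquist frequency (24 kHz), then keep every second sample. It is not the procedure used for the corpus, which was processed with ProTools HD. The function names `lowpass_taps` and `downsample_by_2`, the filter length, and the cutoff value are all assumptions chosen for illustration.

```python
import math

def lowpass_taps(num_taps=101, cutoff=0.23):
    """Hamming-windowed sinc low-pass filter (illustrative parameters).

    cutoff is in cycles per input sample; 0.23 sits slightly below the
    new Nyquist frequency (0.25 cycles per sample, i.e. 24 kHz at 96 kHz).
    num_taps should be odd so the filter is symmetric around its centre.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def downsample_by_2(samples, taps):
    """Low-pass filter, then keep every second sample.

    Only the retained outputs are computed. The filter is centred on each
    output sample, so no delay is introduced; samples outside the input
    are treated as zero.
    """
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out

# Usage: 10 ms of a 1 kHz tone sampled at 96 kHz becomes 480 samples at 48 kHz.
tone_96k = [math.sin(2 * math.pi * 1000 * n / 96000) for n in range(960)]
tone_48k = downsample_by_2(tone_96k, lowpass_taps())
print(len(tone_48k))  # 480
```

The low-pass filter matters because content above 24 kHz would otherwise alias into the audible band once the sampling frequency is halved. In practice a dedicated resampling routine from an audio or signal-processing library is a better choice than this pure-Python loop.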