
eScribe for Linguists

Why we need corpora
Why we need eScribe
What eScribe gives us
A new way of building corpora
Beyond language acquisition
Why we need corpora

We are developing eScribe to facilitate native speaker participation in creating corpora of child-directed and child-produced speech, making it easier to build corpora of less commonly spoken languages. These large collections of naturally produced linguistic data are important resources, not only for language acquisition researchers, but also for linguists studying language typology, theoretical linguistics, language processing and computational linguistics. They can be used to study what input children acquiring a given language are exposed to, what words and constructions children produce throughout development, and what kinds of errors both adult and child speakers make. Moreover, they can serve as linguistic and cultural records for speakers of the languages being studied.


Why we need eScribe

Most of the corpora available are of commonly spoken languages. The reasons for this are straightforward: with more speakers of a language, corpora are easier to collect and, crucially, easier to transcribe. In order for a corpus to be useful to a wide range of researchers, it needs to be digitally transcribed, and transcription of naturally occurring speech is most easily and reliably done by native speakers of the language. The more speakers there are to record, and to transcribe or assist with transcription, the more likely a corpus is to exist. With eScribe we endeavor to create and transcribe corpora of less commonly studied languages, not only for their value to linguistic researchers, but also for their value to the communities of speakers of these languages. As many less commonly spoken languages are endangered or may become endangered, future generations of children may not be acquiring them. It is thus important to gather this data now, while it is available, both to provide data for researchers and to create a rich record of these languages as they are used in day-to-day life.


What eScribe gives us

With fewer speakers and few to no native speaker linguists for most less commonly studied languages, creating corpora is a difficult task. Speakers may be hard to reach geographically, and transcription requires close side-by-side work between native speakers and linguists, as well as potentially difficult training for the native speaker assistants. Many speakers of these languages may not be literate in their native language, and/or may not be comfortable typing. Modeled after Aikuma, eScribe is designed to be non-anglocentric and usable with very little technical knowledge and no written knowledge of the language being transcribed. We do this by offering two options for transcription: a traditional text transcription option and a 'respeaking' option. The respeaking option allows speakers to listen to naturally recorded speech and rerecord it in a slow, clear voice. These recordings can in turn be text transcribed, either by native speakers who may not have been present when the recording was made or even by non-native speakers with a working knowledge of the language. We hope eScribe will enable linguists to partner with more native speakers of more languages to produce corpora of child-directed and child-produced speech, as well as corpora and stories spoken by adult speakers.


A new way of building corpora

As eScribe allows us to engage native speakers in the collection and transcription of corpora, we can look at corpus collection in new ways. We are no longer limited to interactions between parents and children that happen at times when the linguist or experimenter stops by to make a recording. With eScribe, parents can record snippets of conversation throughout the day, potentially giving us a broader sample of situations and of the language used in them. Many short recordings may also prove to be a more manageable corpus to transcribe: speakers assisting us may not have the time to sit down and respeak or transcribe an hour-long session, but respeaking one or two minutes of recording on any given day could be feasible.


Beyond language acquisition

We developed eScribe because we were interested in language acquisition and wanted rich resources for studying the acquisition of understudied languages, much as we have for more commonly studied ones. However, the tool we have developed is a versatile one. eScribe's capabilities make it well suited to recording, respeaking, transcribing or annotating both naturalistic and elicited adult speech, from fieldwork sessions to storytelling.