Compiling a corpus of informal English as a Lingua Franca (ELF) conversations
The CASE project was started in 2012 at Saarland University with the aim of collecting video-mediated conversations in an international English-language context and thus creating a dataset or "corpus" that allows research on this particular communication type. By 2018, teams of researchers from Germany, Bulgaria, Spain, Italy, Sweden, Finland, France, Belgium, the UK and the US had compiled more than 250 hours of conversations using Skype as a medium. The conversations are first encounters between two participants from different countries and last between 30 and 60 minutes.
Of particular interest to us are pragmatics and discourse in a video-mediated communication setting, cultural and intercultural negotiation, issues of identity, the role of plurilingual resources, and the influence of the communication medium on issues such as rapport and cooperation in an international setting.
Conversations are transcribed according to pragmatic transcription guidelines, with the aim of allowing for a wide range of applications and with a particular focus on spoken language features, multimodality, and the use of plurilingual resources. Our team of researchers has published several papers on various aspects of the project, some of which are available online. A detailed description of the issues related to the analysis of spoken data, with extensive examples, can be found in:
Brunner, Marie-Louise; Stefan Diemer; and Selina Schmidt. 2017. “... okay so good luck with that ((laughing))?” - Managing rich data in a corpus of Skype conversations. Studies in Variation, Contacts and Change in English 19 [Big and Rich Data in English Corpus Linguistics: Methods and explorations, ed. by Turo Hiltunen; Joe McVeigh; and Tanja Säily]. Helsinki: Varieng. Full text: [http://www.helsinki.fi/varieng/series/volumes/19/brunner_diemer_schmidt/].
CASE project recordings were completed in 2018, with a total of more than 250 hours of data. The raw data has been used for various qualitative studies.
While the CASE project is still ongoing, several preliminary datasets have been analyzed and discussed in our publications. A preliminary set of 20 conversations, BabyCASE, was compiled in 2017, and two sets of conversations about food were compiled in 2015 and 2017. Preliminary transcripts of additional single conversations are also available.
In May 2018, the first finalized corpus based on data from the CASE project was released for scientific use: ViMELF.
A Corpus of Video-Mediated English as a Lingua Franca Conversations
ViMELF contains 20 Skype conversations between 40 speakers from Germany (20 speakers), Spain (5), Italy (5), Finland (5), and Bulgaria (5), totaling 744.5 minutes (ca. 12.4 hours), with an average conversation length of 37.23 minutes. The corpus comprises 113 677 words in the plain text version and 152 467 items in the annotated version (preliminary numbers).
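The summary figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers reported in this description; the variable names are illustrative and are not part of the corpus distribution.

```python
# Figures as reported in the ViMELF description (preliminary numbers).
speakers_by_country = {
    "Germany": 20,
    "Spain": 5,
    "Italy": 5,
    "Finland": 5,
    "Bulgaria": 5,
}
conversations = 20
total_minutes = 744.5
plain_words = 113_677       # plain text version
annotated_items = 152_467   # annotated version

# Derived values.
total_speakers = sum(speakers_by_country.values())
mean_minutes = total_minutes / conversations
total_hours = total_minutes / 60
annotation_ratio = annotated_items / plain_words

print(f"speakers: {total_speakers}")
print(f"mean conversation length: {mean_minutes:.3f} min")
print(f"total duration: {total_hours:.2f} h")
print(f"annotated items per plain-text word: {annotation_ratio:.2f}")
```

Under these figures, the mean conversation length works out to 37.225 minutes and the total duration to about 12.4 hours. The annotated version contains roughly 1.34 items per plain-text word, which gives a rough sense of how much the pragmatic annotation adds to the lexical transcript.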
The transcripts are available as .docx and .txt files, and the videos are in MPEG-4 format. Several versions of the transcripts are available: the fully annotated pragmatic version as text and XML, a lexical version, and a POS-tagged version (auto-tagged with CLAWS).
Citing ViMELF - A Corpus of Video-Mediated English as a Lingua Franca Conversations:
ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).
Citing the CASE project:
The CASE project. 2012-2018. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Birkenfeld: Trier University of Applied Sciences (coordination) / Saarbrücken: Saarland University / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain / Boise: Boise State University. [http://umwelt-campus.de/case] (date of last access).
The CASE project. 2012-2018. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).
Citing preliminary single transcripts:
Transcript 01SB00SF00 (preliminary). 2017. The CASE project. 2012-2018. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).