Now available: ViMELF - A Corpus of Video-Mediated English as a Lingua Franca Conversations

We are happy to announce the release of ViMELF, a corpus of 20 video-mediated English as a Lingua Franca conversations. The corpus is freely available for non-commercial research purposes - more information on access here.

ViMELF contains 20 Skype conversations between 40 speakers from Germany (20 speakers), Spain (5), Italy (5), Finland (5), and Bulgaria (5), totaling 744.5 minutes (ca. 12.5 hours), with an average conversation length of 37.23 minutes. The corpus comprises 113 670 words in the plain text version and 152 472 items in the annotated version.

The transcripts are available as .docx and .txt files; the videos are in MPEG-4 format. Several versions are available: the fully annotated pragmatic version as text and XML, a lexical version, and a POS-tagged version (auto-tagged with the CLAWS C7 tagset).

More information on ViMELF here.

The CASE project

Compiling a corpus of informal English as a Lingua Franca (ELF) conversations

The CASE project was started in 2012 at Saarland University with the aim of collecting video-mediated conversations in an international English-language context and thus creating a dataset or "corpus" that allows research on this particular communication type. By 2018, teams of researchers from Germany, Bulgaria, Spain, Italy, Sweden, Finland, France, Belgium, the UK and the US had compiled more than 250 hours of conversations using Skype as a medium. The conversations are first encounters between two participants from different countries and last between 30 and 60 minutes.

Of particular interest to us are pragmatics and discourse in a video-mediated communication setting, cultural and intercultural negotiation, issues of identity, the role of plurilingual resources, and the influence of the communication medium on issues such as rapport and cooperation in an international setting. 

Conversations are transcribed according to pragmatic transcription guidelines, with the aim of allowing for a wide range of applications and with a particular focus on spoken language features, multimodality, and the use of plurilingual resources. Our team of researchers has published several papers on various aspects of the project, some of which are available online. A detailed description of the issues related to the analysis of spoken data, with extensive examples, can be found in:

  • Brunner, Marie-Louise; Stefan Diemer; and Selina Schmidt. 2017. “... okay so good luck with that ((laughing))?” - Managing rich data in a corpus of Skype conversations. Studies in Variation, Contacts and Change in English 19 [Big and Rich Data in English Corpus Linguistics: Methods and explorations, ed. by Turo Hiltunen; Joe McVeigh; and Tanja Säily]. Helsinki: Varieng. Full text: [http://www.helsinki.fi/varieng/series/volumes/19/brunner_diemer_schmidt/].

The recordings were completed in 2018 with a total of more than 250 hours of data. The raw data has been used for various qualitative studies.

  • CASE. 2018. Corpus of Academic Spoken English – Recordings. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case].

While the CASE project is still ongoing, several preliminary datasets have been analyzed and discussed in our publications. A preliminary set of 20 conversations, BabyCASE, was compiled in 2017; two sets of conversations about food were compiled in 2015 and 2017. Preliminary transcripts of additional single conversations are also available.

  • BabyCASE. 2017. Birkenfeld: Trier University of Applied Sciences & Saarbrücken: Saarland University. [http://umwelt-campus.de/case]. (20 conversations from the CASE project)
  • FoodCASE. 2015. Birkenfeld: Trier University of Applied Sciences & Saarbrücken: Saarland University. [http://umwelt-campus.de/case]. (21 conversations about food from the CASE project)
  • FoodCASE v2. 2017. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case]. (22 conversations about food from the CASE project)

In May 2018, ViMELF, the first finalized corpus based on data from the CASE project, was released.

Citations

Citing ViMELF - A Corpus of Video-Mediated English as a Lingua Franca Conversations:

ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. Version 1.0. The CASE project [umwelt-campus.de/case]. (date of last access). 

Citing the CASE project: 

Long citation:

The CASE project. 2012-2018. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Birkenfeld: Trier University of Applied Sciences (coordination) / Saarbrücken: Saarland University / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain / Boise: Boise State University. [http://umwelt-campus.de/case] (date of last access).

Short citation:

The CASE project. 2012-2018. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).

Citing preliminary single transcripts:

Transcript 01SB00SF00 (preliminary). 2017. The CASE project. 2012-2018. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).

Project Coordination & Contact: 

Stefan Diemer & Marie-Louise Brunner

Trier University of Applied Sciences, Germany

More information here.