OLAC Language Resource Catalog


Structured, tokenized and tagged data from Infral's blogs
Title:
Structured, tokenized and tagged data from Infral's blogs
ID:
mce-infral-tagged_blogs
Link to the object:
Online:
Yes
Archive:
Multimodal Learning and teaching Corpora Exchange
Contributor:
Laurent Mario (compiler)
Laurent Mario (author)
Laurent Mario ; Chanier Thierry (researcher)
Laurent Mario (data inputter)
Chanier Thierry (depositor)
Chanier Thierry (compiler)
Chanier Thierry (editor)
Mario Laurent (author)
Laurent Mario ; Chanier Thierry (author)
Date:
2011-12-02
Publisher:
Mulce (MULtimodal Corpus Exchange) ; Université Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org
Description:
This corpus is based on data extracted from the global Learning & Teaching Corpus Infral, archived in the Mulce data repository.
It was created by Mario Laurent as part of his Master's project, carried out at the Laboratoire de Recherche sur le Langage, Université Blaise Pascal, Clermont-Ferrand.
Structuring language interactions into exploitable corpora is necessary to analyze the data from the Infral project. To understand how intercultural competences develop, we have to quantify what the different participants produced, for example their language use or lexical diversity. To do this, we used the Python programming language and the NLTK library. During the Infral course, participants from a French and a German university communicated in both languages via blogs. We developed a program that converts the plain text of Infral's blogs into a structured XML file in which each message is tokenized into words and each word is tagged according to its form and its original language.
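The conversion described above can be illustrated with a minimal sketch. It is not the program distributed with the corpus: it uses only the Python standard library (a regular expression stands in for an NLTK tokenizer), and the element names `message` and `w`, the attributes `author`, `form` and `lang`, the tiny hint-word lists, and the helper names are all assumptions made for illustration. The last function shows one simple way to measure lexical diversity, the type-token ratio.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hint lists: a real tagger would use full lexicons or a
# language-identification model. These few words are for illustration only.
FRENCH_HINTS = {"le", "la", "les", "je", "et", "est", "un", "une", "de", "que"}
GERMAN_HINTS = {"der", "die", "das", "ich", "und", "ist", "ein", "eine", "nicht", "zu"}

# Runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def guess_form(token):
    """Classify a token by its surface form."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    if re.fullmatch(r"[^\w\s]", token):
        return "punctuation"
    return "other"


def guess_language(token):
    """Guess the language of a token from the hint lists (assumption)."""
    lower = token.lower()
    if lower in FRENCH_HINTS:
        return "fr"
    if lower in GERMAN_HINTS:
        return "de"
    return "unknown"


def message_to_xml(author, text):
    """Turn one plain-text blog message into a <message> element of <w> tokens."""
    message = ET.Element("message", {"author": author})
    for token in TOKEN_RE.findall(text):
        word = ET.SubElement(
            message,
            "w",
            {"form": guess_form(token), "lang": guess_language(token)},
        )
        word.text = token
    return message


def type_token_ratio(message):
    """Lexical diversity: distinct word forms divided by total word tokens."""
    words = [w.text.lower() for w in message.iter("w") if w.get("form") == "word"]
    return len(set(words)) / len(words) if words else 0.0
```

A message element built this way can be serialized with `ET.tostring(message_to_xml("student1", "Je suis ici und das ist gut."), encoding="unicode")`. Swapping the regular expression for an NLTK tokenizer would only change the line that produces the token list.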
Content language:
French
German
Subject language:
French
Language family:
Indo-European
Italic
Romance
Country:
Germany
France
Linguistic type:
Primary text
Linguistic field:
Applied linguistics
Discourse analysis
Text and corpus linguistics
Discourse type:
Dialogue
Narrative
DCMI type:
Dataset
Collection
Format:
text/xml
application/pdf
LCSH subject:
Education
Data processing
Computer-assisted instruction
Language and languages
Study and teaching
Temporal coverage:
name=Infral course ; start=2008-09-29; end=2009-01-09
name=Master Project ; start=2011-03-01; end=2011-06-30
Other rights:
http://lrl-diffusion.univ-bpclermont.fr/mulce/metadata/vdex/mce_licence.xml
Rights holders of this corpus are: Thierry Chanier ; Dagmar Abendroth-Timmer; Maud Ciekanski ; Mark Bechtel ; Laurent Mario ; licence = http://creativecommons.org/licenses/by-nc-sa/2.0/
open access after registration
Other subject:
NLP; XML; telecollaboration ; intercultural; online teaching

Find Related Information:

Archive: Multimodal Learning and teaching Corpora Exchange
Online: Yes
Subject language: French
Language family: Indo-European
Language family: Italic
Language family: Romance
Geographic region: Europe
Country: France
Country: Germany
Linguistic type: Primary text
Linguistic field: Applied linguistics
Linguistic field: Discourse analysis
Linguistic field: Text and corpus linguistics
Discourse type: Dialogue
Discourse type: Narrative
DCMI type: Collection
DCMI type: Dataset
Format: application/pdf
Format: text/xml
Content language: French
Content language: German
Date: 2000 and later
Date: 2010 - 2019
Contributor: Chanier Thierry
Contributor: Laurent Mario
Contributor: Laurent Mario ; Chanier Thierry
Contributor: Mario Laurent
LCSH subject: Computer-assisted instruction
LCSH subject: Data processing
LCSH subject: Education
LCSH subject: Language and languages
LCSH subject: Study and teaching
Publisher: Mulce (MULtimodal Corpus Exchange) ; Université Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org
Temporal coverage: name=Infral course ; start=2008-09-29; end=2009-01-09
Temporal coverage: name=Master Project ; start=2011-03-01; end=2011-06-30
Title: Structured, tokenized and tagged data from Infral's blogs
Other rights: Rights holders of this corpus are: Thierry Chanier ; Dagmar Abendroth-Timmer; Maud Ciekanski ; Mark Bechtel ; Laurent Mario ; licence = http://creativecommons.org/licenses/by-nc-sa/2.0/
Other rights: http://lrl-diffusion.univ-bpclermont.fr/mulce/metadata/vdex/mce_licence.xml
Other rights: open access after registration
Other subject: NLP; XML; telecollaboration ; intercultural; online teaching