OLAC Language Resource Catalog


CSLU: 22 Languages Corpus
Title:
CSLU: 22 Languages Corpus
ID:
LDC2005S26
https://catalog.ldc.upenn.edu/LDC2005S26
ISBN: 1-58563-356-9
Online:
Yes
Archive:
The LDC Corpus Catalog
Date:
2005
Publisher:
Linguistic Data Consortium
https://www.ldc.upenn.edu
Description:
*Introduction* This file contains documentation on the CSLU: 22 Languages v 1.2, Linguistic Data Consortium (LDC) catalog number LDC2005S26 and ISBN 1-58563-361-5. Produced by the Center for Spoken Language Understanding and distributed by the Linguistic Data Consortium, the 22 Languages corpus consists of telephone speech from 21 languages: Eastern Arabic, Cantonese, Czech, Farsi, German, Hindi, Hungarian, Japanese, Korean, Malay, Mandarin, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Tamil, Vietnamese, and English. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. Each of the 50,191 utterances was verified by a native speaker to determine whether the caller followed instructions when answering the prompts. For this release, approximately 19,758 utterances have corresponding orthographic transcriptions in all of the above languages except Eastern Arabic, Farsi, Korean, Russian, and Italian. *Samples* For an example of this corpus, please listen to these Arabic and English audio samples. *Updates and Contact* Questions regarding this corpus and about the Center for Spoken Language Understanding should be directed to Jan van Santen.
Content language:
Yue Chinese
Vietnamese
Tamil
Swahili (individual language)
Swedish
Russian
Portuguese
Polish
Korean
Japanese
Indonesian
Hindi
English
German
Arabic
Swahili (macrolanguage)
Congo Swahili
Spanish
Mandarin Chinese
Italian
Hungarian
Persian
Dari
Iranian Persian
Czech
Linguistic type:
Primary text
DCMI type:
Sound
Other format:
Sampling Rate: 8000
Sampling Format: ulaw
Distribution: Web Download
Other language:
Yue Chinese
Vietnamese
Tamil
Swahili (individual language)
Swedish
Russian
Portuguese
Polish
Korean
Japanese
Indonesian
Hindi
English
German
Arabic
Swahili
Congo Swahili
Spanish
Mandarin Chinese
Italian
Hungarian
Persian
Dari
Iranian Persian
Czech
Other rights:
Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Rights holder: Portions © 1998-2002 Center for Spoken Language Understanding Oregon Health & Science University, © 2005 Trustees of the University of Pennsylvania

Find Related Information:

Archive: The LDC Corpus Catalog
Contributor: Lander, T