OLAC Language Resource Catalog

2008 NIST Speaker Recognition Evaluation Training Set Part 2
Title:
2008 NIST Speaker Recognition Evaluation Training Set Part 2
ID:
LDC2011S07
https://catalog.ldc.upenn.edu/LDC2011S07
ISBN: 1-58563-591-X
ISLRN: 956-489-013-269-1
OAI: oai:www.ldc.upenn.edu:LDC2011S07
Online:
Yes
Archive:
Date:
2011
Publisher:
Linguistic Data Consortium
https://www.ldc.upenn.edu
Description:
*Introduction* 2008 NIST Speaker Recognition Evaluation Training Set Part 2, Linguistic Data Consortium (LDC) catalog number LDC2011S07 and ISBN 1-58563-591-X, was developed by LDC and NIST (National Institute of Standards and Technology). It contains 950 hours of multilingual telephone speech and English interview speech, along with transcripts and other materials used as training data in the 2008 NIST Speaker Recognition Evaluation (SRE).
SRE is part of an ongoing series of evaluations conducted by NIST. These evaluations are an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluation is designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible to those wishing to participate.
The 2008 evaluation was distinguished from prior evaluations, in particular those in 2005 and 2006, by including not only conversational telephone speech data but also conversational speech data of comparable duration recorded over a microphone channel in an interview scenario. Additional documentation is available at the NIST web site for the 2008 SRE and within the 2008 SRE Evaluation Plan.
*Data* The speech data in this release was collected in 2007 by LDC at its Human Subjects Data Collection Laboratories in Philadelphia and by the International Computer Science Institute (ICSI) at the University of California, Berkeley. This collection was part of the Mixer 5 project, which was designed to support the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones and in different communicative situations and/or in multiple languages. Mixer participants were native English speakers and bilingual English speakers.
The telephone speech in this corpus is predominantly English, but also includes the above languages. All interview segments are in English. Telephone speech represents approximately 523 hours of the data, and microphone speech represents the other 427 hours. The telephone speech segments include summed-channel excerpts of about 5 minutes taken from longer original conversations. The interview material includes single-channel conversational interview segments of at least 8 minutes taken from a longer interview session. As in prior evaluations, intervals of silence were not removed. English language transcripts in .cfm format were produced using an automatic speech recognition (ASR) system.
Approximately six files distributed as part of SRE08 consist of a 1024-byte header with no audio. However, these files were not included in the trials or keys distributed in the SRE08 aggregate corpus.
*Samples* For an example of the data contained in this corpus, review this audio sample.
Content language:
Yue Chinese
Wu Chinese
Vietnamese
Uzbek
Urdu
Tigrinya
Thai
Tagalog
Spanish
Russian
Panjabi
Min Nan Chinese
Lao
Korean
Central Khmer
Georgian
Japanese
Italian
Hindi
Persian
English
Mandarin Chinese
Bengali
Egyptian Arabic
Moroccan Arabic
Northern Khmer
Dari
Iranian Persian
Chinese
Arabic
DCMI type:
Sound
Other format:
Sampling Rate: 8000
Sampling Format: ulaw
Distribution: DVD
Other language:
Yue Chinese
Wu Chinese
Vietnamese
Uzbek
Urdu
Tigrinya
Thai
Tagalog
Spanish
Russian
Panjabi
Min Nan Chinese
Lao
Korean
Central Khmer
Georgian
Japanese
Italian
Hindi
Persian
English
Mandarin Chinese
Bengali
Egyptian Arabic
Moroccan Arabic
Northern Khmer
Dari
Iranian Persian
Chinese
Arabic
Other rights:
Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Rights holder: Portions © 2007, 2011 Trustees of the University of Pennsylvania