OLAC Language Resource Catalog


N4 NATO Native and Non-Native Speech
Title:
N4 NATO Native and Non-Native Speech
ID:
LDC2006S13
https://catalog.ldc.upenn.edu/LDC2006S13
ISBN: 1-58563-344-5
ISLRN: 632-458-830-271-0
Online:
Yes
Archive:
The LDC Corpus Catalog
Date:
2006
Publisher:
Linguistic Data Consortium
https://www.ldc.upenn.edu
Description:
*Introduction* This file contains documentation on the N4 NATO Native and Non-Native Speech Corpus, Linguistic Data Consortium (LDC) catalog number LDC2006S13 and ISBN 1-58563-344-5. The N4 NATO Native and Non-Native Speech corpus was developed by the NATO research group on Speech and Language Technology to provide a military-oriented database for multilingual and non-native speech processing studies. Speech data was recorded in the naval transmission training centers of four countries (Germany, The Netherlands, United Kingdom, and Canada). The material consists of native and non-native speakers using NATO English procedure between ships and reading a text, "The North Wind and the Sun," in both English and the speaker's native language.

Speech technology covers an increasing number of languages, and systems are becoming more robust to speech variability such as speaking style and accent. However, real applications, especially in a multilingual and multinational context, require further robustness to regional and even non-native accents. Among the numerous corpora available for speech research, few have specifically addressed this issue. The NATO Speech and Language Technology group therefore decided to create a corpus geared towards the study of non-native accents. The group chose naval communications as the common task because it naturally includes a great deal of non-native speech and because there were training facilities where data could be collected in several countries.

*Data* The database was collected in four countries (Germany, The Netherlands, United Kingdom, and Canada) during naval communication training sessions in 2000-2002. For each country, the main part of the recordings consists of a NATO naval procedure in English, where a typical sentence sounds like "This is alpha, whiskey, roger. I make two seven zero six hostile, two seven zero six. Out."
In addition, each speaker read the text "The North Wind and the Sun" in English and in his or her native language. The audio was recorded on DAT and downsampled to 16 kHz, 16-bit, and all audio files have been manually transcribed and annotated with speaker identities using the Transcriber tool. Navy procedure recordings and text readings are stored in separate files; the first digit in the filename indicates the type of speech.

Among the speech segments, the Navy procedure recordings range from 1.3 h to 2.3 h per country, for a total of 7.5 h. The native-language text readings range from 1.5 min to 22.9 min per country, for a total of around one hour. Durations in hours by country (CA = Canada, GE = Germany, NL = The Netherlands, UK = United Kingdom); each indented row is a component of the row above it, so that, for example, Silence and Speech add up to Signal:

Duration (h)          CA     GE     NL     UK     All
Signal               5.30   3.20   5.00   6.30   19.80
  Silence            3.00   0.56   2.00   4.70   10.26
  Speech             2.30   2.64   3.00   1.60    9.54
    Navy procedure   2.00   1.90   2.30   1.30    7.50
    Read text        0.30   0.74   0.70   0.30    2.04
      Non-native     0.27   0.37   0.32   0.00    0.96
      Native         0.03   0.37   0.38   0.30    1.08

The database contains the following information about each speaker: gender, age, weight, height, possible speaking or hearing disorders, education level, living area, accent, second language, and the year English was learned (for non-native speakers). Speaker accents vary widely from country to country. There were 115 speakers in total, 19 of them women (about 16.5% of participants); the mean age was 22.6 years.

                CA      GE      NL      UK      All
Speakers        22      51      31      11      115
Women            5       0       9       5       19
Age range     22-35   17-23   17-61   19-62   17-62
Mean age       28.3    20.1    21.0    27.5    22.6

*Samples* Please view this audio sample and transcript sample.
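The per-country figures in the Data section are hierarchical: each parent duration row should equal the sum of its two sub-rows, and the overall mean age should be the speaker-weighted mean of the per-country means. A minimal Python sketch that re-checks this arithmetic, using only numbers copied from the description; the variable and function names are illustrative and are not part of the corpus distribution:

```python
# Per-country figures copied from the duration table (hours) and speaker table.
COUNTRIES = ["CA", "GE", "NL", "UK"]

durations = {
    "signal":     [5.30, 3.20, 5.00, 6.30],
    "silence":    [3.00, 0.56, 2.00, 4.70],
    "speech":     [2.30, 2.64, 3.00, 1.60],
    "navy_proc":  [2.00, 1.90, 2.30, 1.30],
    "read_text":  [0.30, 0.74, 0.70, 0.30],
    "non_native": [0.27, 0.37, 0.32, 0.00],
    "native":     [0.03, 0.37, 0.38, 0.30],
}

# Each parent row should equal the sum of its two child rows, per country.
HIERARCHY = {
    "signal": ("silence", "speech"),
    "speech": ("navy_proc", "read_text"),
    "read_text": ("non_native", "native"),
}

speakers = [22, 51, 31, 11]
women = [5, 0, 9, 5]
mean_age = [28.3, 20.1, 21.0, 27.5]


def check_hierarchy(tol: float = 1e-6) -> list[str]:
    """Return 'parent/country' labels where the children do not add up."""
    bad = []
    for parent, (a, b) in HIERARCHY.items():
        for i, country in enumerate(COUNTRIES):
            expected = durations[a][i] + durations[b][i]
            if abs(durations[parent][i] - expected) > tol:
                bad.append(f"{parent}/{country}")
    return bad


def totals(row: str) -> float:
    """Sum a duration row over the four countries (the 'All' column)."""
    return round(sum(durations[row]), 2)


def weighted_mean_age() -> float:
    """Overall mean age, weighting each country's mean by its speaker count."""
    return sum(n * m for n, m in zip(speakers, mean_age)) / sum(speakers)


def share_of_women() -> float:
    """Fraction of all speakers who are women."""
    return sum(women) / sum(speakers)


if __name__ == "__main__":
    print("inconsistent rows:", check_hierarchy())
    print("total signal (h):", totals("signal"))
    print(f"mean age: {weighted_mean_age():.1f}")
    print(f"women: {share_of_women():.1%}")
```

Running it reports no inconsistent rows, a total signal of 19.8 h, a mean age of 22.6 and a share of women of 16.5%.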
Content language:
Dutch
English
German
Linguistic type:
Primary text
DCMI type:
Sound
Other format:
Sampling Rate: 16000
Sampling Format: pcm
Distribution: Web Download
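The record gives the audio format as 16 kHz, 16-bit PCM. Assuming the files are distributed as RIFF/WAV (an assumption: the record does not name the container), Python's standard `wave` module is enough to confirm that a file matches this format and to compute its duration. The function name `describe_wav` is hypothetical:

```python
import io
import wave

# Format stated in the catalog record: 16 kHz sampling rate, 16-bit PCM.
EXPECTED_RATE = 16_000
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16 bit


def describe_wav(source) -> dict:
    """Summarize a WAV header and check it against the stated format.

    `source` may be a path or a binary file-like object. ASSUMPTION: the audio
    is stored as RIFF/WAV; the catalog record does not name the container.
    """
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        width = w.getsampwidth()
        channels = w.getnchannels()
        frames = w.getnframes()
        comptype = w.getcomptype()  # "NONE" means uncompressed PCM
    return {
        "rate": rate,
        "sampwidth": width,
        "channels": channels,
        "seconds": frames / rate,
        "matches_record": (
            rate == EXPECTED_RATE
            and width == EXPECTED_SAMPWIDTH
            and comptype == "NONE"
        ),
    }


if __name__ == "__main__":
    # Build one second of 16 kHz, 16-bit mono silence in memory as a
    # stand-in for a real corpus file.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(EXPECTED_SAMPWIDTH)
        out.setframerate(EXPECTED_RATE)
        out.writeframes(b"\x00\x00" * EXPECTED_RATE)
    buf.seek(0)
    print(describe_wav(buf))
```

The in-memory file only stands in for real data; with the corpus on disk one would pass a file path instead. Dividing the frame count by the sampling rate gives the duration in seconds, which can be summed over files and compared with the hour totals in the description.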
Other language:
Dutch
English
German
Other rights:
Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
N4 NATO Native and Non-Native Speech Agreement: https://catalog.ldc.upenn.edu/license/n4-nato-native-and-non-native-speech.pdf
Rights holder: Portions © 2006 Trustees of the University of Pennsylvania

Find Related Information:

Archive: The LDC Corpus Catalog
Date: 2000 - 2009
Date: 2000 and later
Contributor: Benarousse, Laurent
Contributor: Geoffrois, Edouard
Contributor: Grieco, John
Contributor: Series, Robert
Contributor: Steeneken, Herman