OLAC Language Resource Catalog

Deltacorpus
Title:
Deltacorpus
Link to the object:
Online:
Yes
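Archive:
LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University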
Contributor:
Mareček, David (author)
Yu, Zhiwei (author)
Zeman, Daniel (author)
Žabokrtský, Zdeněk (author)
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Description:
Texts in 107 languages from the W2C corpus (first 1,000,000 tokens per language), tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Content language:
Belarusian
Bosnian
Bulgarian
Czech
Serbo-Croatian
Croatian
Upper Sorbian
Macedonian
Polish
Russian
Slovak
Slovenian
Serbian
Ukrainian
Latvian
Lithuanian
Afrikaans
Danish
German
English
Faroese
Western Frisian
Swiss German
Icelandic
Limburgan
Luxembourgish
Low German
Dutch
Norwegian Nynorsk
Norwegian
Scots
Swedish
Yiddish
Aragonese
Asturian
Catalan
French
Galician
Haitian
Italian
Latin
Lombard
Neapolitan
Piemontese
Portuguese
Romanian
Spanish
Venetian
Walloon
Breton
Welsh
Scottish Gaelic
Irish
Modern Greek (1453-)
Armenian
Albanian
Dimli (individual language)
Persian
Gilaki
Kurdish
Tajik
Bengali
Bishnupriya
Gujarati
Fiji Hindi
Hindi
Marathi
Nepali (macrolanguage)
Urdu
Amharic
Arabic
Egyptian Arabic
Hebrew
Estonian
Finnish
Hungarian
Basque
Georgian
Chuvash
Azerbaijani
Turkish
Uzbek
Kazakh
Tatar
Yakut
Korean
Mongolian
Telugu
Kannada
Malayalam
Tamil
Newari
Vietnamese
Indonesian
Javanese
Malagasy
Maori
Malay (macrolanguage)
Pampanga
Sundanese
Tagalog
Waray (Philippines)
Swahili (macrolanguage)
Esperanto
Ido
Interlingua (International Auxiliary Language Association)
Volapük
Linguistic type:
Primary text
DCMI type:
Text
Other date:
2016-03-22T16:44:19Z
Other rights:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
http://creativecommons.org/licenses/by-sa/4.0/
Other subject:
part of speech
tagging
semi-supervised
cross-language
Other type:
corpus

Find Related Information:

Archive: LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University