OLAC Language Resource Catalog

2006 CoNLL Shared Task - Ten Languages
Title:
2006 CoNLL Shared Task - Ten Languages
ID:
ELRA-W0086
Link to the object:
Online:
Yes
Archive:
ELRA Catalogue of Language Resources
Date:
2019-04-29
Publisher:
ELRA (European Language Resources Association)
Description:
Written Corpora
2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.
The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006, the shared task was devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. More information about CoNLL and the 2006 shared task is available, respectively, at:
and
The source data in the treebanks in this release consists principally of various texts (e.g., textbooks, news, literature) annotated in dependency format.
In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that other units in the sentence are connected to the verb as directed links or dependencies. This is a one-to-one correspondence: for every element in the sentence there is one node in the sentence structure that corresponds to that element. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases and in each sentence, one or more nodes may correspond to one element. All of the data sets in this release are dependency treebanks.
The individual data sets are:
BulTreeBank (Bulgarian)
The Danish Dependency Treebank (Danish)
The Alpino Treebank (Dutch)
The TIGER Corpus (German)
Treebank Tuba-J/S (Japanese)
Floresta Sinta(c)tica (Portuguese)
Slovene Dependency Treebank, SDT V0.1 (Slovene)
Cast3LB (Spanish)
Talbanken05 (Swedish)
METU-Sabanci Turkish Treebank (Turkish)
This corpus is distributed jointly with LDC. LDC Catalogue Reference is:
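The one-to-one correspondence between sentence elements and nodes described above can be illustrated with a minimal Python sketch. It is not taken from the treebanks or from any shared task tooling: the Token class, the helper functions, the example sentence and the relation labels are all hypothetical, chosen only to show one node per word, each linked to a single head, with the verb as the root.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int      # 1-based position of the word in the sentence
    form: str       # the word as written
    head: int       # index of the governing word; 0 marks the root
    relation: str   # illustrative dependency label (hypothetical)


def children_of(tokens, head_index):
    """Return the forms of all words directly governed by head_index."""
    return [t.form for t in tokens if t.head == head_index]


def is_well_formed(tokens):
    """Check one node per word, exactly one root, and no cycles."""
    by_index = {t.index: t for t in tokens}
    if len(by_index) != len(tokens):
        return False  # two nodes claim the same position
    if sum(1 for t in tokens if t.head == 0) != 1:
        return False  # a dependency tree has a single root
    for t in tokens:
        seen = set()
        current = t
        # Follow head links upward until the root is reached.
        while current.head != 0:
            if current.index in seen or current.head not in by_index:
                return False  # cycle or dangling head
            seen.add(current.index)
            current = by_index[current.head]
    return True


# Hypothetical example sentence: the verb "reads" is the root.
sentence = [
    Token(1, "The", 2, "det"),
    Token(2, "parser", 3, "subj"),
    Token(3, "reads", 0, "root"),
    Token(4, "text", 3, "obj"),
]

print(is_well_formed(sentence))
print(children_of(sentence, 3))
```

Each word appears exactly once and stores only the index of its head, so the structure has as many nodes as the sentence has words. A constituency analysis of the same sentence would add phrase nodes on top of the words.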
Content language:
Bulgarian
Danish
Dutch
German
Japanese
Portuguese
Slovenian
Spanish
Swedish
Turkish
Linguistic type:
Primary text
DCMI type:
Text
Other language:
Bulgarian
Danish
Dutch, Flemish
German
Japanese
Portuguese
Slovenian
Spanish, Castilian
Swedish
Turkish
Other rights:
Rights available for: Research Use

Find Related Information:

Archive: ELRA Catalogue of Language Resources