Orð og tunga - 01.06.2007, Page 83
Eiríkur Rögnvaldsson: Textasöfn og setningagerð: greining og leit 73
ings of the First Workshop on Treebanks and Linguistic Theories (TLT 2002), 20-21
September 2002. Sozopol, Bulgaria.
Osenova, Petya, and Kiril Simov. 2003. The Bulgarian HPSG Treebank: Specialization
of the Annotation Scheme. Joakim Nivre and Erhard Hinrichs (eds.): Proceedings
of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), pp. 129-140.
Växjö University Press, Växjö.
Sigríður Sigurjónsdóttir and Joan Maling. 2001. Það var hrint mér á leiðinni í skólann:
Þolmynd eða ekki þolmynd? Íslenskt mál 23:123-180.
Sigrún Helgadóttir. 2007. Mörkun íslensks texta. Orð og tunga 9 (this issue).
Þórunn Blöndal. 2005. Lifandi mál. Inngangur að orðræðu- og samtalsgreiningu.
Rannsóknarstofnun Kennaraháskóla Íslands, Reykjavík.
Lykilorð:
málheildir, dæmasetningar, málfræðileg mörkun
Keywords:
text corpora, example sentences, PoS tagging
Abstract
This paper discusses the use of text corpora in syntactic research, and how to search
for example sentences in corpora. During the past few decades, widely divergent
views have been expressed as to the value of corpora in syntactic argumentation. It
is argued in the paper that this disagreement stems from different views as to the
subject of linguistic research. The paper also discusses various problems that arise in
the interpretation of the information extracted from corpora, especially in drawing
conclusions from the silence of the texts on certain constructions. The main section
of the paper discusses the possibilities of searching for certain syntactic constructions
in different types of Icelandic corpora: raw untagged text, PoS-tagged text, and text
where the major syntactic constituents and syntactic functions have been identified.
Data-driven PoS taggers have now been trained on Icelandic texts, and it is shown
that, due to the inflectional character of Icelandic and the richness of the tagset, the
resulting PoS tagging is very effective in the search for various syntactic constructions.
Eiríkur Rögnvaldsson
Háskóla Íslands
Árnagarði við Suðurgötu
IS-101 Reykjavík
eirikur@hi.is