Orð og tunga - 01.06.2013, page 49
Kristín Bjarnadóttir: Hvert á að sækja orðaforðann í orðabók?
dóttir og Hrafn Loftsson. 2012. The Tagged Icelandic Corpus (MIM).
Proceedings of "Language Technology for Normalization of Less-Resourced
Languages", SaLTMiL 8 - AfLaT 2012. Istanbul, Tyrklandi.
Sinclair, John (ritstj.). 1987. Collins Cobuild English Language Dictionary.
London and Glasgow: Collins.
Sverrir Hólmarsson, Christopher Sanders, John Tucker. 1989. Íslensk-ensk
orðabók. Reykjavík: Iðunn.
Tímarit.is. Landsbókasafn Íslands - Háskólabókasafn: http://timarit.is.
Quasthoff, Uwe, Sabine Fiedler, Erla Hallsteinsdóttir. 2012. Frequency Dictionary.
Icelandic. ISL. Leipzig: Leipziger Universitätsverlag.
Þórdís Úlfarsdóttir. 2013. ISLEX - Norræn margmála orðabók. Orð og tunga
15:41-71. (Þetta hefti.)
Lykilorð
orðabókarfræði, uppflettiorð, orðaforði, málheild, vélræn orðtaka
Keywords
lexicography, headwords, vocabulary, corpora, automatic excerption
Abstract
This paper discusses different methods of assembling a list of headwords for
Icelandic dictionaries. Until recently, the classic Icelandic dictionary (Íslensk orðabók),
originally published in 1963, has been the primary source of vocabulary data
for Icelandic lexicographers, with the collections at the department of lexicography
of The Árni Magnússon Institute for Icelandic Studies as additional sources. With
the advent of electronic text collections, notably the tagged Icelandic corpus (Mörkuð
íslensk málheild, MIM), Icelandic lexicographers now have access to a huge new
source of data. The use of data on frequency from these new sources is of great value
to lexicographers in the choice of headwords, but the complete coverage of the words
and word forms in the texts made possible by new methods of word extraction also
complements older material by filling accidental gaps in material assembled by
older methods of dictionary excerption. Partly because of data scarcity problems in
a language with a very rich morphology, the conclusion in the paper is that word
frequency alone cannot be the base of an Icelandic dictionary at this point in time,
as the volume of texts needed for a good coverage of the vocabulary far exceeds
anything available now. One of the main reasons for this is a very productive system
of compounding, making it necessary to split compounds before attempting to use
frequency for the selection of vocabulary.
Kristín Bjarnadóttir
Stofnun Árna Magnússonar í íslenskum fræðum / Háskóli Íslands
kristinb@hi.is