Orð og tunga - 01.06.2012, Blaðsíða 12
human user of a dictionary brings to the dictionary a host of implicit
knowledge and cognitive skills that aid in dictionary use, in particular
a range of assumed world and cultural knowledge and the common
sense ability to deploy that knowledge appropriately in interpreting
the dictionary entry. A computer, on the other hand, comes to the electronic
"dictionary" knowing nothing at all in advance and has only
the "sense" that is represented by the processing algorithms available
to it. To be effective, a computational lexical resource must therefore
represent the relevant information in a fully explicit and systematic
way, encoding information that to human users would seem obvious
and unnecessary.
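As a minimal sketch of what "fully explicit" means here, the toy lexicon below spells out facts that a human reader would take for granted (for example, that a poodle is an animal). The entries and the function names `hypernym_chain` and `is_a` are illustrative assumptions, not drawn from any of the resources surveyed in this paper.

```python
# Toy lexicon: every "is a kind of" fact is stated explicitly, because the
# program has no world knowledge of its own. All entries are hypothetical.
HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Follow explicit 'is a kind of' links upward until none remains."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, category):
    """True only if the lexicon explicitly licenses the inference."""
    return category in hypernym_chain(word)


if __name__ == "__main__":
    print(hypernym_chain("poodle"))
    print(is_a("poodle", "animal"))
    print(is_a("cat", "animal"))  # unknown word: nothing can be inferred
```

A word missing from the lexicon yields no inferences at all, which is the sense in which the computer "knows nothing in advance".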
This paper focuses on the meaning of words (lexical semantics)
and some important computational resources that have been developed
to make lexical semantic information available to computers for
a variety of NLP tasks. It is intended as a survey article, describing
the properties of three major lexical semantic resources (WordNet,
DanNet and SALDO), which provide a frame of reference for current
work on two Icelandic projects, reported in this volume. Anna Björk
Nikulásdóttir reports on a project developing semi-automatic means
for extracting information on lexical semantic relations from text corpora
(Íslenskur merkingarbrunnur, cf. Nikulásdóttir & Whelpton 2009,
2010a, 2010b); Jón Hilmar Jónsson reports on a project which is manually
developing a network of lexical sense relations (Íslenskt orðanet, cf.
Jónsson 2008, 2009a, 2009b, 2009c; Úlfarsdóttir 2006).
Section 2 introduces one of the oldest and most influential lexical
semantic resources, the Princeton WordNet, and reviews some of the
central lexical semantic relations around which the resource is structured:
synonymy, hyponymy, meronymy, antonymy, and troponymy.
Section 3 introduces DanNet, a lexical semantic resource for Danish,
conforming to the international standards of wordnet development;
a number of challenges faced by DanNet are reviewed, in particular
the challenge of converting traditional dictionary information into a
computer-tractable form and the challenge of addressing deficiencies
in the relation set of the original Princeton WordNet. Section 4
introduces SALDO, a morphological and lexical semantic database for
Swedish, organised on radically different lines to the wordnets, as
it attempts to model the degree of centrality of lexicalised concepts
in Swedish rather than encoding specific lexical semantic relations
between them. Section 5 concludes this survey and points ahead to the
papers introducing the two Icelandic resources.