Gripla - 2020, Page 34
we are not able to cull 800 MFWs at 50%, since there are not 800 words
that each occur in at least 50% of the documents. This leaves 24 different
tests, and the cosine distances from each are averaged to arrive at a sort of
consensus across many parameterization scenarios.
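A minimal sketch of this averaging procedure follows. It rests on several assumptions not stated in the text. First, "culling at X%" is taken to mean keeping only words that occur in at least X% of the documents. Second, the MFW list is drawn from the pooled corpus. Third, relative frequencies are z-scored before the cosine distance is taken, as in the Cosine Delta family; raw relative frequencies cannot yield cosine distances above 1, but the exact setup used here is an assumption. The function names, the grid values and the toy token lists (which are not the saga texts) are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def relative_freqs(tokens):
    """Relative frequency of every word type in one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def culled_mfw(corpus, n_mfw, culling):
    """The n_mfw most frequent words (pooled over all documents) that occur in
    at least `culling` share of the documents; None if too few words survive."""
    doc_sets = [set(tokens) for tokens in corpus.values()]
    pooled = Counter(w for tokens in corpus.values() for w in tokens)
    survivors = [w for w, _ in pooled.most_common()
                 if sum(w in s for s in doc_sets) / len(doc_sets) >= culling]
    return survivors[:n_mfw] if len(survivors) >= n_mfw else None


def zscores(freqs, words):
    """Standardize each word's frequency across documents (assumed step)."""
    vectors = {d: [] for d in freqs}
    for w in words:
        column = [freqs[d].get(w, 0.0) for d in freqs]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for d, x in zip(freqs, column):
            vectors[d].append((x - mean) / sd if sd > 0 else 0.0)
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; ranges from 0 to 2 for z-scored vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0  # zero vector: treat as orthogonal


def averaged_distances(corpus, mfw_sizes, culling_levels):
    """Average each pairwise distance over every feasible (MFW, culling) setting."""
    freqs = {d: relative_freqs(tokens) for d, tokens in corpus.items()}
    totals, runs = Counter(), 0
    for n_mfw, culling in product(mfw_sizes, culling_levels):
        words = culled_mfw(corpus, n_mfw, culling)
        if words is None:  # e.g. 800 MFWs cannot be culled at 50%: skip this test
            continue
        vectors = zscores(freqs, words)
        runs += 1
        names = sorted(vectors)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                totals[(a, b)] += cosine_distance(vectors[a], vectors[b])
    return {pair: total / runs for pair, total in totals.items()}, runs


# Usage with toy token lists (not the saga texts) and a hypothetical grid.
corpus = {
    "A": ["og", "og", "hann", "sagði", "en"],
    "B": ["og", "hann", "hann", "segir", "en"],
    "C": ["og", "hann", "en", "en", "þá"],
}
dists, runs = averaged_distances(corpus, mfw_sizes=[2, 3, 10],
                                 culling_levels=[0.0, 0.5, 1.0])
```

In the toy run, every setting that asks for 10 MFWs is infeasible and is skipped, just as 800 MFWs at 50% is skipped above. Averaging over the remaining settings dilutes any pair that is close under only one parameterization.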
Having done this, we obtain the results in Figure 3. This time around
the distances to A-parallel for A- and C-divergent are virtually identical
(1.382 in dark gray and 1.380 in light gray respectively), but we can still
clearly see that C-divergent is closer to C-parallel than A-divergent is (on
the right-hand side). Now that the word counts of our documents have been
brought in line with A-divergent, A-divergent appears to have more
opportunity to compete with the other documents in its similarity scores.
Nevertheless, our observation stands: C appears to be the most internally
coherent redaction.
But is this result explained by style, or by something else? For instance,
one of the words that the above tests always take into account is
“Guðmundur.” On the whole, the appearance of certain characters, or of
proper nouns generally, in one document versus another says little about
“style”; it has more to do with thematic content and narrative.
To be safe, for our third and final test, we remove all proper nouns.65
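The filtering for this third test can be sketched as a token-level preprocessing pass that runs before MFW selection. The particulars are hypothetical. Names are matched against a hand-made list (PROPER_NOUNS), because the text does not say how proper nouns were identified. Lowercasing is also an assumed step. The sketch also folds in the collapsing of preterite discourse verbs into their present forms, which the next paragraph describes, using only the two verb pairs cited there (VERB_MAP).

```python
# Illustrative lists only: the source does not say how names were identified,
# and a real list would need every inflected form of every name.
PROPER_NOUNS = {"guðmundur", "guðmund", "guðmundi", "guðmundar"}

# Preterite -> present for the discourse verbs cited in the text; a real map
# would also cover plural and other person/number forms.
VERB_MAP = {"svaraði": "svarar", "sagði": "segir"}


def preprocess(tokens, proper_nouns=PROPER_NOUNS, verb_map=VERB_MAP):
    """Lowercase, drop proper nouns, and collapse discourse verbs to present."""
    cleaned = []
    for token in tokens:
        word = token.lower()
        if word in proper_nouns:
            continue  # names reflect theme and narrative rather than style
        cleaned.append(verb_map.get(word, word))
    return cleaned


print(preprocess(["Guðmundur", "svaraði", "og", "sagði"]))
# ['svarar', 'og', 'segir']
```

Collapsing rather than deleting the verbs keeps their combined frequency in the analysis. It simply stops the present/preterite alternation, which is volatile in transmission and often hidden by abbreviation, from registering as a stylistic difference.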
Another class of words has more to do with the circumstances of manuscript
transmission than with style. We are particularly concerned about the
highly frequent discourse verbs, which may appear in either the present or
the preterite: svaraði instead of svarar, or sagði instead of segir. While
the use of one over the other may very well be stylistic, these words are
simply too volatile in manuscript transmission to be considered here.
Moreover, these words are often abbreviated such that it is impossible to
tell which word form is being used. Thus, these finite verbs were collapsed
into their present forms. Other word forms to consider are frequently
occurring words such as en and og, which likewise vary in manuscript
transmission. The frequencies of these words were inspected individually,
and we concluded that there was no need to remove them. Though 561 tends
to use og more often than the C manuscripts do, the only effect of this is
to bring A-parallel and A-divergent closer together, and in the results that
A STYLOMETRIC ANALYSIS OF LJÓSVETNINGA SAGA
65 That said, the substitution of a proper name for a pronoun may indeed be a stylistic
tendency that we want to address. But such a tendency would be caught by an increase in
the frequency of those pronouns, so it is still accounted for.