… through a relatively narrow set of tests.
The essential point about validity, though, is that A TEST IS VALID IF IT TESTS WHAT IT IS INTENDED TO TEST. It’s as simple, and as difficult, as that. So, too, with reliability: while we could look at different aspects of reliability in detail (e.g. inter-marker reliability and test-retest reliability), the important point is that A TEST IS RELIABLE IF IT DOES WHAT IT IS INTENDED TO DO 10 TIMES OUT OF 10 (or at least 95 times out of 100).
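To make that criterion concrete: test-retest reliability, one of the aspects mentioned above, is conventionally estimated as the correlation between learners’ scores on two sittings of the same test. The sketch below is purely illustrative and not from the article; the scores and the 0.9 rule of thumb are hypothetical examples.

```python
# Illustrative sketch (not from the article): estimating test-retest
# reliability as the Pearson correlation between two sittings of one test.
# The scores and the 0.9 threshold are hypothetical examples.
from statistics import correlation  # available in Python 3.10+

first_sitting = [72, 58, 90, 65, 80, 45]   # hypothetical scores, sitting 1
second_sitting = [70, 61, 88, 67, 78, 48]  # same learners, sitting 2

r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability (Pearson r) = {r:.2f}")

# A coefficient near 1.0 means the test ranks learners consistently,
# echoing the '95 times out of 100' criterion in spirit.
if r >= 0.9:
    print("consistent enough for most practical purposes (by this rule of thumb)")
```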
The underlying message, then, is that a number of the terms sometimes applied to tests to make them appear ‘good’ tests (like ‘new’, ‘improved’ soap powders) are seductive but misleading if they imply the possibility of some new testing Utopia. The buzz-words of recent years have been ‘objective’ and, more recently, ‘communicative’. In reality, there is no such thing as an objective test, only tests which can be objectively scored. Equally, there is no such thing as a ‘communicative’ test, only tests which in their design and application reflect to a greater or lesser extent the attitudes underlying the Communicative Approach.
The new buzz-word is going to be ‘computerized’, but computerized tests will not necessarily represent a refinement of our awareness of the nature of language learning. For the present at least, ‘computerized’ will only mean ‘objectively scoreable’ and quicker to mark because a computer can mark them, i.e. in many ways a move away from the more complex view of language implied by the communicative approach and the use of, say, verbal descriptors to support the testing of spoken language performance.
The essence of what makes for a good language test has not changed very much in the last two decades, though the ideal has rarely been achieved. It comes from asking what I would call the WH-questions of testing:
1. WHO 2. WHY 3. WHEN 4. WHAT 5. HOW
If you ask and provide your own answers to these questions every time you design a test, then your tests should function more effectively, whatever label is applied to them. The following guidelines may be helpful:
1. WHO
1.1. What age group? Age will affect test performance through factors such as concentration span and the intellectual capacity to cope with particular question types. Remember that you want to test somebody’s language ability, not other skills.
1.2. Are you testing an individual,
or a small group, or a larger number
of testees? The number to be tested
will affect the type of test that will be
appropriate.
1.3. What level(s) are they? If you
don’t get the level right then your
test won’t give you the information
you need. Multi-level tests and
mono-level tests have different design constraints.
1.4. Is it a monolingual or a multilingual group? If the group to be tested share an L1 then you can justifiably make use of contrastive features between the L1 and the L2.
1.5. Is it general language proficiency that you want to test, or ESP (English for Specific Purposes)? The test content and types should obviously be different for a group of, say, air-traffic controllers from those for a group of doctors or for a class of 16-year-olds.
2. WHY
2.1. Why test at all? Who is going to benefit from the testing - the learner, the teacher, the institution? If there aren’t any good reasons for testing, or if evaluation procedures without testing would do the job just as well, or better, then you shouldn’t test at all.
2.2. There are good reasons for testing: placement; diagnosis (for remedial work, to inform course design, etc.); checking on learning; entry, progress and exit tests in relation to coursebooks; grading; selection; exclusion; certification; prediction of suitable future targets.
2.3. Tests can be motivating, if
well designed and timed (but they
can also be very demotivating for
those who consistently do badly).
3. WHEN
3.1. There are a lot of important time factors in testing. Tests can be both retrospective and prospective in their focus, and when they are given can determine:
a) how accurate the results are
b) how useful the results are
3.2. WHEN factors for consideration should include:
3.2.1. Time of year (seasonal factors can affect performance)
3.2.2. Day of the week
3.2.3. Time of day
3.2.4. Immediate/without warning v. with advance notice
3.2.5. On one occasion v. several occasions (concentration, stamina, time available, etc.)
3.2.6. At the end of a teaching ‘unit’ v. within a ‘unit’.
4. WHAT
4.1. The question of WHAT TO TEST implies our view of what language is. A test of Latin could be just a test of knowledge. Tests of English as a live language should always involve an appropriate sampling of knowledge and skills.
4.2. ‘WHAT’ implies not only all the different aspects of language that can be tested, but also the degrees of skill shown in those aspects of performance which are generally considered to be essential components of effective language use, viz. not just ACCURACY (getting the ‘form’ of the language right) but also:
APPROPRIACY (the right language for each type of situation)
DELICACY (having a range of lexical/intonational nuance available)
FLEXIBILITY (being able to cope with topic shifts, etc.)
FLUENCY (facility of continuous production, spoken and written)
RANGE (a wide choice of structures and lexis available)
SPEED (speed of task performance is a factor which can be measured).
4.3. The ‘WHAT’ can only ever be a sample of language performance, one that the tester hopes will accurately replicate the larger corpus of knowledge and range of skills available to the testee. Real-world time constraints may make it impossible to test exhaustively or comprehensively, but at least over a period an appropriate representative sample of language use should be tested.
The ‘WHAT’ could be represented diagrammatically as below: