WHAT AND HOW TO TEST
PAPER
                        By:
       FIRDAUS NUR HABIBA (201910560211002)
           IKA YULIANA (201910560211011)
   ENGLISH LANGUAGE EDUCATION DEPARTMENT
FACULTY OF MAGISTER ENGLISH LANGUAGE EDUCATION
     UNIVERSITY OF MUHAMMADIYAH MALANG
                       2020
   A. Purposes for language learning and language testing
       Given the variety of foreign language classrooms, the diversity of student
reasons for enrolling in language classes, the choices language instructors make in
terms of textbooks and other instructional materials they wish to use in their
teaching, and the relatively new tool of the Internet as an instructional resource, it
goes without saying that the purposes of language testing are numerous. Sometimes,
language teachers choose to test students via periodic quizzes and tests of
achievement. At other times, instructors assess students’ language proficiency
perhaps at the end of several years of language study. At other times, language
teachers use tests for placement and diagnostic reasons and other purposes. Shohamy
(2001) wrote a wonderful book about the power that tests can exert in the lives of
students. She offers many case studies in her book about students who have been
affected by test results. She reports that whatever the teacher’s
purpose for a language test, students are sometimes devastated by the results.
This is particularly true of so-called high-stakes tests such as the Test of
English as a Foreign Language (TOEFL), which is required of international students
who want to enter English-speaking colleges.
       Tests can therefore have a washback effect, which means that they may result in
instructional programs or teaching practices changing to reflect the test contents,
because language teachers want their students to do well on high-stakes tests for
many different reasons. In some respects, standardized tests can be expected to have
an indirect effect on what language teachers teach and sometimes even how they
teach the foreign language. As an experienced language educator, the author of this
essay accepts the inevitability of the washback effect of major tests, even those
given at the end of the term by the instructor, because we are sometimes obligated to
“teach to the test.” However, once this inevitability is accepted, foreign language
teachers often advocate for an even more important outcome than passing the test.
They often teach their students to become autonomous language learners, meaning
that they want students to become independent learners who continue to learn the
language long after they have completed formal language study.
   B. Testing versus Assessment
   C. Some Ways of Integrating Teaching and Testing in Foreign Language
       Classrooms
       1. Communication
       A common reason why students worldwide study foreign and second
languages is to learn to communicate. For example, many students these days
want to communicate with their peers via e-mail and text messages. However, they
may also need to learn to communicate for business or career purposes using a more
formal variety of the foreign language.
       2. Communities
       Real languages are used in real communities so that people can
communicate with each other, even though the Internet and other technology have
made global communication much easier than it has ever been in history. For
example, families communicate with each other verbally and non-verbally
on a regular basis. When exchange students have the opportunity to participate in
family stay experiences, they seem to pick up so much more than the rules of
pronunciation of a language or its vocabulary; they seem to develop proficiency in a
naturalistic way.
       3. Cultures
       Culture is a concept that is very complex and has many different meanings, but in the
context of the Five Cs, it refers mainly to the lifestyles, mores, beliefs, and habits of
people who share not only a language (e.g., Spanish) but also deeper
values. Given that Spanish is spoken in more than twenty countries, each with
many different cultures, and that English has many
variations in countries as different as Australia, England, New Zealand, Nigeria,
and the U.S.A., it is clear why this is a complex concept. And yet, language
instructors worldwide share the view that culture is intricately related to language
and therefore cannot be avoided when a foreign language is taught.
       4. Comparisons
       Related to culture, this concept means that language learners, almost without
exception, tend to make comparisons between their L1 and their L2 during the
language program and even after completing language study; it is, therefore, perhaps
useful for language instructors to help students make appropriate comparisons and
avoid unhelpful ones in their language study. For example, many novice language
learners make comments like “the way native speakers pronounce
certain sounds is strange” or “the way their grammar works is weird.” Most
language instructors try to convince their students that making judgmental
comparisons is sometimes not helpful when trying to develop proficiency in a
language.
       5. Connections
       Connections is a concept related to learning theory: it is often helpful for
instructors to help students make connections with their prior knowledge, life
experiences, and ways of processing information when they are trying to learn
something new like a foreign language. For example, some language instructors help
students to make connections between two aspects of the foreign language (e.g., the
complexities of spelling in English with such words as through, though, and thought).
   D. Constructing Tests
       1. Test Items
       A test item is a specific task test takers are asked to perform. Test items can
assess one or more points or objectives, and the actual item itself may take on a
different constellation depending on the context. For example, an item may test one
point (understanding of a given vocabulary word) or several points (the ability to
obtain facts from a passage and then make inferences based on the facts). Likewise, a
given objective may be tested by a series of items. For example, there could be five
items all testing one grammatical point (e.g., tag questions). Items of a similar kind
may also be grouped together to form subtests within a given test.
       2. Classifying Items
       Discrete – A completely discrete-point item would test simply one point or
objective such as testing for the meaning of a word in isolation. For example:
           Choose the correct meaning of the word paralysis.
           (A) inability to move
           (B) state of unconsciousness
           (C) state of shock
           (D) being in pain
       Integrative – An integrative item would test more than one point or objective
at a time (e.g., comprehension of words and ability to use them correctly in
context). For example: Demonstrate your comprehension of the following words by
using them together in a written paragraph: “paralysis,” “accident,” and “skiing.”
Sometimes an integrative item is really more a procedure than an item, as in the case
of a free composition, which could test a number of objectives, such as use of
appropriate vocabulary, use of sentence-level discourse, organization, and statement of
thesis and supporting evidence. For example:
       Write a one-page essay describing three sports and the relative likelihood of
being injured while playing them competitively.
       Objective – A multiple-choice item, for example, is objective in that there is
only one right answer.
       Subjective – A free composition may be more subjective in nature if the scorer
is not looking for any one right answer, but rather for a series of factors
(creativity, style, cohesion and coherence, grammar, and mechanics).
       3. The Skill Tested
       The language skills that we test include the more receptive skills on a
continuum – listening and reading, and the more productive skills – speaking and
writing. There are, of course, other language skills that cross-cut these four skills,
such as vocabulary. Assessing vocabulary will most likely vary to a certain extent
across the four skills, with assessment of vocabulary in listening and reading
perhaps covering a broader range than assessment of vocabulary in speaking and
writing. We can also assess nonverbal skills, such as gesturing, and this can be both
receptive (interpreting someone else’s gestures) and productive (making one’s own
gestures).
       4. The Intellectual Operation Required
       Items may require test takers to employ different levels of intellectual
operation in order to produce a response (Valette, 1969, after Bloom et al., 1956).
The following levels of intellectual operation have been identified: knowledge
(bringing to mind the appropriate material); comprehension (understanding the basic
meaning of the material); application (applying the knowledge of the elements of
language and comprehension to how they interrelate in the production of a correct
oral or written message); analysis (breaking down a message into its constituent parts
in order to make explicit the relationships between ideas, including tasks like
recognizing the connotative meanings of words and correctly processing a dictation,
and making inferences); synthesis (arranging parts so as to produce a pattern not
clearly there before, such as in effectively organizing ideas in a written composition);
and evaluation (making quantitative and qualitative judgments about material). It has
been popularly held that these levels demand increasingly greater cognitive control
as one moves from knowledge to evaluation – that, for example, effective operation
at more advanced levels, such as synthesis and evaluation, would call for more
advanced control of the second language. Yet this has not necessarily been borne out
by research (see Alderson & Lukmani, 1989). The truth is that what makes items
difficult sometimes defies the intuitions of the test constructors.
       5. Grammatical competence
       Major grammatical errors might be considered those that either interfere with
intelligibility or stigmatize the speaker. Minor errors would be those that do not get
in the way of the listener's comprehension nor would they annoy the listener to any
extent. Thus, getting the tense wrong in a sentence such as "We have had a great
time at your house last night" could be viewed as a minor error, whereas in another
case, producing "I don't have what to say" (meaning "I really have no excuse") by
translating directly from Hebrew could be considered a major error
since it is not only ungrammatical but also could stigmatize the speaker as rude and
unconcerned, rather than apologetic.
       Rationale for Tests:
       Measures of student performance (testing) may have as many as five
purposes:
           Student Placement,
           Diagnosis of Difficulties,
           Checking Student Progress,
           Reports to Student and Superiors,
           Evaluation of Instruction.
       Unfortunately, the most common perception is that tests are designed to
statistically rank all students according to a sampling of their knowledge of a subject
and to report that ranking to superiors or anyone else interested in using that
information to adversely influence the student's feeling of self-worth. It is even more
unfortunate that the perception matches reality in the majority of testing situations.
Consequently, tests are highly stressful, anxiety-producing events for most persons.
       6. True/False Questions
       True/false questions should be written without ambiguity. That is, the
statement of the question should be clear and the decision whether the statement is
true or false should not depend on an obscure interpretation of the statement. A
true/false question may easily be used, and most commonly is used, to determine if
the student recalls facts. However, a true/false question may also be used to
determine if the learner has mastered the learning objective well enough to correctly
analyse a statement. It is important to be aware that only two choices are available to
the student and therefore the nature of the question gives the student a 50% chance of
being correct. A single True/False question therefore is helpful only if the student
answers the question incorrectly and the incorrect response indicates a specific
misunderstanding of the learning objective.
       A collection of true/false questions, about a single learning objective, all
answered correctly by a student is a much stronger indication of mastery. It is
therefore important that the instructional developer construct a "test bank" containing
a large number of true/false questions. It is also important to include numerous
true/false questions on any test which utilizes true/false questions. Ideally a true/false
question should be constructed so that an incorrect response indicates something
about the student's misunderstanding of the learning objective. This may be a
difficult task, especially when constructing a true statement. The instructional
developer should try to accomplish the ideal, but should recognize that in some
instances he/she will not reach that goal.
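       The point about guessing can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that a student who has not mastered the objective guesses blindly and independently on every item, with a 50% chance of success on each, and the function name is hypothetical rather than taken from any testing library.

```python
def chance_all_correct_by_guessing(num_items, p_guess=0.5):
    """Probability of answering every item correctly by blind guessing.

    Assumes each guess is independent and succeeds with probability
    p_guess (0.5 for a true/false item).
    """
    return p_guess ** num_items


# One item, five items, and ten items on the same learning objective.
for n in (1, 5, 10):
    print(n, chance_all_correct_by_guessing(n))
```

       Under these assumptions, a single correct answer is as likely as not to be a lucky guess, while ten correct answers in a row would arise from guessing less than once in a thousand attempts. This is why a larger set of items on one objective gives stronger evidence of mastery.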
       7. Multiple Choice Questions
       Multiple choice questions should be written without ambiguity. That is, the
statement of the question stem should be clear and should leave no doubt about how
to select choices. Additionally, the choices should be written without ambiguity and
should contain all information required to decide whether or not to choose
them. The decision whether to select or not select a choice should not depend on an
obscure interpretation of either the stem or the choice. A multiple choice question
may easily be used to determine if the student recalls facts. However, a multiple
choice question may also be used to determine if the student has mastered the
learning objective well enough to correctly analyse a statement.
       Multiple choice questions should therefore contain any number of choices
with one or more valid choices. The student is of course required to select all valid
choices and failure to select any one of the valid choices will provide information
about the student's misunderstanding of the learning objective in the same way that
selection of an invalid choice reveals the nature of his/her misunderstanding. The
nature of the choices provided in a multiple choice question may be of two types:
those which require merely recall of facts and those which require additional
activity such as synthesis, analysis, computation, comparison, or diagramming. The
instructional developer who is seriously concerned with the student's success will use
both types extensively.
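       The diagnostic use of a multiple choice question with several valid choices can be sketched as a comparison of two sets. The example below is a minimal illustration, not a prescribed scoring method: the function name, the report keys, and the sample item (choices A to D, with A and C valid) are all hypothetical.

```python
def diagnose_response(valid_choices, selected_choices):
    """Compare a student's selections with the answer key.

    Returns the valid choices the student missed and the invalid
    choices the student selected. Both lists are empty when the
    response is fully correct.
    """
    valid = set(valid_choices)
    selected = set(selected_choices)
    return {
        "missed_valid": sorted(valid - selected),
        "chosen_invalid": sorted(selected - valid),
    }


# Hypothetical item: choices A and C are valid, B and D are not.
report = diagnose_response(valid_choices={"A", "C"}, selected_choices={"A", "D"})
print(report)
```

       In this sketch the report separates the two kinds of evidence the paragraph describes: a valid choice that was not selected and an invalid choice that was selected each point to a different misunderstanding of the learning objective.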
       8. Fill-in-the-Blank Questions
       The temptation, when constructing fill-in-the-blank questions, is to construct
traps for the student. The instructional developer should avoid this problem. Ensure
that there is only one acceptable word for the student to provide and that the word (or
words) is significant. Avoid asking the student to supply "minor" words. Avoid
fill-in-the-blank questions with so many blanks that the student is unable to determine what
is to be completed.
   E. Test Construction
       1. Closed-Answer or “Objective” Tests
       Although by definition no test can be truly “objective” (existing as an object
of fact, independent of the mind), this handbook refers to tests made up of multiple
choice, matching, true/false, or fill-in-the-blank items as objective tests.
Objective tests have the advantages of allowing an instructor to assess a large and
potentially representative sample of course material and of allowing for reliable and
efficient scoring. The disadvantages of objective tests include a tendency to
emphasize only “recognition” skills, the ease with which correct answers can be
guessed on many item types, and the inability to measure students’ organization and
synthesis of material (Adapted with permission from Yonge, 1977).
       2. Essay Tests
       Conventional wisdom accurately portrays short-answer and essay
examinations as the easiest to write and the most difficult to grade, particularly if
they are graded well. You should give students an exam question for each crucial
concept that they must understand. If you want students to study in both depth and
breadth, don't give them a choice among topics. Offering a choice allows them to avoid
answering questions about those things they didn’t study. Instructors generally expect a
great deal from students, but remember that their mastery of a subject depends as
much on prior preparation and experience as it does on diligence and intelligence;
even at the end of the semester some students will be struggling to understand the
material. Design your questions so that all students can answer at their own levels.
The following are some suggestions that may enhance the quality of the essay tests
that you produce (Adapted with permission from Ronkowski, 1986):
       1. Have in mind the processes that you want measured (e.g., analysis,
           synthesis).
       2. Start questions with words such as “compare,” “contrast,” “explain why.”
           Don’t use “what,” “when,” or “list.” (These latter types are better
           measured with objective-type items).
       3. Write items that define the parameters of expected answers as clearly as
           possible.
       4. Make sure that the essay question is specific enough to invite the level of
           detail you expect in the answer. A question such as “Discuss the causes of
           the American Civil War,” might get a wide range of answers, and
           therefore be impossible to grade reliably. A more controlled question
           would be, “Explain how the differing economic systems of the North and
           South contributed to the conflicts that led to the Civil War.”
       5. Don’t have too many questions for the time available.