Midterm Lessons
PROFED 6 (Assessment in Learning)
Imelda P. Oruga
Sept-Oct 2024
What are tests for?
• Inform learners and teachers of the strengths and weaknesses of the process
• Motivate learners to review or consolidate specific material
• Guide the planning/development of the ongoing teaching process
• Create a sense of accomplishment
• Determine if the objectives have been achieved
• Encourage improvement
Guidelines for Test Construction
Planning the Test
• Setting the test objectives: Why define them? How are they set? What are the levels of thinking skills?
• Creating a table of specification: Why create it? How is it created? What are the types?
Test blueprint
• A tool used by the teacher to design a test.
• A two-way chart which describes the topics to be covered in a test and the number of items or points which will be associated with each topic.
• A chart that provides graphic representations of the content of a course or curriculum elements and the educational objectives.
A test blueprint:
• Ensures that the instructional objectives and what the test captures match.
• Ensures that the test developer will not overlook details that are considered essential to a good test.
• Makes developing a test easier and more efficient.
• Ensures that the test will sample all important content areas and processes.
• Is useful in planning and organizing.
• Offers an opportunity for teachers and students to clarify achievement expectations.
1. The TOS requires a thorough knowledge of Bloom's Revised Taxonomy.
2. The TOS requires, as reference, the budgeted lessons (the allocation of time per topic in every grading period, with respect to the desired total number of days/time to be spent for the grading period).
3. The TOS requires some simple mathematical computations that result in a proportional allocation of test items per topic.
4. The TOS requires that previous experiences be recalled; to some extent, it likewise requires the imagination of the TOS constructor to concretize the actual teaching-learning process based on previous encounters in the classroom, in order to determine more or less the domain/s on which to base the questions.
5. The TOS constructor shall likewise prepare the budgeted lesson to accompany the TOS from the first to the fourth grading period.
FORMATS OF TEST TABLE OF SPECIFICATION
• One-way TOS
• Two-way TOS
• Three-way TOS
ONE-WAY TOS

Topic | Test Objective | No. of Hours Spent | Format and Placement of Items | No. and Percent of Items
Theories and concepts | Recognize the important concepts in personality theories | 0.5 | Multiple choice, items #1-5 | 5 (10.0%)
Psychoanalytic theories | Identify the different theories of personality under the psychoanalytic model | 1.5 | Multiple choice, items #6-20 | 15 (30%)
etc. | | | |
TOTAL | | 5 | | 50 (100%)
TWO-WAY TOS

Content | Time Spent | No. & Percent of Items | KD | Level of Cognitive Behavior, Item Format, No. and Placement of Items: R | U | AP | AN | E | C
Theories and concepts | 0.5 hr | 5 (10%) | F, C |
Psychoanalytic theories | 1.5 hrs | 15 (30%) | F, C |
SCORING | | | | 1 point per item | 2 points per item | 3 points per item
OVERALL TOTAL | | 50 (100%) | | 20 | 20 | 20

Legend: KD = Knowledge Dimensions (Factual, Conceptual, Procedural, Metacognitive); R = Remembering, U = Understanding, AP = Applying, AN = Analyzing, E = Evaluating, C = Creating.
Three-Way TOS
Table of Specification (sample template)

The template pairs each lesson and competency with columns for: Level of Difficulty (Easy 30%, Average 50%, Difficult 20%), Days, Number of Items, % of Items, Item Placement, and Cognitive Levels (Remembering, Understanding, Applying, Analyzing, Evaluating, Creating).

LESSON: SEQUENCE
1. generates patterns.***
2. illustrates an arithmetic sequence

LESSON: Arithmetic Sequence
3. determines arithmetic means and nth term of an arithmetic sequence.***
4. finds the sum of the terms of a given arithmetic sequence.***

LESSON: Geometric Sequence
5. illustrates a geometric sequence.
6. differentiates a geometric sequence from an arithmetic sequence.
7. differentiates a finite geometric sequence from an infinite geometric sequence.
8. determines geometric means and nth term of a geometric sequence.***
9. finds the sum of the terms of a given finite or infinite geometric sequence.***

LESSON: HARMONIC and FIBONACCI SEQUENCE
10. illustrates other types of sequences (e.g., harmonic, Fibonacci).
11. solves problems involving sequences.

LESSON: POLYNOMIAL
12. performs division of polynomials using long division and synthetic division.
13. proves the Remainder Theorem and the Factor Theorem.
14. factors polynomials.
15. illustrates polynomial equations.
16. proves the Rational Root Theorem.
17. solves polynomial equations.
18. solves problems involving polynomials and polynomial equations.

TOTAL
Tips in Preparing the Table of Specifications (TOS)
• Don't make it overly detailed. It's best to identify major ideas and skills rather than specific details.
• Use a cognitive taxonomy that is most appropriate to your discipline, including non-specific skills like communication skills, graphic skills, or computational skills if such are important to your evaluation of the answer.
• Match the level of each question to the intended level of thinking skills (level of difficulty).
What is Bloom’s Taxonomy?
• Bloom’s Taxonomy is a classification of
thinking organized by levels of complexity. It gives
teachers and students an opportunity to learn and
practice a range of thinking and provides a simple
structure for many different kinds of questions.
What is REVISED BLOOM’S TAXONOMY?
The Revised Bloom’s Taxonomy provides the
measurement tool for thinking. The changes in RBT
occur in three broad categories.
• Terminologies
• Structure
• Emphasis
A. Visual Comparison of Two Taxonomies (Terminology Changes)

1956          | 2001
Evaluation    | Creating
Synthesis     | Evaluating
Analysis      | Analyzing
Application   | Applying
Comprehension | Understanding
Knowledge     | Remembering

(Based on Pohl, 2000, Learning to Think, Thinking to Learn, p. 8)
REMEMBERING
THE LEARNER IS ABLE TO RECALL, RESTATE AND REMEMBER LEARNED INFORMATION.
- RECOGNIZING
- LISTING
- DESCRIBING
- IDENTIFYING
- RETRIEVING
- NAMING
- LOCATING
- FINDING
CAN YOU RECALL INFORMATION?
Sample Questions for Remembering
• What is ___?
• Where is ___?
• How did it happen?
• Why did ___?
• When did ___?
• How would you show ___?
• Who were the main ___?
• Which one ___?
• How is ___?
UNDERSTANDING
THE LEARNER GRASPS THE MEANING OF INFORMATION BY INTERPRETING AND TRANSLATING WHAT HAS BEEN LEARNED.
- INTERPRETING
- EXEMPLIFYING
- SUMMARIZING
- INFERRING
- PARAPHRASING
- CLASSIFYING
- COMPARING
- EXPLAINING
CAN YOU EXPLAIN IDEAS OR CONCEPTS?
Sample Questions for Understanding
• State in your own words…
• Which are facts? Opinions?
• What does this mean…?
• Is this the same as…?
• Give an example.
• Select the best definition.
• Condense this paragraph.
• What would happen if…?
• What part doesn't fit?
• How would you compare? Contrast?
• What is the main idea of…?
• How would you summarize…?

Questions beginning with what, where, why, and how whose answers can be drawn from between the lines of the text through organizing, comparing, translating, interpreting, extrapolating, classifying, summarizing, and stating main ideas fall under understanding.
APPLYING
THE LEARNER MAKES USE OF INFORMATION IN A CONTEXT DIFFERENT FROM THE ONE IN WHICH IT WAS LEARNED.
- IMPLEMENTING
- CARRYING OUT
- USING
- EXECUTING
CAN YOU USE THE INFORMATION IN ANOTHER FAMILIAR SITUATION?
Sample Questions for Applying
• How would you organize ___ to show ___?
• How would you show your understanding of ___?
• What facts would you select to show ___?
• What elements would you change?
• What other way would you plan to ___?
• What questions would you ask in an interview with ___?
• How would you apply what you learned to develop ___?
• How would you solve ___ using what you have learned?
ANALYZING
THE LEARNER BREAKS LEARNED INFORMATION INTO ITS PARTS TO BEST UNDERSTAND THAT INFORMATION.
- COMPARING
- ORGANIZING
- DECONSTRUCTING
- ATTRIBUTING
- OUTLINING
- FINDING
- STRUCTURING
- INTEGRATING
CAN YOU BREAK INFORMATION INTO PARTS TO EXPLORE UNDERSTANDINGS AND RELATIONSHIPS?
Sample Questions for Analyzing
• Which statement is relevant?
• What is the conclusion?
• What does the author believe? Assume?
• Make a distinction between ___.
• What ideas justify the conclusion?
• Which is the least essential statement?
• What literary form is used?
EVALUATING
THE LEARNER MAKES DECISIONS BASED ON IN-DEPTH REFLECTION, CRITICISM AND ASSESSMENT.
- CHECKING
- HYPOTHESIZING
- CRITIQUING
- EXPERIMENTING
- JUDGING
- TESTING
- DETECTING
- MONITORING
CAN YOU JUSTIFY A DECISION OR COURSE OF ACTION?
Sample Questions for Evaluating
• What fallacies, consistencies, inconsistencies appear?
• Which is more important?
• Do you agree ___?
• What information would you use ___?
• Do you agree with the ___?
• How would you evaluate ___?
CREATING
THE LEARNER CREATES NEW IDEAS AND INFORMATION USING WHAT HAS BEEN PREVIOUSLY LEARNED.
- DESIGNING
- CONSTRUCTING
- PLANNING
- PRODUCING
- INVENTING
- DEVISING
- MAKING
CAN YOU GENERATE NEW PRODUCTS, IDEAS, OR WAYS OF VIEWING THINGS?
Sample Questions for Creating
• Can you design a ___?
• What possible solution to ___?
• How many ways can you ___?
• Can you create a proposal which would ___?
B. STRUCTURAL CHANGES
Bloom's original cognitive taxonomy was a one-dimensional form consisting of factual, conceptual and procedural knowledge, but these were never fully understood or used by teachers, because most of what educators were given in training consisted of a simple chart listing the levels and related accompanying verbs.
The Revised Bloom's Taxonomy takes the form of a two-dimensional table. The first dimension is the Knowledge Dimension, or the kind of knowledge to be learned; the second is the Cognitive Process Dimension, or the process used to learn.
The Knowledge Dimensions (rows): Factual, Conceptual, Procedural, Metacognitive
The Cognitive Process Dimensions (columns): Remembering, Understanding, Applying, Analyzing, Evaluating, Creating
Factual Knowledge
- Refers to the essential facts, terminology, details or elements students must know or be familiar with in order to solve a problem in a discipline.

Conceptual Knowledge
- Is knowledge of classifications, principles, generalizations, theories, models or structures pertinent to a particular disciplinary area.

Procedural Knowledge
- Refers to information or knowledge that helps students do something specific to a discipline, subject, or area of study. It also refers to methods of inquiry, very specific or finite skills, algorithms, techniques and particulars.

Metacognitive Knowledge
- Is strategic or reflective knowledge about how to go about solving problems and cognitive tasks; it includes contextual and conditional knowledge and knowledge of self.
C. CHANGE IN EMPHASIS
Emphasis is the third and final category of changes. Emphasis is placed upon the taxonomy's use as a more "authentic tool for curriculum planning, instructional delivery and assessment".
•More authentic tool for curriculum planning,
instructional delivery and assessment
•Aimed at a broader audience
•Easily applied to all levels of schooling
•The revision emphasizes explanation and
description of subcategories.
BLOOM'S REVISED TAXONOMY: Suggested Percentage Allocation

CREATING (10%): Generating new ideas, products, or ways of viewing things. Designing, constructing, planning, producing, inventing.
EVALUATING (10%): Justifying a decision or course of action. Checking, hypothesizing, critiquing, experimenting, judging.
ANALYZING (10%): Breaking information into parts to explore understandings and relationships. Comparing, organizing, deconstructing, interrogating, finding.
(Higher-order thinking levels together: 30%)
APPLYING (20%): Using information in another familiar situation. Implementing, carrying out, using, executing.
UNDERSTANDING (20%): Explaining ideas or concepts. Interpreting, summarizing, paraphrasing, classifying, explaining.
REMEMBERING (30%): Recalling information. Recognizing, listing, describing, retrieving, naming, finding.
How to Construct Table of Specification
1. Determine the desired number of test items.
How to Construct Table of Specification
2. List the topics with the corresponding allocation of time.
• The reference is the budgeted lesson.
TABLE OF SPECIFICATIONS
Subject ___ Grade ___ Grading period ___ School Year ___
(Domain columns for Remembering, Understanding, Applying, Analyzing, Evaluating and Creating are filled in at step 6.)

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | |
2. | 4 | |
3. | 1 | |
4. | 6 | |
5. | 8 | |
6. | 5 | |
7. | 8 | |
8. | 2 | |
9. | 4 | |
10. | 4 | |
TOTAL | 45 | | (desired: 50)
How to Construct Table of Specification
3. Determine the total number of items per topic by using the formula:
Time Spent / Frequency per topic divided by the total number of
frequency in the grading period times total number of items.
Time Spent / Frequency per Topic Total Number of items
Total Frequency in the grading period
Example:
3
45
50 = 3.33
TABLE OF SPECIFICATIONS

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | 3.33 |
2. | 4 | 4.44 |
3. | 1 | 1.11 |
4. | 6 | 6.66 |
5. | 8 | 8.88 |
6. | 5 | 5.55 |
7. | 8 | 8.88 |
8. | 2 | 2.22 |
9. | 4 | 4.44 |
10. | 4 | 4.44 |
TOTAL | 45 | 49.95 | (desired: 50)
How to Construct Table of Specification
4. Round off the values to whole numbers.
TABLE OF SPECIFICATIONS

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | 3.33 | 3
2. | 4 | 4.44 | 4
3. | 1 | 1.11 | 1
4. | 6 | 6.66 | 7
5. | 8 | 8.88 | 9
6. | 5 | 5.55 | 6
7. | 8 | 8.88 | 9
8. | 2 | 2.22 | 2
9. | 4 | 4.44 | 4
10. | 4 | 4.44 | 4
TOTAL | 45 | 49.95 | 49 (desired: 50)
How to Construct Table of Specification
5. Adjust or Balance by either
adding or subtracting (any of the
topic totals) so that the sum will
amount to the desired number of
test items.
TABLE OF SPECIFICATIONS

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | 3.33 | 3
2. | 4 | 4.44 | 4
3. | 1 | 1.11 | 1 + 1
4. | 6 | 6.66 | 7
5. | 8 | 8.88 | 9
6. | 5 | 5.55 | 6
7. | 8 | 8.88 | 9
8. | 2 | 2.22 | 2
9. | 4 | 4.44 | 4
10. | 4 | 4.44 | 4
TOTAL | 45 | 49.95 | 49 + 1 = 50
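A minimal Python sketch of steps 3 to 5, using the topic frequencies and the 50-item target from the worked example above (the variable names are illustrative; the slides truncate some actual values, e.g. 6.66 instead of 6.67):

```python
# Steps 3-5: proportional allocation, rounding, and balancing
frequencies = [3, 4, 1, 6, 8, 5, 8, 2, 4, 4]   # time spent per topic
desired_items = 50
total_frequency = sum(frequencies)              # 45

# Step 3: proportional allocation (the "Actual" column)
actual = [f / total_frequency * desired_items for f in frequencies]
print([round(a, 2) for a in actual])            # [3.33, 4.44, 1.11, 6.67, ...]

# Step 4: round off to whole numbers (the "Adjusted" column)
adjusted = [round(a) for a in actual]           # [3, 4, 1, 7, 9, 6, 9, 2, 4, 4]

# Step 5: balance so the sum equals the desired total
shortfall = desired_items - sum(adjusted)       # 50 - 49 = 1
adjusted[2] += shortfall                        # the example adds 1 item to topic 3
print(adjusted, sum(adjusted))                  # sums to 50
```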
How to Construct Table of Specification
6. Scatter the items per topic per domain.
• Determine the number of items per level of complexity of the cognitive domain. In this case, we already have a pre-computed allocation of 30-20-20-30: 30% Remembering, 20% Understanding, 20% Applying, and 30% higher-order thinking (10% each for Analyzing, Evaluating, and Creating). A small sketch of this computation follows the table below.
TABLE OF SPECIFICATIONS

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | 3.33 | 3
2. | 4 | 4.44 | 4
3. | 1 | 1.11 | 1 + 1
4. | 6 | 6.66 | 7
5. | 8 | 8.88 | 9
6. | 5 | 5.55 | 6
7. | 8 | 8.88 | 9
8. | 2 | 2.22 | 2
9. | 4 | 4.44 | 4
10. | 4 | 4.44 | 4
TOTAL | 45 | 49.95 | 49 + 1 = 50

Domain totals: Remembering 15 | Understanding 10 | Applying 10 | Analyzing 5 | Evaluating 5 | Creating 5
(30% | 20% | 20% | 30% higher-order thinking)
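A small sketch of the same computation, applying the 30-20-20-30 split to the 50-item test above (names are illustrative):

```python
# Items per cognitive level = suggested percentage x total items
percentages = {
    "Remembering": 0.30, "Understanding": 0.20, "Applying": 0.20,
    "Analyzing": 0.10, "Evaluating": 0.10, "Creating": 0.10,
}
desired_items = 50
per_domain = {level: round(p * desired_items) for level, p in percentages.items()}
print(per_domain)  # {'Remembering': 15, 'Understanding': 10, 'Applying': 10,
                   #  'Analyzing': 5, 'Evaluating': 5, 'Creating': 5}
```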
How to Construct Table of Specification
7. On the basis of your experience/analysis, start allocating the items with respect to the total number of items per domain and the total number of items per topic, beginning with the higher-order thinking domains down to Remembering. It is suggested that the order of complexity from Creating down to Remembering not be altered.
• Review the topics, reflect on previous experiences, and imagine the teaching-learning processes (TLP) that can go with the topics. You may use teaching guides and other similar materials.
• Be mindful of the total points per topic.
TABLE OF SPECIFICATIONS

Topic | Time Spent/Frequency | Actual | Adjusted
1. | 3 | 3.33 | 3
2. | 4 | 4.44 | 4
3. | 1 | 1.11 | 1 + 1
4. | 6 | 6.66 | 7
5. | 8 | 8.88 | 9
6. | 5 | 5.55 | 6
7. | 8 | 8.88 | 9
8. | 2 | 2.22 | 2
9. | 4 | 4.44 | 4
10. | 4 | 4.44 | 4
TOTAL | 45 | 49.95 | 49 + 1 = 50

Domain totals: Remembering 15 | Understanding 10 | Applying 10 | Analyzing 5 | Evaluating 5 | Creating 5
(Sample allocation in progress: topic 4 has 1 item, and topics 5 and 7 have 2 items each, assigned to domains so far.)
Examples of Student Activities and Verbs for Revised Bloom's Cognitive Levels (Jacobs & Chase, 1992:19)

Example matrix:
The Knowledge Dimension | Remember | Understand | Apply | Analyze | Evaluate | Create
Facts | list | paraphrase | classify | outline | rank | categorize
Concepts | recall | explain | show | contrast | criticize | modify
Processes | outline | estimate | produce | diagram | defend | design
Procedures | reproduce | give an example | relate | identify | critique | plan
Principles | state | convert | solve | differentiate | conclude | revise
Meta-cognitive | proper use | interpret | discover | infer | predict | actualize
Possible reasons for faulty test questions:
• Questions are copied verbatim from the book or other resources.
• The course outline is not consulted.
• Too much consideration is given to reducing printing costs.
• There is no TOS, or the TOS was made after the test.
Factors to consider in preparing test questions (Oriondo & Antonio, 1984)
• Purpose of the test
• Time available to prepare, administer and score the test
• Number of students to be tested
• Skill of the teacher in writing the test
• Facilities available in reproducing the test
"To be able to prepare a GOOD TEST, one has to have a mastery of the subject matter, knowledge of the pupils to be tested, skill in verbal expression and the use of the different test formats."
- Evaluating Educational Outcomes (Oriondo & Antonio, 1984)
What are the major categories and formats of traditional tests?

Selected Response:
• Multiple choice
• True-false/Alternative response
• Matching type

Constructed Response:
• Short answer test
• Essay test
• Problem-solving test
What makes a well-constructed test?
Well-constructed tests motivate students and reinforce learning. Well-constructed tests enable teachers to assess the students' mastery of course objectives. Tests also provide feedback on teaching, often showing what was or was not communicated clearly.
Guidelines for Writing Multiple Choice Items
A multiple choice (MC) item is characterized by the following components:
• Stem - the initial part of the item in which the task is stated.
• Options - the set of response choices presented under the stem.
• Key - the correct response option.
• Distractors - the incorrect response options.
The stem may be a direct question or an incomplete statement with options that complete the statement.
Note: The direct question is generally easier to develop and to understand.
Guidelines for Writing Multiple Choice Items (From TIMSS 2003)
1. The stem has enough information to make the task clear and unambiguous to students.
First Draft: Solve the equation 25 - X = 19.
Revision: What number should go in the blank to make the number sentence true?
25 - ____ = 19
2. Do not include extraneous information in the stem. This may confuse students.
First Draft:
Mang Gorio has 180 eggs that he has collected on his farm. He wants to take them to the market 3 km away. Before he takes them he must put them in cartons. Each carton holds 12 eggs. How many cartons does Mang Gorio need?
Revision:
Eggs are packed 12 to a carton. How many cartons are needed to pack 180 eggs?
A. 13    C. 15
B. 14    D. 18
3. Use a direct question rather than a directive in the stem.
First Draft: Find the area of a rectangle with sides 2 cm and 6 cm.
Revision: What is the area of a rectangle with sides 2 cm and 6 cm?
4. Include "of the following" in the stem if there is no universally agreed upon answer to the question.
First Draft: Which is the best conductor of electricity?
Revision: Which of the following is the best conductor of electricity?
A. air    C. rubber
B. copper    D. water
5. Make sure there is only one correct or best answer.
First Draft: Which animal is hatched from eggs?
A. spider    C. rabbit
B. snake    D. carabao
Revision: Which animal is hatched from eggs?
A. goat    C. rabbit
B. snake    D. carabao
6. Do not provide hints to the answer in the options. An essay or constructed-response item may be more appropriate than a multiple choice item.
7. Avoid using trick distractors.
8. Observe the rules of grammar and syntax.
9. Make sure all options are parallel in length, level of complexity and grammatical structure.
10. Arrange the options in logical order.
11. Reduce the reading burden in the options by moving the word/s to the stem.
12. Avoid reference to "you" or "your".
13. Avoid using "none of these" and "all of these" as response options.
14. Avoid the use of specific determiners that qualify the response options, providing clues to the correct options:
• "never" and "always" tend to appear in incorrect options;
• "some", "sometimes", and "may" tend to appear in correct options.
15. Make sure that the stem or options of one question do not answer another question, or rule out distractors in another question.
TRUE-FALSE ITEMS
True-false items require students to identify statements which are correct or incorrect. Only two responses are possible in this item format.
Guidelines for Writing True-False Items
1. Each statement should include only one idea.
⚫ The idea should be stated in the main point of the item rather than in some trivial detail.
FIRST DRAFT: The true-false item, as seen by Newton, takes little time to prepare.
REVISION: The true-false item takes little time to prepare.
2. Each statement should be short and simple.
FIRST DRAFT: True-false items provide for adequate sampling of objectives and can be scored rapidly.
REVISION: True-false items provide for adequate sampling of objectives. True-false items can be scored rapidly.
3. Qualifiers such as "few", "many", "seldom", "always", "never", "small", "large", and so on should be avoided. They make the statements vague and indefinite.
FIRST DRAFT: True-false items are seldom prone to guessing.
REVISION: True-false items are prone to guessing.
4. Negative statements should be used sparingly.
5. Double negatives should be avoided.
6. Statements of opinion or fact should be attributed to some important person or organization.
7. The number of true and false statements should be equal whenever possible.
Matching Items
A matching item is a selection-type item consisting of stimuli (or stems) called premises, and a series of options called responses.
Guidelines for Writing Matching Items
1. Include only materials that belong to the same category.
2. Keep premises short and place the responses on the right side.
3. Use more responses than premises and allow the responses to be used more than once.
4. Place the matching items on one page.
Guidelines for Writing Short Answer/Fill-in-the-Blank Items
1. State the item clearly and precisely so that only one correct answer is acceptable.
2. Begin with a question and shift to an incomplete statement later to achieve preciseness and conciseness.
3. Leave the blank at the end of the statement.
4. Focus on one important idea instead of trivial detail and leave only one blank.
5. Avoid giving clues to the correct answer.
Guidelines for Writing Essay Items
1. State questions that require a clear, specific, and narrow task or topic to be performed.
2. Give enough time for answering each essay question.
3. Require students to answer all questions.
4. Make it clear to students whether spelling, punctuation, content, clarity, and style are to be considered in scoring the essay questions.
5. Grade each essay question by the point method, using well-defined criteria (a rubric).
6. Evaluate all of the students' responses to one question before going to the next question.
7. Evaluate answers to essay questions without identifying the students.
8. If possible, two or more correctors should be employed to ensure reliable results.
Writing Completion Items
• Completion items require the
students to associate an
incomplete statement with a
word or phrase recalled from
memory
Guidelines In Completion Items
• As a general rule, it is best to use ONE blank in a completion item.
• The blank should be placed NEAR or at the END of the sentence.
• Give clear instructions indicating whether synonyms will be correct and whether spelling will be a factor in scoring.
• Avoid using direct statements from the textbook with a word or two missing.
• All blanks for all items should be of equal length and long enough to accommodate the longest response.
Writing Arrangement Items
• It is used for testing knowledge
of sequence and order
Guidelines In Arrangement Items
• Items to be arranged should belong to one
category only.
• Provide instructions on the rationale for
arrangement or sequencing.
• Specify the response code students have to
use in arranging the items.
• Provide sufficient space for the writing of
answer.
Writing Completion-Drawing Items
• A completion-drawing item is one wherein an incomplete drawing is presented which the student has to complete.
Guidelines In Completion-Drawing Items
• Provide instruction on how the drawing
will be completed.
• Present the drawing to be completed.
Writing Correction Items
• It is similar to the completion
item, except that some words or
phrase have to be changed to
make the sentence correct.
Guidelines In Correction Items
• Underline or italicize the word or phrase to
be corrected in a sentence.
• Specify in the instruction where students
will write their correction of the underlined
or italicized word or phrase.
• Write items that measure higher levels of
cognitive behavior.
Sample
Directions: Change the underlined word or phrase to make each of the following statements correct. Write your answer on the space before each number.
1. Inflation caused by increased demand is known as oil-push.
2. Inflation is the phenomenon of falling prices.
3. Expenditure on non-food items increases with increased income, according to Keynes.
4. The additional cost of producing an additional unit of a product is average cost.
Writing Identification Items
• It is one wherein an unknown
specimen is to be identified by
name or other criterion
Guidelines In Identification Items
• The direction of the test should indicate
clearly what has to be identified.
• Sufficient space has to be provided for the
answer to each item.
• The question should not be copied verbatim
from the book.
Sample
Directions: The following are phrase definitions of terms. Opposite each number, write the term defined.
1. Weight divided by volume
2. Degree of hotness or coldness of a body
3. Changing speed of a moving body
4. Ratio of resistance to effort
Writing Enumeration Items
• An enumeration item is one wherein the
student has to list down parts or
elements/components of a given concept or
topic.
Guidelines In Enumeration Items
• Exact numbers of expected answers have to be
specified.
• Spaces for the writing of answers have to be
provided and should be of the same length
Writing Analogy Items
• An analogy item consists of a pair of words which are related to each other. This type of item is often used in measuring the student's skill in sensing associations between paired words or concepts.
Guidelines In Analogy Items
• The pattern of relationship in the first pair of words must be the same pattern in the second pair.
• Options must be related to the correct answer.
• The principle of parallelism has to be observed in writing options.
• More than three options have to be included in each analogy item to lessen guessing.
• All items must be grammatically consistent.
Sample
Sampaguita : Philippines :: Rose of Sharon : Korea
Bonifacio : Philippines :: ___ : USA
a. Jefferson
b. Lincoln
c. Madison
d. Washington
Writing Interpretive Items
• An interpretive test item is often used in testing higher cognitive behavior.
Guidelines In Interpretive Items
• The interpretive exercise must be related to the instruction provided to the students.
• The material to be presented to the students should be new to them but similar to what was presented during instruction.
• Written passages should be as brief as possible.
• The students have to interpret, apply, analyze and comprehend in order to answer a given question in the exercise.
Writing Short Explanation Items
• This type of item is similar to an essay test but requires a short response, usually a sentence or two. This type of question gives students good practice in expressing themselves concisely.
Guidelines In Short Explanation Items
• Specify in the instructions of the test the number of sentences that the students can use in answering the question.
• Make the question brief and to the point so the students are not confused.
When to Use Essay or Objective Tests
Essay tests are appropriate when:
• the group to be tested is small and the test is not to be reused;
• you wish to encourage and reward the development of student skill in writing;
• you are more interested in exploring the student's attitudes than in measuring his/her achievement.
Objective tests are appropriate when:
• the group to be tested is large and the test may be reused;
• highly reliable scores must be obtained as efficiently as possible;
• impartiality of evaluation, fairness, and freedom from possible test scoring influences are essential.
Either essay or objective tests can be used to:
• measure almost any important educational achievement a written test can measure;
• test understanding and ability to apply principles;
• test ability to think critically;
• test ability to solve problems.
Matching Learning Objectives with Test Items
Instructions: Below are four test item categories labeled A, B, C, and D. Following these test item categories are sample learning objectives. On the line to the left of each learning objective, place the letter of the most appropriate test item category.
A = Objective Test Item (multiple choice, true-false, matching)
B = Performance Test Item
C = Essay Test Item (extended response)
D = Essay Test Item (short answer)
1. Name the parts of the human skeleton
2. Appraise a composition on the basis of its organization
3. Demonstrate safe laboratory skills
4. Cite four examples of satire that Twain uses in
Huckleberry Finn
5. Design a logo for a web page
6. Describe the impact of a bull market
7. Diagnose a physical ailment
8. List important mental attributes necessary for an athlete
9. Categorize great American fiction writers
10. Analyze the major causes of learning disabilities
Unless assessment
improves
the teaching-learning
process,
it serves no purpose at all.
POINTS TO PONDER…
A good lesson makes a good question
A good question makes a good content
A good content makes a good test
A good test makes a good grade
A good grade makes a good student
A good student makes a good COMMUNITY
Jesus Ochave Ph.D.
VP Research Planning & Development
Philippine Normal University
What is test reliability?
Reliability is the consistency of the measure under three conditions:
• when tested on the same person;
• when retested on the same measure;
• similarity of responses across items that measure the same characteristic.
What are the factors affecting reliability?
• The number of test items
• Individual differences
• External environment
What are the different ways to establish test reliability?
Determining the reliability of a test depends on:
• the variable you are measuring;
• the type of test;
• the number of versions of the test.
Methods in Testing Reliability
• Test-retest
• Parallel forms
• Split-half
• Test of internal consistency
• Inter-rater reliability
Test-retest
How is the reliability done? Correlate the test scores from the 1st to the 2nd administration.
What statistic is used? The Pearson product-moment correlation.
Applicable for tests that measure stable variables, e.g., aptitude and psychomotor behavior.
TEST-RETEST
Test-retest reliability refers to the extent to which a test or measure administered at one time is correlated with the same test or measure administered to the same people at another time. If the correlation between separate administrations of the test is high (e.g., 0.7 or higher), then it has good test-retest reliability.
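A minimal sketch of the computation, assuming hypothetical scores for the same ten students on two administrations of a test:

```python
# Test-retest reliability: Pearson r between two administrations
from scipy.stats import pearsonr

first_admin  = [12, 15, 9, 20, 18, 11, 14, 17, 10, 16]  # 1st administration
second_admin = [13, 14, 10, 19, 17, 12, 15, 18, 9, 15]  # 2nd administration

r, _ = pearsonr(first_admin, second_admin)
print(f"test-retest r = {r:.2f}")  # 0.7 or higher suggests good reliability
```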
CONDITIONS
• the same experimental tools
• the same observer
• the same measuring instrument, used under the same conditions
• the same location
• repetition over a short period of time
• the same objectives
DISADVANTAGES
• It takes a long time for results to be obtained.
• If the interval is too brief, participants may recall information from the first test, which could bias the results.
• If the interval is too long, it is feasible that the participants could have changed in some important way, which could also bias the results.
Split-half
How is the reliability done? Administer the test to a group of examinees; the items are split into halves using the odd-even technique.
What statistic is used? Correlate the two sets of scores using Pearson r; after the correlation, apply another formula called the Spearman-Brown coefficient.
Applicable when the test has a large number of items.
SPLIT-HALF METHOD
A test for a single knowledge area is split into two parts, and both parts are given to one group of students at the same time.
Split-half testing is a measure of internal consistency: how well the test components contribute to the construct that's being measured. It is most commonly used for multiple choice tests, but you can theoretically use it for any type of test, even tests with essay questions.
How to split it in half?
• first half and second half, or
• odd and even numbers.
If the two halves of the test provide similar results, this would suggest that the test has internal reliability.
STEPS
1. Administer the test to a large group of students (ideally, over about 30).
2. Randomly divide the test questions into two parts. For example, separate even questions from odd questions.
3. Score each half of the test for each student.
4. Find the correlation coefficient for the two halves.
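A minimal sketch of these steps with the Spearman-Brown correction, assuming a small hypothetical 0/1 score matrix (rows are students, columns are items in test order):

```python
# Split-half reliability via the odd-even technique
from scipy.stats import pearsonr

scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],   # student 1, items 1-8
    [1, 0, 0, 1, 0, 0, 1, 0],   # student 2
    [1, 1, 1, 1, 1, 1, 1, 1],   # student 3
    [0, 1, 0, 0, 1, 0, 0, 1],   # student 4
    [1, 1, 1, 0, 1, 1, 1, 0],   # student 5
]

# Total each student's odd-numbered and even-numbered items
odd_totals  = [sum(row[0::2]) for row in scores]
even_totals = [sum(row[1::2]) for row in scores]

# Correlate the two half-test scores
r_half, _ = pearsonr(odd_totals, even_totals)

# Spearman-Brown corrects the half-length correlation up to full test length
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```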
Parallel Forms
Parallel forms reliability (also called equivalent forms reliability) uses one set of questions divided into two equivalent sets ("forms"), where both sets contain questions that measure the same construct, knowledge or skill. The two sets of questions are given to the same sample of people within a short period of time, and an estimate of reliability is calculated from the two sets.
Parallel Forms
Step 1: Give test A to a group of 50 students on a Monday.
Step 2: Give test B to the same group of students that Friday.
Step 3: Correlate the scores from test A and test B.
In order to call the forms "parallel", the observed scores must have the same means and variances. If the tests are merely different versions (without the "sameness" of observed scores), they are called alternate forms.
Parallel Forms
How is the reliability done? Correlate the test results from the 1st form with the 2nd form.
What statistic is used? Pearson r.
Applicable if there are two versions of a test, e.g., entrance exams and licensure exams.
Parallel Forms
Advantages:
• Parallel forms reliability can avoid some problems inherent in test-retesting.
Disadvantages:
• You have to create a large number of questions that measure the same construct.
• Proving that the two test versions are equivalent (parallel) can be a challenge.
Internal consistency
Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct. You can calculate internal consistency without repeating the test or involving other researchers, so it's a good way of assessing reliability when you only have one data set.
Test of Internal Consistency
How is the reliability done? The procedure involves determining whether the scores for each item are consistently answered by the examinees.
What statistic is used? A statistical analysis called Cronbach's alpha or the Kuder-Richardson formula.
Applicable to, e.g., Likert-scale instruments.
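A minimal sketch of Cronbach's alpha on a hypothetical Likert-scale data set (the response values are illustrative):

```python
# Cronbach's alpha: internal consistency of items measuring one construct
import numpy as np

responses = np.array([      # rows = respondents, columns = items
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
])

k = responses.shape[1]                         # number of items
item_vars = responses.var(axis=0, ddof=1)      # variance of each item
total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")       # >= 0.70 is often taken as acceptable
```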
Inter-Rater Reliability
Inter-rater reliability refers to statistical measurements that determine how similar the data collected by different raters are. A rater is someone who is scoring or measuring a performance, behavior, or skill in a human or animal. Examples of raters would be a job interviewer, a psychologist measuring how many times a subject scratches their head in an experiment, and a scientist observing how many times an ape picks up a toy.
Inter-rater reliability
How is the reliability done? The procedure is used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance.
What statistic is used? A statistical analysis called Kendall's tau coefficient.
Applicable when there are multiple raters.
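A minimal sketch, assuming two hypothetical raters scoring the same ten performances on a rubric:

```python
# Inter-rater reliability: Kendall's tau between two raters' scores
from scipy.stats import kendalltau

rater_a = [4, 2, 5, 3, 1, 4, 5, 2, 3, 4]
rater_b = [4, 1, 5, 3, 2, 4, 4, 2, 3, 5]

tau, p_value = kendalltau(rater_a, rater_b)
print(f"Kendall's tau = {tau:.2f}")  # values near 1 indicate strong agreement
```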
Phases of preparing a test
▪ Try-out phase
▪ Item analysis phase
▪ Item revision phase
Item Analysis
▪ There are two important characteristics of an item that will be of interest to the teacher:
🢭 Item Difficulty
🢭 Discrimination Index
▪ Item Difficulty, or the difficulty of an item, is defined as the number of students who are able to answer the item correctly divided by the total number of students. Thus:

Item difficulty = (number of students with the correct answer) ÷ (total number of students)

The item difficulty is usually expressed as a percentage.
Example:
What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?
Here the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
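A one-function sketch of the same computation (names are illustrative):

```python
# Item difficulty index: proportion of students answering correctly
def difficulty_index(num_correct, num_students):
    return num_correct / num_students

p = difficulty_index(75, 100)
print(f"difficulty = {p:.2f} ({p:.0%})")  # 0.75, i.e. 75%
```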
One problem with this type of difficulty
index is that it may not actually indicate
that the item is difficult or easy. A student
who does not know the subject matter will
naturally be unable to answer the item
correctly even if the question is easy. How
do we decide on the basis of this index
whether the item is too difficult or too
easy?
Range of difficulty index | Interpretation | Action
0 - 0.25 | Difficult | Revise or discard
0.26 - 0.75 | Right difficulty | Retain
0.76 and above | Easy | Revise or discard
▪ Difficult items tend to discriminate between those who know and those who do not know the answer.
▪ Easy items cannot discriminate between those two groups of students.
▪ We are therefore interested in deriving a measure that will tell us whether an item can discriminate between these two groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in the upper 27% of the class and how difficult it is with respect to those in the lower 27% of the class. If the upper 27% of the class found the item easy yet the lower 27% found it difficult, then the item can discriminate properly between these two groups. Thus:

Index of discrimination = DU - DL
Example: Obtain the index of discrimination of an item if the upper 27% of the class had a difficulty index of 0.60 (i.e., 60% of the upper 27% got the correct answer) while the lower 27% of the class had a difficulty index of 0.20.
DU = 0.60 while DL = 0.20; thus the index of discrimination = .60 - .20 = .40.
▪ Theoretically, the index of discrimination can
range from -1.0 (when DU =0 and DL = 1) to 1.0
(when DU = 1 and DL = 0)
▪ When the index of discrimination is equal to -1,
then this means that all of the lower 27% of the
students got the correct answer while all of the
upper 27% got the wrong answer. In a sense,
such an index discriminates correctly between
the two groups but the item itself is highly
questionable.
▪ On the other hand, if the index
discrimination is 1.0, then this means that
all of the lower 27% failed to get the correct
answer while all of the upper 27% got the
correct answer. This is a perfectly
discriminating item and is the ideal item
that should be included in the test.
▪ As in the case of the index of difficulty, we have the following rule of thumb:

Index Range | Interpretation | Action
-1.0 to -.50 | Can discriminate, but the item is questionable | Discard
-.55 to .45 | Non-discriminating | Revise
.46 to 1.0 | Discriminating item | Include
Example: Consider a multiple choice type of item for which the following data were obtained:

Item 1 | Option A | Option B* | Option C | Option D
Total | 0 | 40 | 20 | 20
Upper 27% | 0 | 15 | 5 | 0
Lower 27% | 0 | 5 | 10 | 5

The correct response is B. Let us compute the difficulty index and the index of discrimination:

Difficulty index = (no. of students getting the correct answer) ÷ total = 40/100 = 40%, within the range of a "good item".
The discrimination index can be similarly computed:
DU = (no. of students in the upper 27% with the correct response) ÷ (no. of students in the upper 27%) = 15/20 = .75 or 75%
DL = (no. of students in the lower 27% with the correct response) ÷ (no. of students in the lower 27%) = 5/20 = .25 or 25%
Discrimination index = DU - DL = .75 - .25 = .50 or 50%
Thus, the item also has a "good discriminating power".
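A minimal sketch reproducing this worked example from the option counts (figures as given above; difficulty uses 40 correct out of 100 students):

```python
# Difficulty and discrimination for item 1 (key = B)
upper = {"A": 0, "B": 15, "C": 5, "D": 0}   # upper 27% (20 students)
lower = {"A": 0, "B": 5, "C": 10, "D": 5}   # lower 27% (20 students)
key = "B"
total_correct, total_students = 40, 100

difficulty = total_correct / total_students   # 0.40
du = upper[key] / sum(upper.values())         # 15/20 = 0.75
dl = lower[key] / sum(lower.values())         # 5/20  = 0.25
discrimination = du - dl                      # 0.50

print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```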
It is also instructive to note that distractor A is not an effective distractor, since it was never selected by the students. Distractors C and D appear to have good appeal as distractors.
Index of Discrimination - the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right.
More Sophisticated Discrimination Index
▪ Item discrimination refers to the ability of an item to differentiate among students on the basis of how well they know the material being tested.
▪ A good item is one that has good discriminating ability and a sufficient level of difficulty (not too difficult nor too easy).
The item-analysis procedure for norm-referenced tests provides the following information:
1. The difficulty of an item
2. The discriminating power of an item
3. The effectiveness of each alternative
Benefits derived from Item Analysis
1. It provides useful information for class
discussion of the test.
2. It provides data which helps students improve
their learning.
3. It provides insights and skills that lead to the
preparation of better tests in the future.
Index of Difficulty
0.00 - 0.20 = very difficult
0.21 - 0.80 = moderately difficult
0.81 - 1.00 = very easy

Index of Item Discriminating Power
The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.
Validation
▪ After performing the item analysis and
revising the items which need revision, the
next step is to validate the instrument.
▪ The purpose of validation is to determine the
characteristics of the whole test itself,
namely, the validity and reliability of the test.
▪ Validation is the process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.
Validity
▪ Validity is the extent to which a test measures what it purports to measure; it refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.
There are three main types of evidence that may be collected:
1. Content-related evidence of validity
2. Criterion-related evidence of validity
3. Construct-related evidence of validity
Content-related evidence of
validity
▪ refers to the content and format of the
instrument.
🢭 How appropriate is the content?
🢭 How comprehensive?
🢭 Does it logically get at the intended variable?
🢭 How adequately does the sample of items or
questions represent the content to be assessed?
Criterion-related evidence of validity
▪ refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called criterion measures).
🢭 How strong is this relationship?
🢭 How well do such scores estimate present performance or predict future performance of a certain type?
Construct-related evidence of
validity
▪ refers to the nature of the psychological
construct or characteristic being measured by
the test.
🢭 How well does a measure of the construct
explain differences in the behaviour of the
individuals or their performance on a certain
task?
Usual procedure for determining content validity
▪ The teacher writes out the objectives based on the TOS.
▪ The teacher gives the objectives and TOS to two experts, along with a description of the test takers.
▪ The experts look at the objectives, read over the items in the test, and place a check mark in front of each question or item that they feel does NOT measure one or more objectives.
Usual procedure for determining
content validity
▪ They also place a check mark in front of each
objective NOT assessed by any item in the
test.
▪ The teacher then rewrites any item so
checked and resubmits to experts
and/or writes new items to cover those
objectives not heretofore covered by the
existing test.
Usual procedure for determining
content validity
▪ This continues until the experts approve all
items and also when the experts agree that
all of the objectives are sufficiently covered
by the test.
Obtaining Evidence for Criterion-related Validity
▪ The teacher usually compares scores on the test in question with scores on some other independent criterion test which presumably already has high validity (concurrent validity).
▪ Another type of validity is called predictive validity, wherein the test scores on the instrument are correlated with scores on later performance.
Gronlund's Expectancy Table

Test Score | Grade Point Average: Very Good | Good | Needs Improvement
High | 20 | 10 | 5
Average | 10 | 25 | 5
Low | 1 | 10 | 14
▪ The expectancy table shows that there were 20 students who got high test scores and were subsequently rated very good in terms of their final grades, and 14 students who obtained low test scores and were later graded as needing improvement.
▪ The evidence for this particular test tends to indicate that students getting high scores on it would later be graded very good; students with average scores on it would later be rated good; and students getting low scores on the test would later be graded as needing improvement.
Types of Validity
Content, Face, Predictive, Construct, Concurrent, Convergent, Divergent
Content Validity
When: The items represent the domain being measured.
Procedure: The items are compared with the objectives of the program. The items need to measure the objectives directly (for achievement tests) or the definition (for scales). A reviewer conducts the checking.
Face Validity
When: The test is presented well, free of errors, and administered well.
Procedure: The items and layout are reviewed and tried out on a small group of respondents. A manual for administration can be made as a guide for the test administration.
Predictive Validity
When: A measure should predict a future criterion. An example is an entrance exam predicting the grades of the students after the first semester.
Procedure: The correlation coefficient is obtained, where the x-variable is used as the predictor and the y-variable as the criterion.
Construct Validity
When: The components or factors of the test should contain items that are strongly correlated.
Procedure: The Pearson r can be used to correlate the items for each factor. However, there is also a technique called factor analysis to determine which items are highly correlated.
Convergent Validity
When: The components or factors of the test are hypothesized to have a positive correlation.
Procedure: Correlation is done for the factors of the test.
Divergent Validity
When: The components or factors of the test are hypothesized to have a negative correlation. An example is to correlate the scores on tests of intrinsic and extrinsic motivation.
Procedure: Correlation is done for the factors of the test.