Table of Specifications
What is a Table of Specifications?
A TOS, sometimes called a test blueprint, is a table that helps teachers align objectives, instruction, and assessment
(e.g., Notar, Zuelke, Wilson, & Yunker, 2004).
This strategy can be used for a variety of assessment methods but is most commonly associated with constructing
traditional summative tests.
A TOS is a two-way chart that relates the learning outcomes to the course content. It enables the teacher to prepare a test containing a representative sample of student knowledge in each of the areas tested.
Step 1- Determine the coverage of your exam
The first rule in making exams: when preparing your table of specifications, make sure that the coverage of your exam is material you have satisfactorily taught in class. Select the topics that you wish to test. You may not be able to cover all of them, as that could make the test too long to be realistic for your students in the given time, so select only the most important topics.
Step 2- Determine your testing objectives for each topic area
In this step, you will need to be familiar with Bloom's Taxonomy of thinking skills. Bloom identified a hierarchy of learning objectives, from the lower thinking skills of remembering and understanding to the higher thinking skills of evaluating and creating.
Bloom's Taxonomy has six categories (from lowest to highest): (1) Remembering, (2) Understanding, (3) Applying, (4) Analyzing, (5) Evaluating, and (6) Creating.
So for each content area that you wish to test, you will have to determine how you will test each area. Will you test
simply their recall of knowledge? Or will you be testing their comprehension of the matter? Or perhaps you will be
challenging them to analyze and compare and contrast something. Again, this would depend on your instructional
objectives in the classroom. Did you teach them lower thinking skills or did you challenge them by making them think
critically?
Step 3- Determine the duration for each content area
The next step in making the table of specifications is to write down how long you spent teaching each topic. This is important because it determines how many points you should devote to each topic. Logically, the more time you spent teaching a topic, the more questions should be devoted to that area.
Topic: Experiments, Outcomes, Sample Space, and Events
Competency: Describes an experiment, outcome, sample space, and events.

Example:
Total teaching hours for the third quarter = 25 hours
Teaching hours for the topic = 3 hours
Percentage = (3 / 25) × 100 = 12%

Total number of items = 50 items
Number of items for the topic = 0.12 × 50 = 6 items
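For teachers who like to check the arithmetic with a short program, here is a minimal Python sketch of the same computation. It is only an illustration, not part of the procedure itself; the variable names are arbitrary and the figures are those of the example above.

# Allocate items to a topic in proportion to teaching time,
# following the worked example (3 of 25 hours, 50-item test).
total_teaching_hours = 25   # total teaching hours for the quarter
topic_hours = 3             # hours spent on this topic
total_items = 50            # planned length of the test

percentage = topic_hours / total_teaching_hours * 100                   # 12.0
topic_items = round(total_items * topic_hours / total_teaching_hours)   # 6

print(f"Weight: {percentage:.0f}%  ->  {topic_items} of {total_items} items")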
Step 4- Determine the Test Types for each objective
Now that you have created your table of specifications by aligning your objectives to Bloom's Taxonomy, it is time to determine the test types that will accomplish your testing objectives. For example, remembering-level objectives can be assessed easily through multiple-choice or matching-type items.
Step 5- Polish your table of specifications
After your initial draft of the table of specifications, it is time to polish it. Make sure that your table of specifications covers the important topics that you wish to test, and that the number of items is sufficient for the time allotted for the test. You should also ask your academic coordinator to comment on your table of specifications; they will be able to give good feedback on how you can improve or modify it.
Item Analysis and Validation
The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and validation in order to ensure
that the final version of the test would be useful and functional.
First, the teacher tries out the draft test on a group of students with characteristics similar to those of the intended test takers (try-out phase).
From the try-out group, each item will be analyzed in terms of its ability to discriminate between those who know and those
who do not know and also its level of difficulty (item analysis phase).
The item analysis will provide information that will allow the teacher to decide whether to revise or replace an item (item
revision phase).
Finally, the final draft of the test is subjected to validation if the intent is to use the test as a standard test for the particular unit or grading period.
TWO IMPORTANT CHARACTERISTICS OF AN ITEM
(a) item difficulty, and
(b) discrimination index.
What is the Discrimination index?
The discrimination index is a basic measure of the validity of an item. It is a measure of an item's ability to
discriminate between those who scored high on the total test and those who scored low.
Though there are several steps in its calculation, once computed, this index can be interpreted as an indication of
the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an
item.
Perhaps the most crucial validity standard for a test item is that whether a student answers an item correctly should depend on their level of knowledge or ability, and not on something else such as chance or test bias.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in the upper 25%
of the class and how difficult it is with respect to those in the lower 25% of the class. If the upper 25% of the class
found the item easy yet the lower 25% of the class found it difficult, then the item can discriminate properly
between these two groups.
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60
(i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.
DU = 0.60 and DL = 0.20; thus, the index of discrimination = 0.60 − 0.20 = 0.40.
Difficulty index
The correct response is B. Let us compute the difficulty index and the index of discrimination.

Difficulty Index = (No. of students getting the correct response) / (Total no. of students) = 40 / 80 = 0.50 or 50%

Interpretation: right difficulty → retain the item.
The discrimination index can similarly be computed:

Discrimination Index = DU − DL

DU = (No. of students in the upper 25% with the correct response) / (No. of students in the upper 25%) = 15 / 20 = 0.75 or 75%

DL = (No. of students in the lower 25% with the correct response) / (No. of students in the lower 25%) = 5 / 20 = 0.25 or 25%

Discrimination Index = DU − DL = 0.75 − 0.25 = 0.50 or 50%

Thus, the item has “good discriminating power”.
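Both indices can also be computed mechanically. The short Python sketch below follows the definitions used in this example; the function names are only illustrative, and the counts are those of the worked item (40 of 80 correct overall, 15 of 20 in the upper 25%, 5 of 20 in the lower 25%).

def difficulty_index(correct, total):
    # Proportion of all examinees who answered the item correctly.
    return correct / total

def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    # DU - DL: difficulty in the upper 25% minus difficulty in the lower 25%.
    du = upper_correct / upper_n
    dl = lower_correct / lower_n
    return du - dl

print(difficulty_index(40, 80))             # 0.50 -> right difficulty, retain
print(discrimination_index(15, 20, 5, 20))  # 0.50 -> good discriminating power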
The item-analysis procedure for norm-referenced tests provides the following information:
The difficulty of the item
The discriminating power of the item
The effectiveness of each alternative
Benefits derived from item analysis
It provides useful information for class discussion of the test
It provides data which help students improve their learning
It provides insights and skills that lead to the preparation of better tests in the future
Index of Difficulty

P = (Ru + Rl) / T × 100

where:
Ru – the number in the upper group who answered the item correctly
Rl – the number in the lower group who answered the item correctly
T – the total number of students who tried the item

It is also instructive to note that distracter A is not an effective distracter, since it was never selected by the students. Distracters C and D appear to have good appeal as distracters.

The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.

Interpretation of the difficulty index:
0.00 – 0.20  Very difficult
0.21 – 0.80  Moderately difficult
0.81 – 1.00  Very easy
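As a rough illustration only, the formula and the interpretation bands above can be expressed as a short Python sketch; the helper names are hypothetical, and the sample counts come from the earlier worked item.

def index_of_difficulty(ru, rl, t):
    # P = (Ru + Rl) / T * 100, following the formula above.
    return (ru + rl) / t * 100

def interpret_difficulty(p):
    # Map P (in percent) to the interpretation bands listed above.
    if p <= 20:
        return "Very difficult"
    if p <= 80:
        return "Moderately difficult"
    return "Very easy"

p = index_of_difficulty(ru=15, rl=5, t=40)   # counts from the earlier worked item
print(p, interpret_difficulty(p))            # 50.0 Moderately difficult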
Validation
Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.
These two definitions of validity differ in the sense that the first definition refers to the test itself while the second
refers to the decisions made by the teacher based on the test. A test is valid when it is aligned to the learning
outcome.
A teacher who conducts test validation might want to gather different kinds of evidence. There are essentially three
main types of evidence that may be collected: content-related evidence of validity, criterion-related evidence of
validity and construct-related evidence of validity.
Content-related evidence of validity
Content-related evidence of validity refers to the content and format of the instrument.
How appropriate is the content?
How comprehensive?
Does it logically get at the intended variable?
How adequately does the sample of items or questions represent the content to be assessed?
Criterion-related evidence of validity
Criterion-related evidence of validity refers to the relationship between scores obtained using the instrument and
scores obtained using one or more other tests (often called criterion).
How strong is this relationship?
How well do such scores estimate present or predict future performance of a certain type?
Construct-related evidence of validity
Construct-related evidence of validity refers to the nature of the psychological construct or characteristic being
measured by the test.
How well does a measure of the construct explain differences in the behavior of the individuals or their
performance on a certain task?
The usual procedure for determining content validity may be described as follows: The teacher writes out the
objectives of the test based on the table of specifications and then gives these together with the test to at least two
(2) experts along with a description of the intended test takers. The experts look at the objectives, read over the
items in the test and place a check mark in front of each question or item that they feel does not measure one or
more objectives.
They also place a check mark in front of each objective not assessed by any item in the test. The teacher then
rewrites any item so checked and resubmits to the experts and/or writes new items to cover those objectives not
heretofore covered by the existing test. This continues until the experts approve of all items and also until the
experts agree that all of the objectives are sufficiently covered by the test.
In order to obtain evidence of criterion-related validity, the teacher usually compares scores on the test in question
with the scores on some other independent criterion test which presumably has already high validity. For example,
if a test is designed to measure mathematics ability of students and it correlates highly with a standardized
mathematics achievement test (external criterion), then we say we have high criterion-related evidence of validity.
In particular, this type of criterion-related validity is called its concurrent validity.
Another type of criterion-related validity is predictive validity, wherein the test scores on the instrument are correlated with scores on a later performance (criterion measure) of the students. For example, the mathematics ability test constructed by the teacher may be correlated with the students' later performance in a Division-wide mathematics achievement test.
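As a rough illustration (not from the original text), such a correlation can be computed with a few lines of Python; the two score lists below are hypothetical.

import numpy as np

teacher_test = [35, 42, 28, 50, 31, 45]   # hypothetical scores on the teacher-made test
criterion    = [36, 44, 30, 48, 29, 47]   # hypothetical scores on the criterion test

r = np.corrcoef(teacher_test, criterion)[0, 1]
print(round(r, 2))   # a high coefficient is evidence of criterion-related validity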
Apart from the use of correlation coefficient in measuring criterion-related validity, Gronlund suggested using the
so-called expectancy table. This table is easy to construct and consists of the test (predictor) categories listed on the
left hand side and the criterion categories listed horizontally along the top of the chart. For example, suppose that a
mathematics achievement test is constructed and the scores are categorized as high, average, and low.
The criterion measure used is the final average grade of the students in high school: Very Good, Good, and Needs Improvement. The two-way table lists the number of students falling under each possible pair of (test category, grade category), as described below.
The expectancy table shows that 20 students got high test scores and were subsequently rated Very Good in terms of their final grades; 25 students got average scores and were subsequently rated Good; and 14 students obtained low test scores and were later graded as Needs Improvement. The evidence for this particular test therefore suggests that students getting high scores on it would later be rated Very Good, students getting average scores would later be rated Good, and students getting low scores would later be graded as Needs Improvement.
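If class records are available electronically, an expectancy table of this kind can be tallied with a spreadsheet or a few lines of code. The sketch below uses the pandas library and a small set of hypothetical student records purely to show the structure (predictor categories down the side, criterion categories across the top).

import pandas as pd

# Hypothetical per-student records: test-score category and final-grade category.
records = pd.DataFrame({
    "test":  ["High", "High", "Average", "Average", "Average", "Low", "Low"],
    "grade": ["Very Good", "Good", "Good", "Good",
              "Needs Improvement", "Good", "Needs Improvement"],
})

# Cross-tabulate: each cell is the number of students with that (test, grade) pair.
expectancy = pd.crosstab(records["test"], records["grade"])
print(expectancy)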
Reliability
Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. We have already given formulas for computing the reliability of a test; for internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulas (KR-20 or KR-21).

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid results. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
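For teachers who want to compute KR-20 directly from scored answer sheets, the following Python sketch applies the usual formula, r = (k / (k − 1)) × (1 − Σpq / σ²), to a small hypothetical 0/1 score matrix. It is a sketch only; check it against your statistics reference before using it for real decisions.

import numpy as np

def kr20(item_scores):
    # Kuder-Richardson formula 20 for dichotomously scored (0/1) items.
    # item_scores: rows = examinees, columns = items.
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                    # number of items
    p = x.mean(axis=0)                # proportion answering each item correctly
    q = 1 - p
    total_var = x.sum(axis=1).var()   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Tiny hypothetical score matrix: 5 examinees, 4 items.
scores = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 1, 1],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
print(round(kr20(scores), 2))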
Activity
Find the index of difficulty for each of the following situations (the values given are the numbers of correct responses in the upper 25% and the lower 25% of the class):
1. N = 60; upper 25% = 2, lower 25% = 6
2. N = 80; upper 25% = 2, lower 25% = 9
3. N = 30; upper 25% = 1, lower 25% = 6
4. N = 50; upper 25% = 3, lower 25% = 8
5. N = 70; upper 25% = 4, lower 25% = 10
Example #1
N = 60; upper 25% = 2, lower 25% = 6

Formula: P = (Ru + Rl) / T × 100
Ru = 2 and Rl = 6

Find T:
T = (N × upper 25%) + (N × lower 25%)
T = (60 × 0.25) + (60 × 0.25)
T = 15 + 15 = 30
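To finish example #1, the last step is P = (Ru + Rl) / T × 100 = (2 + 6) / 30 × 100 ≈ 26.67%. If you want to check this mechanically, a two-line Python check consistent with the earlier sketch would be:

ru, rl, t = 2, 6, 30        # example #1
print((ru + rl) / t * 100)  # ≈ 26.67 (percent)

The same steps apply to the remaining items in the activity.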
B. Which of the items in Exercise A are found to be most difficult?