Psychometrics: The Science of
Psychological Assessment
Chaitanya – Psychology Study Center
Objectives
• To create a critical understanding of measurement
issues and techniques in psychological inquiry.
• To enable students to develop skills and
competencies in test construction and
standardization.
• To understand the various biases in psychological
testing and assessment.
Unit-1 Perspectives on Psychometrics
• 1.1 Scientific method, realism, truth and psychology.
• 1.2 Scientific measurement in psychometrics and
measurement in the natural sciences.
• 1.3 Measurement models: Classical test theory, Latent
Variable model, Representational measurement
model.
• 1.4 The theory of true scores, the statistical true score,
the platonic true score, psychological vs. physical true
score, the true psychometric: trait or function.
• 1.5 Ethical issues in psychological testing
Unit 2. Process of test construction
• 2.1 Knowledge-based and person-based questionnaires.
– 2.1.1 Objective and open-ended tests
– 2.1.2 Norm-referenced and criterion-referenced testing
– 2.1.3 The correction for guessing in objective knowledge-based tests
• 2.2 Item analysis
– 2.2.1 Classical item analysis statistics for knowledge-based tests
– 2.2.2 Classical item analysis for person-based tests
– 2.2.3 Item analysis in criterion-referenced testing
• 2.3 Item response theory (IRT)
• 2.4 Relation of IRT and Classical test theory
• 2.5 Item Characteristic curve
Unit 3 Standardization of Tests
• 3.1 Reliability: Concept and types of reliability, forms
of error; Spearman-Brown Correction, Cautions in
the use of reliability coefficient
• 3.2 Validity: Concepts and types of validity; Political
validity; Confusion between validation and validity.
• 3.3 Normalization: Algebraic normalization, graphical
normalization
• 3.4 Types of norms
• 3.5 The use of Factor Analysis in test construction
Unit 4 Bias in testing and computer
application
• 4.1 Forms of bias
– 4.1.1 Item bias: identifying item bias
– 4.1.2 Differential item functioning, item offensiveness
• 4.2 Intrinsic test bias: Statistical models of intrinsic bias
• 4.3 Extrinsic test bias: Extrinsic test bias and ideology,
legal aspects of extrinsic test bias; guidelines in case of
test bias
• 4.4 Computerization in psychological testing
• 4.5 Artificial intelligence and psychological testing
Definition and characteristics of psychological tests
• Test: a measurement device or technique used
to quantify behavior or aid in the
understanding and prediction of behavior.
A standardized procedure for sampling
behavior and describing it with categories or
scores. (R.J. Gregory, 2004).
• A psychological test is a set of items that are
designed to measure characteristics of human
beings that pertain to behavior.
• A psychological test is essentially an objective and
standardized measure of a sample of behavior.
• Psychological testing refers to all the possible
uses, applications and underlying concepts of
psychological and educational tests (Kaplan &
Saccuzzo, 2005).
Characteristics of psychological tests
1. Standardization
2. Reliability
3. Validity
4. Norms
5. Objectivity
6. Sample of behavior
1. Standardization:
• Uniformity of procedure in administering and
scoring the test. (different persons/ different
situations*)
• Major part: Formulation of directions
• Test constructor must provide detailed
directions.
2. Reliability:
• Consistency of the scores in the test.
• Internal & temporal consistency.
• The extent to which the test results obtained
are consistent when the test is administered
once or more than once on the same sample with
a reasonable time gap.
• Before a psychological test is released for
general use – reliability should be checked.
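A minimal sketch (not part of the original slides) of how temporal consistency can be checked: test-retest reliability is the Pearson correlation between two administrations of the same test on the same sample. The scores below are hypothetical, and Python's statistics.correlation (3.10+) stands in for any correlation routine.

```python
# Test-retest reliability: Pearson correlation between scores from
# two administrations of the same test on the same sample.
# Scores are hypothetical, for illustration only.
from statistics import correlation  # Python 3.10+

first_admin  = [12, 18, 25, 30, 22, 15, 28, 20]  # scores at time 1
second_admin = [14, 17, 27, 29, 21, 16, 30, 19]  # scores after a gap

r_tt = correlation(first_admin, second_admin)
print(f"test-retest reliability = {r_tt:.2f}")  # close to 1.0 = consistent
```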
3. Validity: the degree to which the test actually
measures what it is supposed to measure and
how well it measures it.
• Valid test: a test that measures well the trait it
intends to measure.
• The more reliable and valid the test, the
smaller the margin of error.
4. Norms: average performance of a
representative sample on a given test.
• Test must be guided by certain norms.
• Helpful in interpretation of scores.
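A minimal sketch of how norms support interpretation: a raw score is converted to a z-score and a percentile against the norm group's mean and SD. The norm-group values below are hypothetical.

```python
# Interpreting a raw score against norms: convert to a z-score and
# percentile using the norm group's mean and SD (hypothetical values).
from statistics import NormalDist

norm_mean, norm_sd = 50.0, 10.0  # from the standardization sample
raw_score = 63

z = (raw_score - norm_mean) / norm_sd
percentile = NormalDist().cdf(z) * 100
print(f"z = {z:.2f}, percentile = {percentile:.1f}")
# z = 1.30, percentile ~ 90 -> above about 90% of the norm group
```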
5. Objectivity: the test should be standardized.
• The purpose of the test should be clearly
stated.
• Objectivity can be seen as the combination of
standardization and the psychometric aspects
(reliability, validity, norms).
• No room for the scorer’s personal opinion.
6. Sample of behavior: In psychological testing
we measure only some part of human
behavior.
• We cannot measure/ judge the complete
human behavior/ complete personality.
• Ex: NEO-PI (measures only personality; not
intelligence/aptitude).
Classification and uses of
psychological tests
• Individual test:
– Administered one at a time.
– Easy to study the subject’s performance on the test
(reaction time, motivation, etc.).
• Group test:
– (Mostly) paper-and-pencil tests.
– Large group of persons at the same time.
Main types of tests
1. Intelligence tests: Measures an individual’s ability in
relatively global areas (verbal comprehension,
perceptual organization, reasoning)- helpful in
determining potential for school work/ certain
occupations.
2. Aptitude tests: Measures the capability for a relatively
specific task or skill, which can be improved with
training.
3. Achievement tests: measures a person’s degree of
learning, success or accomplishment in a subject/ task.
4. Creativity tests: Assess novel, original thinking and the
capacity to find unusual or unexpected solutions.
5. Personality tests: Measures the traits, qualities, behaviors that
determine a person’s uniqueness. (checklists, inventories,
projective techniques, sentence completions).
6. Interest Inventories: Measure an individual’s preference for certain
activities or topics- helpful in occupational choice (Ex: SVIB).
7. Behavioral procedures: Objectively describe and count the
frequency of a behavior, identifying the antecedents and
consequences of the behavior.
(Behavior modification programs: ABC analysis, applied in
school settings and with intellectually disabled groups)
8. Neuropsychological tests: Measure cognitive, sensory, perceptual
and motor performance to determine the extent, locus and
behavioral consequences of brain damage.
• Ex: Halstead-Reitan & Luria-Nebraska neuropsychological batteries.
• Bender Visual Motor Gestalt Test.
Uses of Testing:
Decision making
Classification
– Placement
– Screening
– Certification
Diagnosis* & treatment (therapy) planning*
Self-knowledge
Research
– Pre—Intervention—Post
– Comparison
[Diagram: branches of psychology that use testing – Personality,
School, Clinical, Industrial, Social, Developmental]
General steps in test construction
• Seven important steps in test construction:
1. Planning of the test
2. Writing items of the test
3. Preliminary administration of the test.
4. Reliability of the final test.
5. Validity of the final test.
6. Preparing norms of the final test.
7. Preparation of the manual & reproduction of
the test.
1. Planning of the test: Blue print of the test.
• careful planning.
• Test constructor specifies the objectives
(broad & specific) of the test.
• Target group-nature of the content-type of
instructions- method of sampling- length &
time limit of the test-statistical methods.
2. Writing items of the test:
• An item is a single question/ task that is not often
broken down into smaller units (Bean, 1953).
• Descriptive or objective item- depends on the
purpose of the item.
• Creative art
(intuition/imagination/experience/practice).
• Thorough knowledge of & mastery over the subject.
• Vocabulary; able to convey the meaning of the
items.
• Item analysis: items are reviewed by some
experts and then arranged in increasing order of
difficulty (e.g., Raven’s SPM).
3. Preliminary administration of the test:
• After the items are written, a draft of the test is
ready for try-out.
• At least 3 preliminary administrations of the test
(Conrad, 1951).
• 1st administration: to detect any gross defects,
ambiguities and omissions in items & instructions.
Sample size: 100+.
• 2nd administration: to provide data for item analysis.
Sample size: 400 (approx.)
• 3rd administration: to detect any minor defects that
may not have been detected by the first two
preliminary administrations. Items are selected after
item analysis and included in the final test.
– Indicates: how effective the test will be when actually
administered.
4. Reliability of the final test:
• Consistency of the scores.
• When on the basis of the experimental try out
the test is finally composed of the selected
items, the final test is again administered on a
fresh sample in order to compute the
reliability coefficient.
• Sample size: 1000.
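One common way to obtain this coefficient is split-half reliability with the Spearman-Brown correction (named in Unit 3.1). A minimal sketch, assuming a hypothetical right/wrong response matrix:

```python
# Split-half reliability: correlate odd-item and even-item half scores,
# then step the half-test r up to full length with Spearman-Brown:
# r_full = 2 * r_half / (1 + r_half). Data are hypothetical.
from statistics import correlation  # Python 3.10+

responses = [  # rows = persons, columns = items (1 = right, 0 = wrong)
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
]

odd_half  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```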
5. Validity of the final test: what the test is
supposed to measure & how well it measures it.
• Valid test: a test that measures well the trait it
intends to measure.
• Validity should be computed from the samples
other than those used in item analysis.
6. Norms of the final test: average performance
or score of a representative sample on a given
test.
• When the scores are compared with the
norms, a meaningful inference can be
immediately drawn.
7. Preparation of the manual & reproduction of
the test:
• The test constructor reports the psychometric
properties of the test, norms and references.
• Clear indications regarding the procedures of the
test administration, the scoring methods & time
limit of the test.
• Instructions & detailed arrangement of materials.
• Finally: printing of the test and the manual
(importance & requirement of the test).
Item analysis
• A good test has good items. An item is a single question/
task that is not often broken down into smaller units
(Bean, 1953).
• It is one of the most important aspects of test construction.
• An item analysis is a statistical method used to determine
the quality of a test by looking at each individual item or
question and determining if they are sound.
• It helps identify individual items or questions that are not
good questions and whether or not they should be
discarded, kept, or revised.
• A general term for a set of methods used to evaluate test
items.
• The basic methods involve assessment of
– item difficulty & item discrimination
• After the items have been written, reviewed and
carefully edited they are subjected to a procedure- ‘item
analysis’.
• I.A.: a set of procedures applied to obtain indices of
the truthfulness (validity) of items.
Objectives of I.A.:
1. I.A. indicates which items are difficult, easy, or
moderately easy/difficult; it provides a difficulty
index for each item.
2. Provides indices of the ability of the item to discriminate
between inferior & superior test takers. It gives the
discrimination value of each item, known as item validity.
3. Indicates effectiveness of the distracters in MCQs.
4. At times, I.A. indicates why a particular item in the test
has not functioned effectively & how this can be
modified – functional significance can be increased.
2 types of item analysis:
• Quantitative
– Item difficulty
– Item discrimination
• Qualitative
Item difficulty is defined by the proportion of
people who get a particular item correct (index of
difficulty of an item).
• Easy item: 100 takers, 90 (Right) & 10 (Wrong): p = .90
• 100 takers, 70 (R) & 30 (W): p = .70
• 100 takers, 50 (R) & 50 (W): p = .50
• 100 takers, 20 (R) & 80 (W): p = .20 (hard item)
• The higher the proportion of people who get the
item correct, the easier the item.
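A minimal sketch of computing this difficulty index: for each item, p is the proportion of test takers answering it correctly. The response matrix is hypothetical.

```python
# Item difficulty index p: proportion of test takers who answer the
# item correctly. Higher p = easier item. Hypothetical response matrix.
responses = [  # rows = persons, columns = items (1 = right, 0 = wrong)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

n_persons = len(responses)
for item in range(len(responses[0])):
    p = sum(row[item] for row in responses) / n_persons
    keep = 0.30 <= p <= 0.70  # the range said below to maximize information
    print(f"item {item + 1}: p = {p:.2f}  in .30-.70 range: {keep}")
# item 1: p = 1.00 -> everyone right: very easy, discriminates nothing
```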
• A good test must have some items of
– higher difficulty indices (easier items) [at the beginning],
– moderate difficulty indices [in the middle],
– lower difficulty indices (harder items) [at the end]
• In most tests, the items should have a variety of difficulty
levels because a good test discriminates at many levels.
• Ex: a professor who wants to determine how much his
students have studied might like to discriminate between
– students who have not studied at all and those who
have studied just a little, or
– those who have studied just a little and those who
have studied a fair amount, or
– those who have studied more than average and
those who have worked and studied exceptionally hard.
• In other words, the professor needs to make many
discriminations. To accomplish this, he or she requires
items at many different levels of difficulty.
• For most tests, items in the difficulty range of
.30 to .70 tend to maximize information about
the differences among individuals.
• Some tests require a concentration of more-
difficult items (competitive exams).
• A test used to select students for educable
mentally challenged classes should have a greater
concentration of easier items to make fine
discriminations among individuals who ordinarily
do not perform well on tests (Allen & Yen, 1979).
• In constructing a good test, one must also consider
human factors.
• Ex: though items answered correctly by all
students will have poor psychometric qualities,
they may help the morale of the students who
take the test.
• A few easier items may help keep test anxiety in
check, which in turn adds to the reliability of the
test.
Item Discrimination: (item validity)
• The ability of an item to discriminate between
superior & inferior test takers (Blood & Budd, 1972).
• An important aspect of item analysis.
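A minimal sketch of one common discrimination index, D: the difference between the item's difficulty in the top-scoring and bottom-scoring groups (often the extreme 27%). This is a standard procedure, not necessarily Blood & Budd's exact method; the data are hypothetical.

```python
# Discrimination index D = p(upper group) - p(lower group).
# D near +1 means superiors pass the item and inferiors fail it.
# Upper/lower groups are the extreme scorers (~27% each). Hypothetical data.
responses = [  # rows = persons, columns = items (1 = right, 0 = wrong)
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

ranked = sorted(range(len(responses)), key=lambda i: sum(responses[i]), reverse=True)
k = max(1, round(0.27 * len(responses)))  # size of each extreme group
upper, lower = ranked[:k], ranked[-k:]

for item in range(len(responses[0])):
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k
    print(f"item {item + 1}: D = {p_upper - p_lower:+.2f}")
```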
Problems of Item analysis
1. Problem of spurious correlation in item-total correlation:
– Each item is a part of the total score, which inflates its
correlation with that total (see the sketch after this list).
– Extreme items should be included in the test.
2. Problem related to dichotomous items:
– Items cannot be selected on the basis of item-total
correlations.
3. Problem associated with control of unwanted factors:
– Items tend to correlate with unwanted factor(s).
– E.g., in the NEO-PI, personality items may also tap
verbal comprehension.
4. Problem associated with guessing/ chance success:
– Acute among items with two alternative options
(true/false); less severe in MCQs with more options.
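For problem 1 above, a minimal sketch of the usual remedy: the corrected item-total correlation, where each item is correlated with the total of the remaining items so it cannot inflate its own correlation. Data are hypothetical.

```python
# Corrected item-total correlation: correlate each item with the sum of
# the OTHER items, removing the spurious part-whole overlap (problem 1).
from statistics import correlation  # Python 3.10+

responses = [  # rows = persons, columns = items (1 = right, 0 = wrong)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

for item in range(len(responses[0])):
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]  # item excluded
    r = correlation(item_scores, rest_totals)
    print(f"item {item + 1}: corrected item-total r = {r:+.2f}")
```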
Issues in test administration
• Professional Issues:
– Theoretical concerns
– Adequacy of tests
• Moral Issues
– Human rights
a. Informed consent, purpose of testing,
b. Test scores & interpretation,
c. Who will have access to test data,
d. Right to confidentiality
– Labeling*
– Invasion of privacy (Psychologists must inform
subjects of the limits of confidentiality.)
– Divided loyalties
• Social Issues:
– Dehumanization
– Usefulness of tests
– Access to psychological testing services
• Competence of test purchasers
• Confidentiality
• Expertise in assessment*
• Communication of test results
• Responsible report writing
• Duty to warn [Tanya Tarasoff – Prosenjit Poddar case;
Tarasoff v. Regents, 1976]
• https://en.wikipedia.org/wiki/Tarasoff_v._Regents_of_the_University_of_
California
• 2002: New revision of the APA Ethical Principles of
Psychologists and Code of Conduct (effective 2003).
Some cultural issues:
• Impact of cultural background on test results
• Stereotype threat
• Assessment of cultural & linguistic minorities
– Native language interpreter
– Bilingual psychologist*
– Translation
Factors influencing test performance
• 1. Examiner
– Establishing rapport
– Examiner sex, experience, race
• 2. Situational variables
– Laboratory settings
– Condition of the instruments/ tests
• 3. Test taker’s [subject’s] perspective
– Test anxiety
– Language
– Motivation/ tendency to deceive
– Alternatives [2/3/5 options/ open-ended]
– Response style