Psychometric Properties and Principles
Presented by: Joshua R. Lelis
Learning Topics:
• Difference between Psychological Assessment and Psychological Testing
• Tools of Psychological Assessment
• Psychometric Property: Reliability
Learning Outcomes:
At the end of the lesson, learners will be able to:
a. distinguish between the concepts of psychological assessment and psychological
testing, highlighting their unique purposes and processes.
b. identify various psychological tools used in assessment and testing, providing examples of different types.
c. explain the fundamental concept of reliability in psychological measurement and its importance in ensuring the quality of assessment and testing.
d. differentiate between different types of reliability (e.g., test-retest, internal consistency, parallel forms) and describe how each is evaluated through "Assessment in Action" case study analysis (group-based).
Psychological Testing
vs
Psychological Assessment
Psychological Testing
The process of measuring Psychology-
related variables by means of devices or
procedures designed to obtain a sample of
behavior.
- Numerical in nature
- Administered individually or in groups
- Yields a test score or a series of test scores
Psychological Assessment
It is the gathering and integration of
Psychology-related data for the purpose of
making a psychological evaluation that is
accomplished through the use of tools.
- Answers a referral question
- Entails logical problem-solving that brings to bear many sources of data to answer the referral question
Tools of Psychological
Assessment
Psychological Test
• A device or procedure designed to measure variables related to psychology.
• Almost always involves analysis of a sample of behavior.
• The behavior sample could range from responses to a pencil-and-paper questionnaire, to oral responses to questions, to the performance of some task.
Psychological tests and other tools of assessment may differ with
respect to a number of variables:
Psychological Test
Content – the subject matter of the test.
Format – the form, plan, structure, arrangement, and layout of test items, as well as related considerations.
Administration procedures – demonstration of the various kinds of tasks demanded of the assessee, as well as trained observation of an assessee's performance.
- Tests designed for administration to groups may not require the test administrator to be present while the test takers independently complete the required tasks.
Psychological Test
Scoring and Interpretation procedures
• Scoring – the process of assigning evaluative codes or statements to performance on tests, tasks, interviews, or other behavior samples.
• Most tests of intelligence come with test manuals that are
explicit about scoring criteria and the nature of the interpretations
that can be made from the scores.
Psychological Test
Psychometric soundness
- Refers to how consistently and how accurately a psychological test measures what it purports to measure, and
- the usefulness or practical value that a test or other tool of assessment has for a particular purpose.
Examples of Psychological Tests
1. Intelligence and Cognitive Abilities Tests: These tests assess intellectual potential, problem-solving skills, memory, and other cognitive functions.
Examples of Intelligence Tests
Wechsler Adult Intelligence Scale (WAIS): Measures general
intelligence in adults.
Wechsler Intelligence Scale for Children (WISC): Assesses cognitive
abilities in children.
Stanford-Binet Intelligence Scale: Another widely used measure of intelligence across different age groups.
Raven's Progressive Matrices: A non-verbal test of abstract
reasoning.
Neuropsychological Tests: A broad category assessing specific cognitive functions like memory, attention, language, and executive functioning (e.g., Wisconsin Card Sorting Test, Trail Making Test).
Examples of Psychological Tests
2. Personality Tests: These tests aim to describe patterns of behavior, thoughts, and feelings.
Examples of Personality Tests
Objective Tests: Use structured response formats like multiple-choice or true/false.
• Minnesota Multiphasic Personality Inventory (MMPI): A comprehensive test used to screen for psychopathology.
• NEO Personality Inventory-Revised (NEO-PI-R): Assesses the "Big Five" personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism).
• Beck Depression Inventory (BDI): Measures the severity of depressive symptoms.
• Beck Anxiety Inventory (BAI): Assesses the severity of anxiety symptoms.
Examples of Personality Tests
Projective Tests: Present ambiguous stimuli and ask for open-ended
responses, based on the idea that individuals will "project" their unconscious
thoughts and feelings.
Rorschach Inkblot Test: Individuals interpret a series of inkblots.
Thematic Apperception Test (TAT): Individuals tell stories about a series of ambiguous pictures.
House-Tree-Person (HTP) Test: Individuals draw a house, a tree, and a
person, and then answer questions about their drawings.
Sentence Completion Tests: Projective tests where individuals complete
incomplete sentences, providing insights into their thoughts, feelings, and
attitudes.
Examples of Psychological Tests
3. Achievement Tests: These tests measure an individual's knowledge and skills in a particular academic or skill area.
Examples of Achievement Tests
• Wechsler Individual Achievement Test (WIAT): Assesses academic achievement in areas like reading, writing, and math.
• Kaufman Test of Educational Achievement (KTEA): Another measure of academic achievement.
• Wide Range Achievement Test (WRAT): A brief achievement test that assesses basic academic skills in reading, spelling, and arithmetic.
• Graduate Record Examinations (GRE) Subject Tests: Standardized tests that measure knowledge and understanding in specific academic disciplines at the graduate level (e.g., Psychology, Literature in English).
Examples of Psychological Tests
4. Aptitude Tests: These tests are designed to predict an individual's potential for future learning or success in a specific area.
• Differential Aptitude Tests (DAT): Assesses a range of aptitudes relevant to educational and vocational success.
• Armed Services Vocational Aptitude Battery (ASVAB): Used for military recruitment and career exploration.
• Musical Aptitude Tests (e.g., Seashore Measures of Musical Talents): Assess an individual's potential for musical achievement by evaluating abilities like pitch discrimination, tonal memory, and rhythm.
Examples of Psychological Tests
5. Interest Inventories: These tools help individuals identify their interests and preferences, often for career exploration.
• Strong Interest Inventory: Compares an individual's interests to those of people successful in various occupations.
• Kuder Career Search: Assesses interests across different vocational areas.
• Self-Directed Search (SDS): A self-scored inventory that helps individuals explore career options based on their interests and abilities, categorized into Holland's six occupational themes (Realistic, Investigative, Artistic, Social, Enterprising, Conventional).
• O*NET Interest Profiler: A free online tool that helps individuals identify their work-related interests and explore potential career paths aligned with those interests.
Examples of Psychological Tests
6. Attitude Scales: These measures assess an individual's feelings or opinions toward a specific topic, person, or object.
• Likert Scale: Presents statements and asks respondents to indicate their degree of agreement or disagreement.
• Thurstone Scale: Uses a series of statements with assigned numerical values indicating the intensity of the attitude.
• Semantic Differential Scale: Measures the connotative meaning of concepts by asking individuals to rate them on a series of bipolar adjective scales.
Other Tools of Psychological Assessment
Interview
• A method of gathering information through direct communication involving a reciprocal exchange.
• If the interview is conducted face-to-face, the interviewer takes note of both verbal and nonverbal behavior (behavioral observation).
Portfolio
- Samples of one's ability and accomplishment.
Case history data
- Refers to records, transcripts, and other accounts in written, pictorial, or other forms that preserve archival information, official and informal accounts, and other data and items relevant to an assessee.
Behavioral Observation
- Monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions.
Psychometric Soundness: Reliability
Reliability
• The dependability or consistency of the instrument, or of the scores obtained by the same person when re-examined with the same test on different occasions or with different sets of equivalent items
• A test may be reliable in one context but unreliable in another
• More items = higher reliability, other things being equal
• Only a representative sample of behavior is used to obtain an observed score
• The true score itself can never be observed directly; it can only be estimated
Reliability
• Reliability Coefficient: an index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.
• Variance: a statistic useful in describing sources of test score variability. It can be broken into components:
  True variance – variance from true differences
  Error variance – variance from irrelevant random sources
  Total variance = True variance + Error variance
• Reliability refers to the proportion of the total variance attributed to true variance (see the formula below).
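In symbols (a standard classical test theory identity, stated here for clarity rather than taken from the slides):

\[
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\]

For example, if true variance is 8 and error variance is 2, the total variance is 10 and the reliability coefficient is 8/10 = .80.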
Reliability
• Goals of Reliability
✓ Estimate errors
✓ Devise techniques to improve testing and reduce errors.
Reliability
• Measurement Error – refers to all of the factors associated with the process of measuring some variable, other than the variable being measured.
  Random error – a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
  Systematic error – a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Reliability
Classical Test Theory (True Score Theory) – a score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error.
Error refers to the component of the observed test score that does not have to do with the testtaker's ability.

X = T + E
X = observed score
T = true score
E = error
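A minimal simulation sketch (not from the slides; assumes NumPy is installed) showing how observed scores decompose into true score plus random error, and how the reliability coefficient emerges as the ratio of true variance to total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                        # simulated testtakers

true = rng.normal(100, 15, n)     # T: true scores on an IQ-like metric
error = rng.normal(0, 5, n)       # E: random measurement error
observed = true + error           # X = T + E

reliability = true.var() / observed.var()
print(f"empirical reliability ~ {reliability:.2f}")
# Expected: 15**2 / (15**2 + 5**2) = 225 / 250 = 0.90
```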
Test-Retest Reliability
• time sampling reliability
• an estimate of reliability obtained by correlating pairs of scores
from the same people on two different administrations of the
test
• appropriate when evaluating the reliability of a test that purports
to measure an enduring and stable attribute such as personality
traits
• the passage of time can be a source of error variance.
Test-Retest Reliability
• Carryover Effects happen when the test-retest interval is short and the second administration is influenced by the first, because testtakers remember or have practiced the earlier test.
  Practice Effect: scores in the second session are higher because of the testtakers' experience in the first session of testing.
• A test-retest study with a longer interval might be affected by other extraneous factors, resulting in a low correlation.
• Lower correlation = poorer reliability
Test-Retest Reliability
• Statistical tools: Pearson r and Spearman rho
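A quick sketch of how a test-retest coefficient might be computed (hypothetical scores; assumes SciPy is installed):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical scores for the same 8 people on two administrations
time1 = [12, 15, 9, 20, 14, 18, 11, 16]
time2 = [13, 14, 10, 19, 15, 17, 12, 18]

r, _ = pearsonr(time1, time2)       # for interval/ratio data
rho, _ = spearmanr(time1, time2)    # for ordinal data
print(f"test-retest r = {r:.2f}, rho = {rho:.2f}")
```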
Parallel Forms/Alternate Forms Reliability
• established when at least two different versions of the test yield
almost the same scores
• Parallel Forms: for each form of the test, the MEANS and the VARIANCES are EQUAL; same items, different positioning/numbering
• Alternate Forms: simply different versions of a test that have been constructed to be parallel
• The forms should contain the same number of items, and the items should be expressed in the same form and should cover the same type of content; range and difficulty must also be equal.
Parallel Forms/Alternate Forms Reliability
• Counterbalancing: a technique for avoiding carryover effects with parallel forms, by administering the forms in different sequences to different groups
• The forms can be administered on the same day or at different times
• The most rigorous and burdensome approach, since test developers must create two forms of the test
• Main problem: any difference between the two tests
• Statistical tools: Pearson r or Spearman rho
Internal Consistency
• Inter-item reliability
• Used when a test is administered once
• Consistency among the items within the test
• Measures the degree to which each item measures the same construct
• Suitable even for unstable traits, since no second administration is required
Internal Consistency
• KR-20: used for the inter-item consistency of dichotomously scored items with unequal variances, primarily items that can be scored right or wrong
• KR-21: used if all the items have the same degree of difficulty (equal variances) and are dichotomously scored
• Cronbach's Coefficient Alpha: used when the two halves of the test have unequal variances and on tests containing non-dichotomous items
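A small sketch of coefficient alpha computed from a persons-by-items score matrix (illustrative data; assumes NumPy). KR-20 is the special case of the same formula for dichotomous (0/1) items:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha; rows = persons, columns = items."""
    k = scores.shape[1]                           # number of items
    item_vars = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5 persons x 4 Likert-type items
data = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbach_alpha(data):.2f}")
```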
Split-Half Reliability
• obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered ONCE
• useful when it is impractical or undesirable to assess
reliability with two tests or to administer a test twice
• randomly assign items or assign odd-numbered items to one
half and even-numbered items to the other half
Split-Half Reliability
• Spearman-Brown Formula: allows a test developer or user to estimate internal consistency reliability from the correlation between two halves of a test, projecting what the reliability would be if each half had been the length of the whole test (assuming the halves have equal variances); see the sketch after this list
• Spearman-Brown Prophecy Formula: estimates how many more items are needed to achieve a target reliability
• Rulon's Formula: based on the ratio of the variance of the differences between the odd and even splits to the variance of the total, combined odd-even scores
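A sketch of both Spearman-Brown computations (standard psychometric identities; the numbers are illustrative, not from the slides):

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability of a test lengthened by a factor of n.
    With n=2, corrects a half-test correlation to full-test length."""
    return (n * r) / (1 + (n - 1) * r)

def prophecy_factor(r_current: float, r_target: float) -> float:
    """Factor by which a test must be lengthened to reach r_target."""
    return (r_target * (1 - r_current)) / (r_current * (1 - r_target))

half_r = 0.70    # correlation between the two halves
print(f"full-test estimate = {spearman_brown(half_r):.2f}")    # ~0.82

n = prophecy_factor(r_current=0.82, r_target=0.90)
print(f"lengthen ~{n:.1f}x: a 20-item test would need ~{20 * n:.0f} items")
```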
Split-Half Reliability
• If the reliability of the original test is relatively low, the developer could create new items, clarify the test instructions, or simplify the scoring rules
• Equal variances; dichotomously scored
• Statistical tools: Pearson r or Spearman rho
Inter-Scorer Reliability
• The degree of agreement or consistency between two or more scorers with regard to a particular measure
• Cohen's Kappa: two raters only
• Fleiss' Kappa: determines the level of agreement between TWO or MORE raters when the assessment is measured on a CATEGORICAL SCALE; the Intraclass Correlation Coefficient (ICC) is used for a CONTINUOUS SCALE
• Krippendorff's Alpha: two or more raters; based on observed disagreement relative to the disagreement expected by chance
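A short sketch of two-rater agreement using scikit-learn's implementation of Cohen's kappa (hypothetical categorical codes):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical codes assigned by two scorers to 10 responses
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_b = ["yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")   # 1.0 = perfect agreement, 0 = chance level
```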
Psychometric Soundness: Validity
Validity
Validity – a judgment or estimate of how well a test measures what it is supposed to measure
Validation – the process of gathering and evaluating evidence about validity
Validation Studies – yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual
Internal Validity – the degree of control among variables in the study (increased through random assignment)
External Validity – the generalizability of the research results (increased through random selection)
Conceptual Validity – focuses on individuals with their unique histories and behaviors
Different Forms of Validity
Content Validity
- Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample
- Established when the proportion of the material covered by the test approximates the proportion of material covered in the course
- Test Blueprint: a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items, and so forth
Different Forms of Validity
Criterion Validity
- A type of validity that evaluates how well a test predicts or correlates with a specific outcome (criterion).
Two Types of Criterion Validity
Concurrent Validity
Test results are compared with a criterion measured at the same time.
Example: A new depression scale is compared with an already
established clinical diagnosis or existing test like the BDI-II.
Predictive Validity
Test results are used to predict a future outcome.
Example: SAT scores predicting first-year college GPA.
Test | Criterion | Type
Aptitude test for mechanics | On-the-job performance rating | Predictive validity
New anxiety inventory | Clinical interview result | Concurrent validity
College entrance exam | First-year GPA | Predictive validity
Depression scale | DSM-5-based diagnosis | Concurrent validity
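A sketch of how a predictive validity coefficient might be computed: correlate test scores with the later criterion (hypothetical entrance-exam scores and first-year GPAs; assumes SciPy is installed):

```python
from scipy.stats import pearsonr

# Hypothetical entrance-exam scores and the same students' first-year GPAs
exam = [78, 85, 62, 90, 71, 88, 67, 80]
gpa = [3.1, 3.4, 2.5, 3.8, 2.9, 3.6, 2.6, 3.2]

validity_coefficient, _ = pearsonr(exam, gpa)
print(f"predictive validity coefficient = {validity_coefficient:.2f}")
```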
Different Forms of Validity
Construct Validity (Umbrella Validity)
- Covers all types of validity
- Refers to how well a test truly measures the theoretical construct (mental trait or concept) it claims to measure.
Different Forms of Validity
How Is Construct Validity Established?
1. Convergent Validity:
The test correlates highly with other tests that measure the same or similar constructs.
Example: A new self-esteem scale should correlate well with Rosenberg's Self-Esteem Scale.
2. Discriminant Validity (aka Divergent Validity):
The test does not strongly correlate with tests of unrelated constructs.
Example: A test for social anxiety should not correlate too highly with a math aptitude test.
Different Forms of Validity
3. Factor Analysis:
Used to check whether the test items group together as expected.
Example: All items measuring "impulsivity" should statistically cluster together.
4. Hypothesis Testing:
The test behaves as expected in experimental or real-life conditions.
Example: People with higher trait anger score higher on an anger scale.
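A sketch of convergent and discriminant evidence via simple correlations (hypothetical, simulated scale scores; assumes NumPy). The new scale should correlate strongly with the similar measure and weakly with the unrelated one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # simulated respondents

new_scale = rng.normal(0, 1, n)                  # new self-esteem scale
rosenberg = new_scale + rng.normal(0, 0.5, n)    # similar construct
math_aptitude = rng.normal(0, 1, n)              # unrelated construct

convergent = np.corrcoef(new_scale, rosenberg)[0, 1]
discriminant = np.corrcoef(new_scale, math_aptitude)[0, 1]
print(f"convergent r = {convergent:.2f} (should be high)")
print(f"discriminant r = {discriminant:.2f} (should be near zero)")
```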
Psychometric Soundness: Utility
Utility – the usefulness or practical value of testing to improve efficiency
- Can tell us something about the practical value of the information derived from scores on the test
- Helps us make better decisions
- Higher criterion-related validity = higher utility
- One of the most basic elements in utility analysis is the financial cost of the selection device
- Cost – disadvantages, losses, or expenses, in both economic and noneconomic terms
- Benefit – profits, gains, or advantages
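One widely cited formalization of the validity-utility link (the Brogden-Cronbach-Gleser productivity-gain formula, a standard utility-analysis result not spelled out in the slides; given here in one common simplified form):

\[
\Delta U = N \cdot T \cdot r_{xy} \cdot SD_y \cdot \bar{Z}_m \;-\; N \cdot C
\]

where \(N\) is the number of people selected, \(T\) their average tenure, \(r_{xy}\) the criterion-related validity of the test, \(SD_y\) the standard deviation of job performance in monetary terms, \(\bar{Z}_m\) the mean standardized test score of those selected, and \(C\) the cost of testing per person. The gain \(\Delta U\) grows directly with \(r_{xy}\), which is why higher criterion-related validity means higher utility.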
Thank You
ASSESSMENT IN ACTION (case study analysis)
Divide the class into 5 groups. Each group will be provided with a short, relatable case study scenario set in a Cebuano context that they need to analyze and discuss. Each scenario involves elements of both psychological assessment and testing, potentially highlighting a psychological tool, and depicting a psychometric property, particularly reliability.
G1: Maria, a bright Grade 10 student in a public high school in Cebu City, is applying for a specialized STEM (Science, Technology, Engineering, and Mathematics) track for senior high. As part of the application process, she takes a standardized aptitude test focusing on math and logical reasoning. Her score is just below the cut-off. However, her teachers describe her as highly motivated and consistently achieving high grades in her science and math classes. The guidance counselor also notes that during their individual interview, Maria expressed significant anxiety about taking standardized tests.
G2: A Barangay Health Worker (BHW) in a rural area of Cebu Province is tasked with conducting initial screenings for depression among adults in the community as part of a mental health awareness program. They use a short, translated Cebuano version of a standardized depression scale that relies on self-reported symptoms. Some residents are hesitant to answer the questions openly due to cultural stigma surrounding mental health, and others may have difficulty understanding some of the questions.
G3: A large call center in Cebu City is using a personality questionnaire as part of its hiring process for customer service agents. The questionnaire aims to assess traits like agreeableness, conscientiousness, and emotional stability. One applicant, Carlo, scores very low on "Agreeableness" on the test. However, during his behavioral interview, he provided specific examples of how he effectively resolved customer complaints with empathy and patience in his previous job. The HR manager is now unsure how much weight to give the test score versus the interview performance.
G4: A child development specialist is observing preschoolers in a daycare center in Mandaue City to assess their social interaction skills. She uses a checklist to record the frequency of specific behaviors like sharing, cooperation, and conflict. However, she notices that the children behave differently when they know they are being watched, and her own interpretation of certain behaviors might be influenced by her personal biases. Another observer, who is also using the checklist, sometimes records different behaviors for the same child at the same time.
G5: After a recent typhoon affected a coastal barangay in Cebu, the local government unit wants to assess the residents' psychological well-being and coping mechanisms. They distribute a short survey with questions about stress levels, sleep disturbances, and feelings of safety. The survey is administered by different volunteers, some of whom may not have received standardized training on how to explain the questions or handle residents' emotional responses. Some residents may also be
Discussion Points for Each Group:
• Identify the instances of psychological testing within the scenario. What specific tools might be used?
• Identify the instances of the broader psychological assessment process. What other information is being considered beyond test scores?
• What psychological tools are either mentioned or could be appropriately used in this situation?
• Discuss the importance of reliability for any tools used in this scenario. What could threaten the reliability of these tools in this specific context (e.g., language barriers, cultural interpretations)?
• Present the analysis to the class.
Presentation of Case Study
Analysis
Thank You
Presented by:
Joshua R. Lelis