Psychometrics is a field of study in psychology and education that focuses on the theory and
techniques of psychological measurement. It involves the design, development, and evaluation of
tests and assessments to measure psychological attributes such as knowledge, abilities, personality
traits, and more. Psychometrics aims to ensure that these tests are reliable, valid, and fair, so they
provide accurate and meaningful information about an individual's characteristics.
Two Aspects
Theoretical Aspect (What is Assessment):
This part is about the theory behind psychological tests. It looks at why and how we create tests to
measure things like intelligence or personality.
Think of it like the science behind making a good test.
Practical Aspect (Test Selection):
This part is about using these tests in real life. It's about choosing the right test for the right situation
and making sure the test is given and scored fairly.
Think of it like the practical side of using these tests, like picking the right tool for a job and using it
correctly.
What do tests measure?
Tests measure various aspects depending on their type, such as intelligence, personality, knowledge,
or skills.
What types of tests are there?
There are many types of tests, including intelligence tests, personality tests, academic tests, and
vocational tests.
How can I find a test?
You can find tests in academic libraries, online databases, or by consulting professionals in
psychology or education.
How should tests be administered and scored?
Tests should be administered following standardized procedures, and scoring should be done
consistently to ensure reliable results.
How should these scores be interpreted? For example, how can I tell whether a score of 20 on a
particular test is low, average, or above-average?
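Scores are interpreted by comparing them to the norms (the distribution of scores) of the population
the test is designed for. A score of 20 is low, average, or above-average only relative to these norms:
for example, it is above average if most people in the norm group score below 20.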
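As a rough illustration of norm-based interpretation, the sketch below converts a raw score into a z-score and a percentile. The norm mean of 15 and standard deviation of 5 are made-up values, and the percentile step assumes scores in the norm group are roughly normally distributed; real tests publish their own norm tables.

```python
from statistics import NormalDist

def interpret_score(raw_score, norm_mean, norm_sd):
    """Return (z, percentile) for a raw score, assuming roughly normal norms."""
    z = (raw_score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile

# Hypothetical norms: mean 15 and SD 5 in the reference population.
z, pct = interpret_score(20, norm_mean=15, norm_sd=5)
print(f"z = {z:.2f}, percentile = {pct:.0f}")
```

With these invented norms, a score of 20 sits one standard deviation above the mean, so it would count as above average. With different norms, the same 20 could be average or low.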
How confident can I be that the test measures what it claims to measure? For example, do
people’s scores on a test of anxiety accurately reflect their true levels of anxiety and not something
else?
Confidence in a test's validity is established through rigorous research and testing. A well-validated
test measures what it claims to measure.
How much measurement error is associated with people’s scores on the test? If a test tells me that
someone has an IQ of 110, could their IQ really be as high as 130 or as low as 95?
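Measurement error is present in all tests, so an IQ score of 110 is best read as an estimate with a
margin of error. How wide that margin is depends on the test's reliability: the more reliable the test,
the narrower the range in which the person's true score is likely to lie.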
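One common way to put a number on this margin is the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below is only an illustration: the reliability of 0.90 is an assumed value, the SD of 15 is the usual IQ convention, and putting a symmetric band around the observed score is a simplification.

```python
import math

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score using the SEM."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    return observed - z * sem, observed + z * sem

# Assumed values for illustration: SD = 15, reliability = 0.90.
low, high = score_band(110, sd=15, reliability=0.90)
print(f"Observed 110, plausible range about {low:.0f} to {high:.0f}")
```

Under these assumptions the band runs from roughly 101 to 119, so a true IQ of 130 or 95 would be unlikely. A less reliable test would give a wider band.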
Are the scores on this test influenced by other things? For example, if a personality test is given to
job applicants, will they try to distort their scores to get the job? Will anxiety affect performance
on a test used for academic selection?
Scores can be influenced by factors like social desirability in personality tests or test anxiety, so test
designers consider and control for these factors.
When searching for a test, how can I evaluate whether a particular test has been well developed?
You should consider factors like reliability, validity, and norming procedures to assess a test's quality.
How can I develop a test myself?
Developing a test requires expertise in test construction, rigorous research, and validation.
Are there any useful statistical tips and tricks which will help me interpret and use test scores?
Statistical techniques like factor analysis help identify underlying factors in test items, aiding
interpretation.
What is factor analysis and how is it used?
Factor analysis is a statistical method that identifies underlying factors in data. It's used to
understand the structure of test items.
How can I use test scores to predict behavior?
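Test scores can be used to predict behavior by correlating them with an outcome of interest (for
example, job performance) and then using techniques such as regression to estimate the outcome
from the score.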
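A minimal sketch of this idea, using invented data: it computes the correlation between test scores and an outcome, fits a straight line by least squares, and uses the line to predict the outcome for a new score. The function names and the numbers are hypothetical, and a real validation study would need far more people than five.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: selection-test scores and later supervisor ratings.
test_scores = [10, 12, 15, 18, 20]
ratings = [2.0, 2.5, 3.1, 3.4, 4.0]

r = pearson_r(test_scores, ratings)
slope, intercept = fit_line(test_scores, ratings)
predicted = intercept + slope * 16
print(f"r = {r:.2f}; predicted rating for a score of 16: {predicted:.2f}")
```

The stronger the correlation, the more useful the prediction. With a weak correlation the fitted line still gives a number, but individual predictions will often be far off.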
How is it possible to give people different items yet still be able to compare their scores?
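Standardization alone is not enough when people answer different items. Comparison becomes
possible when the items have been calibrated onto a common scale, for example with item response
theory, which is what adaptive testing relies on.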
Is it reasonable to add up people’s scores on various test items and interpret this total score?
It's reasonable when the items were designed to measure the same thing, so that the total reflects a
single attribute, as in well-constructed scales such as intelligence tests.
Can anyone just think of a concept (e.g., 'egocentric optimism') and develop a test for it? Should
we develop tests which measure the important ways in which people really do differ, rather than
inventing more and more?
Developing a test should be based on rigorous research and real differences in people, rather than
inventing tests without clear purposes and research support.
Who Needs Tests?
Psychological tests and assessments are essential tools for several groups of people:
1. Practitioners: These professionals, such as clinical, educational, occupational, sports, and forensic
psychologists, use tests for guidance and assessment in medical, clinical, or educational settings. For
example, they help diagnose conditions like dyslexia, behavioral problems, or cognitive decline and
evaluate preferred learning styles in students. Sports psychologists use tests to assess motivation and
self-efficacy, while health psychologists measure pain levels or understanding of medical information.
2. Occupational Psychologists: They use tests to select job applicants, especially when the
consequences of hiring the wrong person can be financially significant. For example, senior
executives might undergo assessments to determine their suitability for top positions.
3. Forensic Psychologists: They employ tests to assess an individual's fitness to plead, ability to
understand court proceedings, or the potential for rehabilitation in the case of dangerous prisoners.
4. Researchers and Students: This group may not necessarily be experts in individual differences but
uses tests in research. They measure variables related to psychology, genetics, health, or cognitive
abilities. Researchers examine the relationships between psychological traits and other variables, like
DNA, environmental factors, and cognitive performance.
5. Psychometricians: These are specialists with strong mathematical or statistical backgrounds who
focus on developing methods for analyzing test data. They work on creating advanced
techniques, like adaptive testing.
6. Individuals Seeking Self-Knowledge: People who take self-administered tests for fun or self-
exploration, like personality quizzes found in books, magazines, or online, fall into this category.
These tests should be taken with caution, as their scientific basis is often unclear.
This book serves as a valuable resource for practitioners, researchers, and psychometricians, offering
insights into the design, administration, and interpretation of psychological tests and assessments. It
also covers newer developments and issues in psychometrics. For those seeking self-knowledge, it
emphasizes the importance of skepticism and caution when taking or interpreting self-administered
tests, especially when their scientific validity is questionable.
Psychology and Psychometrics
Importance of Theories: In psychology, it's crucial to have theories or ideas about why people are
different in terms of how they think, feel, and behave. For example, someone studying how our
mood affects our memory needs to understand different moods and memory processes based on
what's already known from previous research. It's not a good idea to create tests without looking at
what's already known, because other experts won't take those tests seriously.
Relationship Between Psychology and Measurement: Sometimes, theories in psychology come from
personal experiences or observations rather than data analysis. Psychometrics, which is like a toolkit
of methods, helps design tests or questionnaires to measure things like emotions, intelligence, or
personality. It doesn't create the theories, but it helps measure them accurately.
Using Data for Theories: It's also possible to do the reverse - use psychometric methods to analyze a
wide range of behaviors and create theories from the data. This approach has led to theories about
human cognitive abilities, like intelligence. However, these methods alone can't fully explain why
these things happen - that's where detailed studies in labs come in.
First Step Towards Understanding: Psychometrics is just the starting point in psychology. It helps
identify interesting things to study. To truly understand why people behave the way they do, we need
to look at social, developmental, genetic, cognitive, and physical processes that influence our
behavior.
Not a Complete Textbook: This book doesn't explain all the theories in psychology. It's not a
textbook about why people are different. It focuses on the methods used to measure these
differences. If you want to learn more about the theories, you might want to read another book
alongside this one.
Doesn't Require Psychology Background: You don't need to be a psychology expert to understand
the methods in this book. The examples given are things you're likely familiar with, like anxiety or
intelligence, so anyone from different fields can follow the principles explained in this book.
Tests measure various things: Tests are used to measure all sorts of things, like how much
you like art, how much pain you can handle, your feelings, your personality, how well you
think, how you get along with others, and much more. There are lots of things you can
measure with tests.
Traits are a big deal: Traits are important in psychology. A trait is a characteristic that stays fairly
stable over time, like being a bit anxious. If we test a bunch of people for their anxiety levels, the most
anxious person will still be among the most anxious if we test them again later. Situations can make you
feel more anxious, but your rank relative to other people won't change much.
Two types of traits: There are two kinds of traits. One is about how you act or your personality, like
being jumpy if you're anxious. The other is about your ability to do things like solving puzzles or
reading people's emotions. For example, if you're good at puzzles, that's an ability trait.
You can measure attainments: This is about how good you are at things you've learned or practiced.
For example, how well you do in a sales job depends on your training, your experience, and how
motivated you are. This mix of things makes up attainments.
Tests for these are similar: Even though we're measuring different things like personality, abilities,
and attainments, the tests are made in a similar way. But remember, ability and attainment tests
directly test your skills, while personality tests are based on what you say about yourself.
States are temporary: States are things that don't last long. For example, you might feel really scared
if a car almost crashes into you. A few minutes later, you're back to your usual level of fear. States can
change because of things happening around you or in your body, or what you're thinking.
Two types of states: States come in two types: moods or emotions, and motivation. Your usual state
kind of matches your traits. So, if you're usually calm, you'll mostly be calm.
Measuring states: To measure states, people usually answer questions about how they feel right
now. But asking questions can change your mood, so it's a bit tricky.
Attitudes are important: Attitudes are like opinions or feelings about things. You can measure them
in different ways. You can see how people react when they listen to something they agree or
disagree with, or you can check physical things like their pupils getting bigger when they see
something they like.
Rating scales for attitudes: The easiest way to measure attitudes is by asking people to rate how
much they agree with statements. But this can have problems, like people wanting to look good, so
they might not be honest.
So, tests can measure lots of things, and there are different types of traits, states, and attitudes, each
measured in different ways.
Assumptions Made When Measuring Traits or States
Assumption 1: The Characteristic Exists
This assumption can fail, because it's easy to create terms describing differences among people that
aren't real.
For example, "Repression-Sensitisation" was believed to measure how people respond to threat, but
scores on it turned out to correlate closely with anxiety, suggesting it wasn't a new concept.
Assumption 2: Traits, Not Just Situations, Influence Behavior
Whether someone appears anxious or relaxed depends partly on the situation.
If situations alone determined how people behave, traits would not be consistent.
Evidence, such as questionnaire scores predicting real-life behaviors, suggests traits exist.
Assumption 3: Test Scores Are Measurements
Generating numbers from test responses isn't necessarily the same as measuring a characteristic.
Psychological characteristics might not be quantifiable.
Numbers from psychological tests may not behave like measurements from physical instruments such as
rulers or thermometers, so they may need to be interpreted differently.
These assumptions highlight challenges and complexities in psychological testing and measurement.
New Chapter: Tests, Scales, and Testing
Introduction to Psychometric Testing
Many people think of psychometric tests as providing deep insights into a person's inner self, but
they are simply standardized tools to gather information.
Standardization in Testing
Standardization means that anyone, anywhere taking the test has the same experience.
Everything, from the instructions, test environment, and time limits to the wording of questions and
the scoring methods, must be standardized.
Scoring and Comparing Test Results
Tests come with standardized instructions for scoring, usually involving numerical scores.
These scores can be compared to other individuals' scores, correlated with other variables, or
compared with different test scores.
The Concept of Items
In testing, an "item" is a single question or statement.
Items are used to assess a domain of knowledge or a specific trait, like measuring historical
knowledge or depression.
Selecting items properly is essential to ensure a fair representation of the domain.
It's often more practical to use a subset of items to estimate a person's knowledge or trait.
Single-Item Scales
Sometimes, personality traits, mood, or other characteristics are measured using just one item.
This can be less reliable because a single question might not capture the complexity of a trait.
Single-item scales might be used when time is limited.
Multiple-Item Scales
Multiple-item scales consist of several items that measure the same trait or domain.
These items should all be related to the trait being measured.
A depression scale, for instance, might include items about low mood, changes in weight, sleep
quality, and energy levels.
Factor analysis is often used to ensure that items in a scale measure the same thing.
It's crucial to define the domain the scale measures to write relevant items.
How Long Should a Scale Be?
Determining the appropriate length of a scale is a fundamental question in psychometric testing. The
aim of any scale is to estimate a person's true score, which is the score they would get if they
answered all possible items in the domain. The length of a scale depends on its purpose and the
quality of the items.
Short vs. Long Scales
Longer scales are generally more accurate than shorter ones. For example, a 50-item scale is likely
to provide a more accurate estimate of a person's knowledge compared to a 10-item scale.
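Longer scales reduce the impact of luck. If a person knows half of the material and takes a short
10-item test, they could score well above or below 50% purely because of which items happen to be
chosen, and could even score perfectly. In a longer 50-item test, such extreme results become much
less likely.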
However, extremely long scales can lead to fatigue, boredom, and random answering, reducing the
test's accuracy.
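The effect of length on luck can be sketched with simple binomial arithmetic. This assumes items are sampled at random from a very large domain and that the person knows exactly half of it; both the assumption and the function names are illustrative only.

```python
import math

def proportion_se(true_proportion, n_items):
    """Standard deviation of the observed proportion correct across random item samples."""
    return math.sqrt(true_proportion * (1 - true_proportion) / n_items)

def prob_perfect(true_proportion, n_items):
    """Chance that every sampled item happens to be one the person knows."""
    return true_proportion ** n_items

for k in (10, 50):
    print(k, round(proportion_se(0.5, k), 3), prob_perfect(0.5, k))
```

Under these assumptions the typical sampling error in the proportion correct is about 0.16 for 10 items and about 0.07 for 50 items. A perfect score by luck has a chance of about 1 in 1,000 on the short test and is vanishingly rare on the long one.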
Scaling and Correlations
The correlation between a scale's scores and the true score can be estimated, even if the true score
cannot be directly known because it would require testing on an infinite number of items.
Using two parallel scales with the same number of randomly selected items from the same large
domain can help estimate this correlation.
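Under the usual classical test theory assumptions (two truly parallel scales X and X' with independent errors), the link between the parallel-scale correlation and the correlation with the true score T can be written as follows. The symbols are introduced here for illustration and are not from the notes above.

```latex
r_{XX'} = r_{XT}^{2}
\quad\Longrightarrow\quad
r_{XT} = \sqrt{r_{XX'}}
```

For example, if two parallel scales correlate 0.81 with each other, each is estimated to correlate about 0.90 with the true score.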
The length of a scale depends on its intended use. Scales used for critical decisions may need to be
very accurate and, therefore, longer. In other situations, a shorter scale may suffice.
Quality of Items
The quality of the items in a scale strongly affects how long the scale needs to be. Some scales contain
items that correlate strongly with the true score. In such cases, fewer items are needed for precision.
In real-life testing, the correlation between each item and the true score can be low, sometimes
around 0.3. In such cases, more items are needed to reach the desired level of measurement
precision.
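To get a feel for how many items are needed, the sketch below uses the Spearman-Brown formula from classical test theory. It assumes all items are parallel, so that a single item's reliability equals its squared correlation with the true score (0.3² = 0.09). Real items rarely meet this assumption exactly, and the target reliability of 0.8 is just an example value.

```python
import math

def scale_reliability(item_true_r, n_items):
    """Spearman-Brown reliability of a scale built from n parallel items."""
    r_item = item_true_r ** 2  # reliability of a single item under the parallel-items assumption
    return n_items * r_item / (1 + (n_items - 1) * r_item)

def items_needed(item_true_r, target_reliability):
    """Smallest number of parallel items reaching the target reliability."""
    r_item = item_true_r ** 2
    k = target_reliability * (1 - r_item) / (r_item * (1 - target_reliability))
    return math.ceil(k)

print(items_needed(0.3, 0.8))                 # items needed when each item correlates 0.3 with the true score
print(round(scale_reliability(0.3, 10), 2))   # reliability of a 10-item scale built from such items
```

With item-true score correlations of 0.3, about 41 items are needed to reach a reliability of 0.8 under these assumptions, which is why scales built from weak items have to be long.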
Types of Items in Ability or Attainment Testing
Free-Response Items
In free-response items, participants provide answers in their own words, whether by writing, speaking,
or typing.
These items are less likely to lead participants to the correct answer, as there are no choices to pick
from.
Scoring can be challenging due to ambiguous or poorly presented answers.
They often require human scoring, which can be slow and costly.
Multiple-Choice Items
Most ability and attainment tests use multiple-choice items for easy scoring.
Participants select one correct answer from a list of options.
These items can sometimes result in correct answers through random guessing, which might affect
score interpretation.
The number of distractors (incorrect answer choices) affects item difficulty, but the choice of how
many to use is somewhat arbitrary.
Multiple-choice items are not always suitable for assessing higher-order problem-solving skills.
In some cases, participants may randomly guess or omit answers, which can complicate scoring.
Guidelines should be provided to address guessing and omitted answers, and all participants should
be given the same instructions to maintain fairness.
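As an illustration of how guessing can be handled, the sketch below shows the score expected from pure guessing and one traditional "correction for guessing" (right minus wrong divided by the number of options minus one, with omitted items ignored). This particular rule is an assumption chosen for the example, not something prescribed in the notes above, and individual tests define their own scoring rules.

```python
def expected_chance_score(n_items, n_options):
    """Average number correct if every answer is a blind guess."""
    return n_items / n_options

def corrected_score(n_right, n_wrong, n_options):
    """Traditional correction for guessing; omitted items neither add nor subtract."""
    return n_right - n_wrong / (n_options - 1)

# Hypothetical 40-item test with 4 options per item:
# 28 right, 8 wrong, 4 omitted.
print(expected_chance_score(40, 4))
print(round(corrected_score(28, 8, 4), 2))
```

Whatever rule is used, participants need to be told in advance whether wrong answers are penalised, since that changes whether guessing is a sensible strategy.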
Types of Items Measuring Personality, Mood, Motivation, and Attitude
In psychological assessment, there are various methods for measuring personality, mood,
motivation, and attitudes. The primary method used is the rating scale, which includes Likert scales.
This method involves respondents providing self-reported information. Historically, projective tests
and objective tests were used for such assessments, but they are now considered specialized and
less effective.
Likert Scales
Likert scales are the most widely used rating scales for measuring personality, mood, motivation, and
attitudes.
These scales usually consist of a stem, which can be a statement or question, and respondents are
asked to rate how well that statement describes them, how much they agree with it, or something
similar.
The scale typically has five or seven alternative answers, often arranged in order from strongly agree
to strongly disagree. The goal is to create ordered categories.
There's a "neutral" choice in the middle to allow respondents to express uncertainty.
Researchers generally treat responses as Q-data, which means that responses may not necessarily
reflect true feelings or behaviors but can still have diagnostic or predictive value.
Responses can sometimes be influenced by a person's self-knowledge, perception of themselves, and
individual differences in response patterns.
Some respondents may engage in random responding, which is hard to spot by eye but can be
detected using specific techniques.
In rating oneself, people might use different reference groups to compare themselves to, which can
impact their responses.
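The ordered categories of a Likert scale are usually turned into numbers and added up to give a scale score. The sketch below assumes a five-point format coded 1 to 5 and items that are all keyed in the same direction; the labels and the coding are illustrative choices, not fixed rules.

```python
# Assumed coding for a five-point agreement scale.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_scale(responses, coding=LIKERT_5):
    """Sum the numeric codes of a list of Likert responses."""
    return sum(coding[r.strip().lower()] for r in responses)

answers = ["Agree", "Strongly agree", "Neutral", "Disagree"]
print(score_scale(answers))  # 4 + 5 + 3 + 2 = 14
```

Coding the categories as equally spaced numbers is itself an assumption: the categories are ordered, but the step from "agree" to "strongly agree" is not guaranteed to be the same size as the step from "neutral" to "agree".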
Issues with Self-Report Questionnaires
Interpretation of Items: Respondents may interpret items differently. They may answer based on
their interpretation, which might not align with the researcher's intent. Researchers often treat
responses as Q-data, focusing on the patterns and associations rather than taking answers at face
value.
Comparison with Different Groups: Respondents may struggle to rate themselves without a specific
reference group. For example, "I am more intelligent than most people" could be interpreted
differently if the reference group isn't specified.
Overestimation of Abilities: People often overestimate their abilities or traits, a phenomenon known
as the "Lake Wobegon Effect" or "Illusory Superiority." This is not necessarily a problem for
assessment but needs to be considered.
Differences in Response Patterns: Different individuals may exhibit different response patterns, such
as agreeing with most statements or trying to present themselves in a positive light.
Random Responding: Some respondents might engage in random responding, which can be
challenging to detect without specific techniques.
Issues with Ratings of Behavior
Objective Ratings: Ratings from others can provide more objective results, reducing the self-delusion
or bias that individuals might show when rating themselves.
Bias in Ratings: However, the raters themselves can introduce bias. They might intentionally rate
someone favorably or unfavorably for various reasons.
Limited Observation: Raters may have limited observation of an individual, which can result in an
incomplete picture of their behavior.
Halo Effect: There's the potential for a "halo effect" where rating someone highly on one trait leads
to high ratings on other traits, regardless of their actual behavior.
Perceived Associations: Raters' answers may reflect their own beliefs about which traits go together,
even if the traits don't genuinely correspond in the person being rated.
Deciding whether to use self-report scales or ratings by others depends on the specific context and
research goals. Researchers often consider the literature to determine which format is most suitable
for their study. Regardless of the method, it's essential to critically analyze responses and consider
potential biases, reference groups, and interpretation issues.