Notes of Psychometrics

The document provides an overview of psychometrics, focusing on the concepts of measurement, evaluation, and psychological assessment. It discusses the importance of reliability, validity, and various types of psychological tests, including intelligence and personality tests. Additionally, it highlights the limitations of psychological testing and suggests ways to mitigate these limitations for more accurate assessments.

Uploaded by

Yash Dilip

Basics of psychometrics

Scales formulation
Storytelling, workshop conduction
Narrative therapy
Addiction work
Past experience working with children and adolescent and
making modules current work

NOTES OF PSYCHOMETRICS

MODULE 1
Test is usually considered the narrowest of the four terms; it connotes the
presentation of a standard set of questions to be answered. As a result of a
person's answers to such a series of questions, we obtain a measure of a
characteristic of that person.
Measurement often connotes a broader concept: We can measure
characteristics in ways other than by giving tests. Using observations, rating
scales, or any other device that allows us to obtain information in a quantitative
form is measurement. Also, measurement can refer to both the score obtained,
and the process used.
Evaluation: Stufflebeam et al. (1971) stated that evaluation is "the process of
delineating, obtaining, and providing useful information for judging decision
alternatives." A second popular concept of evaluation interprets it as the
determination of the congruence between performance and objectives. Other
definitions simply categorize evaluation as professional judgment or as a
process that allows one to make a judgment about the desirability or value of
something. One can evaluate with either qualitative or quantitative data.
Psychological assessment is a flexible, not standardized, process aimed at
reaching a defensible determination concerning one or more psychological
issues or questions, through the collection, evaluation, and analysis of data
appropriate to the purpose at hand (Maloney & Ward, 1976).
Psychometric testing
• Psycho + Metry (Greek, Metria – a measuring of)
• Testing consists of the administration of one or more standardized procedures under
particular environmental conditions (e.g., quiet, good lighting) in order to obtain a
representative sample of behaviour

Psychometric testing is a type of assessment that is used to measure psychological
characteristics such as aptitude, personality, intelligence, and other abilities. These tests are
often used in the fields of psychology, education, and employment to assess an individual's
suitability for a particular job or role.
Psychometric tests are typically administered by trained professionals such as psychologists
or HR managers. They are usually administered in a standardized format, with questions or
tasks that are designed to measure specific psychological constructs. The results of these tests
can be used to predict an individual's potential performance in a particular role or to identify
areas of strength and weakness.
There are many different types of psychometric tests, including aptitude tests, personality
tests, intelligence tests, and skills tests. Aptitude tests are designed to measure an individual's
potential to learn new skills or acquire knowledge in a particular area. Personality tests are
used to assess an individual's personality traits, such as their level of extroversion,
agreeableness, and conscientiousness. Intelligence tests are designed to measure an
individual's overall cognitive ability, while skills tests are used to assess an individual's
proficiency in specific areas such as mathematics or verbal communication.
Psychometric testing is a widely used and valuable tool for assessing psychological
characteristics and predicting future performance. However, it is important to note that these
tests are not always completely accurate and should be used as part of a holistic evaluation
process.
What is measurement in psychological testing?
Psychological testing refers to the use of standardized procedures and instruments to measure
psychological constructs, such as intelligence, personality, and aptitude. Psychological tests
are administered by trained professionals, such as psychologists, and are often used to
diagnose mental health conditions, assess cognitive functioning, or evaluate personality traits.
In psychological testing, measurement refers to the process of quantifying the psychological
construct being tested. This typically involves assigning scores to test takers based on their
responses to the test items. The scores are then used to compare the test takers to a normative
sample, which is a group of people who have taken the test under similar conditions. The
normative sample provides a reference point for interpreting the test scores and determining
how the test taker compares to others.
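
As a minimal sketch of how a raw score can be compared to a normative sample, the snippet below converts a score to a z-score and an approximate percentile. The norm scores are made up for illustration, the function name `standardize` is hypothetical, and the percentile step assumes the norm distribution is roughly normal.

```python
from statistics import NormalDist, mean, stdev

def standardize(raw_score, norm_scores):
    """Return (z, percentile) of raw_score relative to a normative sample."""
    m = mean(norm_scores)
    s = stdev(norm_scores)  # sample standard deviation (n - 1 denominator)
    z = (raw_score - m) / s
    # Assumption: scores in the norm group are approximately normal.
    percentile = NormalDist().cdf(z) * 100
    return z, percentile

# Hypothetical normative sample (illustrative values only).
norm_sample = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]
z, pct = standardize(58, norm_sample)
print(f"z = {z:.2f}, percentile = {pct:.0f}")
```

A positive z means the test taker scored above the norm group's mean; the percentile expresses the same comparison as the share of the norm group expected to score lower.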
Psychological tests are designed to be reliable, which means that they produce consistent
results over time and across different administrations. They are also designed to be valid,
which means that they measure what they are intended to measure. The reliability and
validity of psychological tests are carefully evaluated during the development and testing
process to ensure that they produce accurate and meaningful results.

There are several properties of measurement that are important to consider when evaluating
the quality of a measurement instrument (e.g. a test, survey, or other assessment tool). These
properties include:
1. Reliability: This refers to the consistency of the measurement. A reliable
measurement produces consistent results when an individual is measured multiple
times under similar conditions.
2. Validity: This refers to the accuracy of the measurement. A valid measurement
accurately measures what it is intended to measure (e.g. intelligence, aptitude,
personality, etc.).
3. Sensitivity: This refers to the ability of the measurement to detect changes or
differences in the characteristic being measured. A sensitive measurement is able to
detect small changes or differences in the characteristic over time.
4. Specificity: This refers to the ability of the measurement to distinguish between
different characteristics or traits. A specific measurement is able to accurately
measure one trait without being influenced by other traits.
5. Practicality: This refers to the feasibility of administering the measurement in a real-
world setting. A practical measurement is easy to administer and does not require a lot
of time or resources.
6. Objectivity: This refers to the lack of bias in the measurement. An objective
measurement is unbiased and does not favor one individual or group over another.
Understanding these properties of measurement is important for evaluating the quality of a
measurement instrument and ensuring that it is a reliable and valid tool for assessing the
characteristics or traits of individuals.

Importance of measurement in psychological testing:


Measurement is an important aspect of psychological testing because it allows researchers
and practitioners to quantitatively assess psychological constructs and compare individuals to
each other and to a normative sample. By providing numerical scores, psychological tests
allow for objective and standardized comparisons of psychological characteristics.
In addition, psychological testing can provide important insights into people's cognitive
abilities, personality traits, and mental health. For example, intelligence tests can help
identify strengths and weaknesses in cognitive functioning and guide educational and career
decisions. Personality tests can help individuals better understand their own personality and
how it may affect their relationships and behavior. Mental health assessments can help
diagnose mental health conditions and guide treatment recommendations.
Overall, the importance of measuring in psychological testing lies in its ability to provide
objective and standardized data on psychological constructs, which can inform decision
making, treatment planning, and research.

There are several types of measurement in psychological testing, including:


1. Nominal measurement: This type of measurement involves assigning categories or
labels to test takers without any inherent order or ranking. For example, a test that
asks test takers to identify their gender would be using nominal measurement.
2. Ordinal measurement: This type of measurement involves assigning rankings or order
to test takers, but without specifying the intervals between the rankings. For example,
a test that asks test takers to rate their satisfaction on a scale from 1 to 5 would be
using ordinal measurement.
3. Interval measurement: This type of measurement involves assigning equal intervals
between the scores, but without a true zero point. For example, a test that measures
temperature in degrees Celsius would be using interval measurement.
4. Ratio measurement: This type of measurement involves assigning equal intervals
between the scores and a true zero point. For example, a test that measures weight in
kilograms would be using ratio measurement.
Which type of measurement is used in a psychological test depends on the nature of the
construct being measured and the purpose of the test. Nominal and ordinal measurement are
often used for categorical data, such as demographic characteristics or responses to open-
ended questions. Interval and ratio measurement are often used for continuous data, such as
scores on tests of cognitive ability or personality.
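
The practical difference between the four levels is which summaries stay meaningful. The sketch below uses made-up data to illustrate this: a mode for nominal data, a median for ordinal data, differences for interval data, and ratios for ratio data. The helper names `c_to_f` and `kg_to_lb` are introduced only for this illustration.

```python
from statistics import median, mode

gender = ["F", "M", "F", "F", "M"]   # nominal: labels only, so count or take the mode
satisfaction = [1, 3, 3, 4, 5]       # ordinal: order matters, so the median is meaningful
temp_c = [10.0, 20.0]                # interval: equal steps, no true zero
weight_kg = [40.0, 80.0]             # ratio: equal steps and a true zero

def c_to_f(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462

print("mode of gender:", mode(gender))
print("median satisfaction:", median(satisfaction))

# Interval scale: the ratio of two values changes with the unit, so it is not meaningful.
ratio_c = temp_c[1] / temp_c[0]
ratio_f = c_to_f(temp_c[1]) / c_to_f(temp_c[0])
print("temperature ratio in C vs F:", ratio_c, ratio_f)

# Ratio scale: the ratio of two values is the same in any unit.
ratio_kg = weight_kg[1] / weight_kg[0]
ratio_lb = kg_to_lb(weight_kg[1]) / kg_to_lb(weight_kg[0])
print("weight ratio in kg vs lb:", ratio_kg, ratio_lb)
```

"Twice as hot" depends on whether Celsius or Fahrenheit is used, whereas "twice as heavy" holds in kilograms and pounds alike.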
Psychological measurement refers to the use of tests, questionnaires, and other assessment
tools to quantify and evaluate psychological constructs such as personality, intelligence, and
emotions. However, there are several issues that can arise in the use of these measures:
1. Reliability: Psychological measures should be reliable, meaning that they should
produce consistent results over time and across different administrations. However,
some measures may be prone to errors or inconsistencies, which can affect their
reliability.
2. Validity: Psychological measures should also be valid, meaning that they should
accurately measure the psychological construct they are intended to assess. However,
some measures may be invalid or may not adequately capture the complexity of the
construct they are intended to assess.
3. Bias: Psychological measures can be biased in a number of ways, including cultural
bias (e.g., measures that are more appropriate for one cultural group may be less
appropriate for another), gender bias (e.g., measures that are more appropriate for one
gender may be less appropriate for the other), and age bias (e.g., measures that are
more appropriate for one age group may be less appropriate for another).
4. Test-taking ability: Some individuals may be more skilled at taking psychological
tests than others, which can affect their scores. This can be particularly problematic if
the measure is being used to make important decisions (e.g., hiring or admission
decisions).
5. Inappropriate use: Psychological measures should be used appropriately, meaning that
they should be administered and interpreted by trained professionals in accordance
with their intended purpose. However, there is a risk that measures may be misused or
misinterpreted, which can lead to inaccurate or unfair conclusions.

TESTS AND THEIR FUNCTIONS:


Psychological tests are tools used to assess a person's mental abilities, emotions, and
personality traits. They can be used to diagnose mental health conditions, assess cognitive
functioning, and evaluate personality characteristics.
Some common types of psychological tests include:
• Intelligence tests: These tests measure cognitive abilities such as problem-solving,
verbal and mathematical aptitude, and spatial reasoning.
• Achievement tests: These tests measure a person's knowledge and skills in a particular
area, such as reading or mathematics.
• Personality tests: These tests assess a person's feelings, thoughts, and behaviors. They
can include self-report questionnaires and structured interviews.
• Neuropsychological tests: These tests evaluate how well the brain is functioning by
assessing cognitive skills such as memory, attention, and problem-solving.
• Projective tests: These tests assess unconscious thoughts and feelings by presenting
the test taker with ambiguous stimuli, such as inkblots or pictures, and asking them to
interpret or describe them.
Overall, psychological tests can be useful in a variety of settings, including clinical,
educational, and occupational settings. They can help identify mental health concerns, assess
cognitive abilities, and provide insight into personality traits, among other things.

The standard error (here, the standard error of the mean) is a measure of how much a sample
mean varies from one sample to the next. It is the standard deviation of the sampling
distribution of the mean.
The standard error is a measure of how accurately the sample mean represents the population
mean. It is important to understand the standard error because it tells us how much
uncertainty there is in our estimate of the mean. A small standard error indicates that the
sample mean is a good estimate of the population mean, while a large standard error indicates
that the sample mean is a poor estimate of the population mean.
To calculate the standard error of the mean, you first need to calculate the standard deviation
of the sample. Then, divide the standard deviation by the square root of the sample size.
For example, suppose you have a sample of n=10 measurements with a mean of x̄ =5 and a
standard deviation of s=2. The standard error of the mean would be:
SE = s/√n = 2/√10 = 0.63
This tells us that the sample mean will fall within about 0.63 of the population mean in roughly
68% of samples, and within about 1.24 (1.96 × SE) in roughly 95% of samples, assuming an
approximately normal sampling distribution.
It is important to note that the standard error is only a measure of the precision of the sample
mean and does not reflect the accuracy of the measurements themselves. If the sample is not
representative of the population, or if the measurements are systematically inaccurate, the
standard error can still be small even though the sample mean is a biased estimate of the
population mean.
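
The calculation can be sketched in a few lines, using the example values from these notes (n = 10, mean = 5, s = 2). The 1.96 multiplier is an assumption based on a normal approximation; for a sample this small, a t-based multiplier would give a somewhat wider interval.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample SD divided by sqrt(sample size)."""
    return sd / math.sqrt(n)

n, sample_mean, sd = 10, 5, 2
se = standard_error(sd, n)

# Approximate 95% interval for the population mean (normal approximation).
z95 = 1.96
ci_low = sample_mean - z95 * se
ci_high = sample_mean + z95 * se

print(f"SE = {se:.2f}")
print(f"approx. 95% interval: ({ci_low:.2f}, {ci_high:.2f})")
```

Because the sample size sits under a square root, quadrupling n only halves the standard error.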

OR
In psychological testing, the standard error of measurement (SEM) is a measure of the
variability of a person's observed test scores around their true score. It is the standard
deviation of the scores the person would obtain over repeated administrations of the test, and
it is commonly estimated as SD × √(1 − reliability).
The SEM is used to estimate the precision or reliability of a person's test scores. It tells us
how much uncertainty there is in our estimate of a person's true score (i.e., their score on the
test if they took it an infinite number of times). A small SEM indicates that the test scores are
reliable and precise, while a large SEM indicates that the test scores are less reliable and less
precise.
The SEM is typically used to provide a confidence interval around a person's test score. For
example, if a person's test score is X = 50 with an SEM of 2, we can be about 95% confident that
their true score falls within the interval (X - 2 SEM, X + 2 SEM), or (46, 54). This means that
intervals constructed in this way would contain the person's true score about 95% of the
time.
It is important to note that the SEM is only a measure of the precision of the test scores and
does not reflect the validity of the test. A test may have high reliability (i.e., low SEM) but
still not be a valid measure of the construct it is intended to assess.
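
A minimal sketch of the usual classical-test-theory estimate, SEM = SD × √(1 − reliability), follows. The test SD of 10 and reliability of 0.96 are hypothetical values chosen so that the SEM comes out at 2, matching the example above; the function name `sem` is likewise only illustrative.

```python
import math

def sem(test_sd, reliability):
    """Standard error of measurement from the test SD and its reliability."""
    return test_sd * math.sqrt(1 - reliability)

# Hypothetical test statistics (not taken from any real instrument).
test_sd = 10
reliability = 0.96
observed_score = 50

s = sem(test_sd, reliability)
# Approximate 95% band around the observed score (normal approximation).
low = observed_score - 1.96 * s
high = observed_score + 1.96 * s

print(f"SEM = {s:.2f}")
print(f"approx. 95% band: ({low:.2f}, {high:.2f})")
```

The formula makes the link between reliability and precision explicit: as reliability approaches 1 the SEM shrinks toward 0, and as reliability approaches 0 the SEM approaches the full standard deviation of the test.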

Difference between measurement and assessment


Measurement and assessment are two related but distinct concepts in psychological testing.
Measurement refers to the process of assigning numbers or scores to characteristics or
qualities in order to quantify them. In psychological testing, measurement is used to assign
scores to an individual's abilities, personality traits, and other psychological characteristics.
Assessment, on the other hand, refers to the process of evaluating an individual's abilities,
characteristics, or performance. In psychological testing, assessment involves the use of
measurements (such as scores on a test) to draw conclusions about an individual's
psychological characteristics or potential for success in a particular role or task.

Measurement
Precise, Quantitative Value
To observe or determine the magnitude of a variate.
Process of assigning symbols to dimensions of a phenomenon in order to characterize the status
of the phenomenon as precisely as possible.
Used in a narrow sense.

ASSESSMENT
Subjective Judgement.
Evaluation or appraisal
The assignment of symbols to phenomena in order to characterize the worth or value of a
phenomenon, usually with reference to some social, cultural or scientific standard.
Used in a broader sense.

Limitations of Psychological testing:


1. Psychological tests shouldn’t be used without careful guidance and consideration.
2. Careful administration of the test needs to be ensured.
3. Tests should be administered and scored by qualified examiners.
4. Standard set of administration and scoring procedures should be in place to avoid
subjectivity and/or experimental error.
5. Prevention of general familiarity with the test content, which would otherwise
invalidate the test.
6. Appropriate selection of the test to be used for a given purpose.
7. Rationale for test use and the expected application of test results should be clear, if
not, then test scores are not likely to be of much use or are likely to be misused.
8. Test-taking ability: A person's test-taking ability, such as their motivation or ability
to focus, can affect the accuracy of the results.


Limited scope: Psychological tests can only assess a limited range of mental abilities and
characteristics. They may not provide a complete picture of a person's psychological
functioning.
Subjectivity: The interpretation of test results can be subjective, as it often involves the
judgment of the person administering the test.
Ethical considerations: There are ethical considerations involved in psychological testing,
such as obtaining informed consent and protecting the privacy of test takers.
Bias: Psychological tests may be biased towards certain groups of people, leading to
potentially inaccurate or unfair results.
Cultural differences: Psychological tests may not be equally applicable or appropriate for all
cultural groups and may not accurately assess certain cultural values or beliefs.
Time and resources: Psychological testing can be time-consuming and resource-intensive
and may not always be feasible or practical in certain situations.

There are several ways to mitigate the limitations of psychological testing:


1. Use multiple measures: Psychological tests should be used in conjunction with other
methods, such as interviews, observations, and self-report measures, to get a more
comprehensive understanding of an individual's characteristics.
2. Use standardized tests: Standardized tests are designed to be administered and scored
in a consistent manner. This helps to reduce bias and increase the reliability of the
results.
3. Use valid and reliable tests: It's important to use tests that have been demonstrated to
be valid (i.e., measuring what they are intended to measure) and reliable (i.e.,
producing consistent results).
4. Consider the context: Psychological tests should be administered in a controlled
setting, such as a testing room, to minimize distractions and ensure that test-takers are
focused.
5. Provide appropriate test administration and scoring: Test administrators should be
trained and follow established guidelines for administering and scoring tests.
6. Consider test-taker characteristics: Test results may be influenced by a variety of
factors, such as test-takers' motivation, level of anxiety, or cultural background. It's
important to consider these factors when interpreting test results.
7. Use caution when making decisions based on test results: Test results should not be
used as the sole basis for making decisions about individuals. It's important to
consider other relevant information and to use test results in conjunction with other
methods of assessment.

The expectancy effect, also known as the Pygmalion effect or self-fulfilling prophecy, refers
to the idea that one person's expectations about another individual can influence that
individual's behavior and ultimately their outcomes. In the context of psychological testing, the
expectancy effect can occur when the test administrator has certain expectations about an
individual based on their demographic characteristics or other information. These
expectations can influence the way the test is administered and scored, leading to biased or
inaccurate results.
For example, if a test administrator expects a certain test-taker to perform poorly based on
their race or socioeconomic status, they may unconsciously communicate this expectation
through their body language or the way they administer the test. This may cause the test-taker
to become anxious or self-conscious, leading to poor performance on the test.
To mitigate the expectancy effect in psychological testing, it's important for test
administrators to be aware of their own biases and to make an effort to be objective and
unbiased when administering and scoring tests. This can involve using standardized
administration and scoring procedures and avoiding making assumptions about test-takers
based on demographic characteristics or other information.
9. Advance Preparation
Rapport

Rapport is an important aspect of psychological testing because it can impact the validity and
reliability of the test results. Establishing rapport with the test taker can help to reduce
anxiety and increase their willingness to fully participate in the testing process, which can
result in more accurate test results.

Good rapport can also make the test taker feel more comfortable and at ease, which can
improve their overall experience of the testing process. This can be particularly important
when working with children or individuals who may be nervous or anxious about being
tested.

Additionally, rapport can help to build trust and establish a positive therapeutic relationship
between the test administrator and the test taker. This can be beneficial for individuals who
may be resistant or hesitant to participate in testing due to past negative experiences.

Overall, establishing rapport is an important step in the psychological testing process that can
help to improve the accuracy and usefulness of the test results.

There are several ethical considerations that must be taken into account when
conducting psychological testing. These include:

1. Informed consent: Test takers should be fully informed about the nature and purpose
of the testing, and should provide their informed consent before proceeding.
2. Confidentiality: Test results and other personal information should be kept
confidential and only shared with those who have a legitimate need to know.
3. Fairness and non-discrimination: Psychological tests should be administered in a fair
and unbiased manner and should not discriminate against any particular group of
individuals.
4. Competence: Test administrators should be properly trained and qualified to
administer the test and should ensure that they are using the test in a way that is
appropriate and consistent with best practices.
5. Use of test results: Test results should be used for the purpose for which they were
intended and should not be used to make decisions that could have negative
consequences for the test taker. Test takers also have a right to their results.
6. Professional responsibility: Test administrators have a professional responsibility to
adhere to ethical standards and to act in the best interests of the test taker.

By following these ethical principles, test administrators can help to ensure that
psychological testing is conducted in a way that is respectful, fair, and beneficial to the test
taker.

Dehumanisation in psychological testing

Dehumanization can take many forms, including treating individuals as if they are merely
objects or specimens, failing to recognize their unique experiences and perspectives, or
failing to respect their autonomy and dignity.

In the context of psychological testing, dehumanization can occur when test administrators
fail to consider the impact of the testing process on the test taker. For example, using tests
that are culturally or linguistically inappropriate, or using tests that are overly stressful or
anxiety-provoking, can be dehumanizing.

It is important for test administrators to be aware of the potential for dehumanization in
psychological testing, and to take steps to avoid it. This can include using tests that are
culturally sensitive and appropriate, providing clear explanations and instructions, and being
respectful and empathetic towards the test taker. By taking these steps, test administrators can
help to ensure that psychological testing is conducted in a way that is respectful, fair, and
beneficial to the test taker.

Labelling in psychological testing

Labelling is the process of assigning a particular label or diagnosis to an individual based on
their test results or other characteristics. In psychological testing, labels can be based on a
variety of factors, including test scores, behaviors, and symptoms.

Labelling can be a controversial practice in psychological testing because it can have
significant consequences for the individual being labelled. For example, a label or diagnosis
can influence the way that others perceive and interact with the individual, and can also
impact their access to certain resources or opportunities.
It is important for test administrators to be aware of the potential risks and limitations of
labelling in psychological testing, and to use caution when applying labels to individuals.
This can include considering the potential consequences of the label, seeking input from
multiple sources, and avoiding overgeneralization or stereotyping.

By being mindful of the potential risks and limitations of labelling, test administrators can
help to ensure that psychological testing is conducted in a way that is respectful, fair, and
beneficial to the test taker.

Test security is an important aspect of ethical psychological testing. It refers to the measures
taken to protect the integrity and validity of a test, as well as the confidentiality of test results.
Some specific ways to ensure test security in psychological testing include:
1. Protecting test materials: Test materials should be stored in a secure location, and
access should be restricted to authorized individuals.
2. Ensuring standardized test administration: Tests should be administered according to the specified
instructions, and any deviations from these instructions should be documented and
reported.
3. Maintaining confidentiality: Test results should be kept confidential and only shared
with those who have a legitimate need to know.
4. Protecting test takers: Test takers should be treated with respect and their rights and
well-being should be protected.
5. Monitoring and enforcing test security: Test security should be monitored and
enforced through the use of measures such as test-taking agreements, proctoring, and
the use of secure online testing platforms.
By following these and other best practices related to test security, psychological testing can
be conducted in an ethical and responsible manner.
Classical Test Theory

Measurement is the process of quantifying the characteristics of a person or
object. Theories of measurement help to explain measurement results (i.e.,
scores), thereby providing a rationale for how they are interpreted and treated
mathematically and statistically. Classical test theory (CTT) is a measurement
theory used primarily in psychology, education, and related fields. It was
introduced at the beginning of the 20th century and has evolved since then. The
majority of tests in psychology and education have been developed based on
CTT. This theory is also referred to as true score theory, classical reliability
theory, or classical measurement theory.
Classical test theory is based on a set of assumptions regarding the properties of
test scores. Although different models of CTT are based on slightly different
sets of assumptions, all models share a fundamental premise postulating that the
observed score of a person on a test is the sum of two unobservable
components, true score and measurement error. True score is generally defined
as the expected value of a person’s observed score if the person were tested an
infinite number of times on an infinite number of equivalent tests. Therefore,
the true score reflects the stable characteristic of the object of measurement (i.e.,
the person). Measurement error is defined as a random “noise” that causes the
observed score to deviate from the true score.
Assumptions of Classical Test Theory
Classical test theory assumes linearity—that is, the regression of the observed
score on the true score is linear. This linearity assumption underlies the practice
of creating tests from the linear combination of items or subtests. In addition,
the following assumptions are often made by classical test theory:
• The expected value of measurement error within a person is zero.
• The expected value of measurement error across persons in the
population is zero.
• True score is uncorrelated with measurement error in the population of
persons.
• The variance of observed scores across persons is equal to the sum of the
variances of true score and measurement error.
• Measurement errors of different tests are not correlated.
The first four assumptions can be readily derived from the definitions of true
score and measurement error. Thus, they are commonly shared by all the
models of CTT. The fifth assumption is also suggested by most of the models
because it is needed to estimate reliability. All of these assumptions are
generally considered “weak assumptions,” that is, assumptions that are likely to
hold true in most data. Some models of CTT make further stronger assumptions
that, although they are not needed for deriving most formulas central to the
theory, provide estimation convenience:
• Measurement error is normally distributed within a person and across
persons in the population.
• Distributions of measurement error have the same variance across all
levels of true score.
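
The core model and its assumptions can be checked numerically with a small simulation. This is only a sketch: true scores and errors are drawn from normal distributions with hypothetical parameters (true-score SD 10, error SD 5), which builds in the stronger normality and equal-variance assumptions, and the helper `pearson` is written out here rather than taken from a library.

```python
import math
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable
N = 20000

# Observed score = true score + random error (hypothetical parameters).
true_scores = [random.gauss(50, 10) for _ in range(N)]
errors = [random.gauss(0, 5) for _ in range(N)]
observed = [t + e for t, e in zip(true_scores, errors)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

mean_error = mean(errors)                 # should be close to 0
r_true_error = pearson(true_scores, errors)  # should be close to 0
var_t = pvariance(true_scores)
var_e = pvariance(errors)
var_x = pvariance(observed)               # should be close to var_t + var_e

print(f"mean error          = {mean_error:.3f}")
print(f"corr(true, error)   = {r_true_error:.3f}")
print(f"var(X) vs var(T)+var(E): {var_x:.1f} vs {var_t + var_e:.1f}")
```

In the simulated data the error averages out to roughly zero, is essentially uncorrelated with the true score, and the observed-score variance is approximately the sum of the two component variances.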
Important Concepts in Classical Test Theory
Reliability and Parallel Tests
True score and measurement error, by definition, are unobservable. However,
researchers often need to know how well observed test scores reflect the true
scores of interest. In CTT, this is achieved by estimating the reliability of the
test, defined as the ratio of true score variance to observed score variance.
Alternatively, reliability is sometimes defined as the square of the correlation
between the true score and the observed score. Although they are expressed
differently, these two definitions are equivalent and can be derived from
assumptions underlying CTT.
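The equivalence of the two definitions can be sketched as follows. The symbols
are introduced here for illustration: X = T + E, where X is the observed score,
T the true score, and E the measurement error, with T and E uncorrelated.

```latex
\begin{aligned}
\operatorname{Cov}(X, T) &= \operatorname{Cov}(T + E,\; T)
                          = \sigma_T^2 + \operatorname{Cov}(E, T)
                          = \sigma_T^2, \\[4pt]
\rho_{XT}^2 &= \frac{\operatorname{Cov}(X, T)^2}{\sigma_X^2\,\sigma_T^2}
             = \frac{\sigma_T^4}{\sigma_X^2\,\sigma_T^2}
             = \frac{\sigma_T^2}{\sigma_X^2}.
\end{aligned}
```

For example, with a true score variance of 100 and an error variance of 25, the
observed variance is 125 and the reliability is 100 / 125 = 0.80.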
To estimate reliability, CTT relies on the concept of parallel test forms. Two
tests are considered parallel if they have the same observed variance in the
population of persons and any person has the same true score on both tests. If
these conditions hold, it can be shown that the correlation between two parallel
tests provides an estimate of the reliability of the tests.
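A small simulation can illustrate why the correlation between parallel forms
estimates reliability. This is a sketch under assumed values (true-score SD of
10, error SD of 5, normally distributed scores); the function names are
hypothetical and chosen for the example.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_parallel_forms(n_persons=20000, true_sd=10.0, error_sd=5.0, seed=7):
    """Two forms share each person's true score and have equal error variance."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n_persons)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true]
    return true, form_a, form_b


true, form_a, form_b = simulate_parallel_forms()

# Reliability by definition: true score variance / observed score variance.
theoretical = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)
# Reliability estimated from observable data only: correlation of the two forms.
estimated = pearson(form_a, form_b)

print("theoretical reliability:", round(theoretical, 3))
print("parallel-forms estimate:", round(estimated, 3))
```

The true scores are used only to generate the data; the estimate itself needs
nothing but the two sets of observed scores.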
Validity Versus Reliability
The definition of true score implies an important notion in CTT: that the true
score of a person on a measure is not necessarily the same as that person’s value
on the construct of interest. Validity concerns how well observed scores on a
test reflect a person’s true standing on the construct that the test is meant to
measure. As such, validity is a concept that is totally distinct from reliability.
Reliability reflects the strength of the link between the observed score and the
true score, whereas validity indexes the link between the observed score and the
construct of interest. The reliability of a test sets an upper bound for its validity;
hence, a test cannot have high validity with low reliability.
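The upper bound can be made explicit. As a sketch in notation introduced here
for illustration, let \rho_{XY} be the correlation between test X and a
criterion Y, \rho_{XX'} and \rho_{YY'} their reliabilities, and
\rho_{T_X T_Y} the correlation between their true scores. Assuming the errors
of X and Y are uncorrelated with each other and with the true scores:

```latex
\rho_{XY} = \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}}
\quad\Longrightarrow\quad
|\rho_{XY}| \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}
```

The bound is therefore the square root of the reliability. For example, a test
with a reliability of 0.49 cannot correlate above 0.70 with any criterion.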
Beyond Classical Test Theory
As useful as it is, CTT has certain limitations. It has been criticized for its
nonspecific concept of measurement error.
Its assumption about the linearity of the regression line of observed score on
true score has also been questioned on both theoretical and empirical grounds.
Accordingly, more sophisticated theories have been proposed to address those
limitations. In particular, generalizability theory explicitly considers the
contributions of multiple sources of measurement error to observed scores and
offers methods for estimating those effects. Item response theory postulates a
nonlinear regression of a person’s responses to a test item on his or her latent
ability (a concept that is similar to true score in CTT). These measurement
theories offer certain advantages over CTT, but they are more complex and
depend on stronger assumptions. Therefore, CTT remains popular because of its
simplicity and, more importantly, its robustness against violations of its basic
assumptions.
Classical test theory (CTT) is a statistical model that is used to evaluate the
reliability and validity of psychological tests. However, CTT has some
limitations that should be considered when using it to evaluate psychological
tests:
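1. Treats item difficulty only indirectly: the basic CTT model works with
total scores and does not model the difficulty of individual items. Some
reliability estimates further assume that items or forms are parallel
(interchangeable), which may not always be the case. This can lead to
inaccurate estimates of test reliability.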
2. Focuses on group-level statistics: CTT is primarily concerned with group-
level statistics, such as the mean and standard deviation, rather than
individual differences. This means that it may not be able to accurately
assess the characteristics of individual test takers.
3. Ignores response styles: CTT does not take into account response styles,
such as the tendency of some individuals to consistently give extreme
responses or to agree with all statements. This can lead to inaccurate
estimates of test reliability and validity.
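4. Limited ability to evaluate the quality of test items: CTT item
statistics, such as the proportion of test takers answering an item
correctly and the item-total correlation, depend on the sample tested.
This makes it harder to judge individual items or to identify items
that may be biased or irrelevant.
5. May assume a normal distribution of errors: some CTT models assume that
measurement error is normally distributed with the same variance at all
levels of true score, which may not always be the case. When these
assumptions do not hold, results that rely on them, such as confidence
intervals around a score, can be inaccurate.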
LIMITATIONS OF CTT (PPT)
● Examinees' scores are test dependent: they depend on the particular items
administered.
● Conceptual issues with the definitions of reliability and the standard
error of measurement (SEM).
● Test-oriented rather than item-oriented.
● Historically, CTT has provided less-than-ideal solutions to many testing
problems, such as test design and the identification of biased items.
IRT IN SIMPLE WORDS
Item response theory (IRT) is a statistical model that is used to describe how
well a person performs on a particular item (e.g., a question on a test) as a
function of their underlying ability or latent trait. The latent trait is a
characteristic that is not directly observed but is thought to influence a person's
performance on the item.
In IRT, the relationship between a person's latent trait and their performance on
an item is described using a mathematical function called the item characteristic
curve (ICC). The ICC is a graph that shows the probability that a person will
correctly respond to the item as a function of their latent trait.
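The notes do not name a specific form for the ICC. A common choice, assumed
here purely for illustration, is the two-parameter logistic (2PL) model, in
which each item has a discrimination a and a difficulty b. The function name
and the item parameters below are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under a two-parameter logistic model.

    theta: latent trait level of the person
    a:     item discrimination (steepness of the curve)
    b:     item difficulty (location of the curve on the trait scale)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.5 and difficulty 0.0.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p = icc_2pl(theta, a=1.5, b=0.0)
    print(f"theta = {theta:+.1f}  P(correct) = {p:.3f}")
```

Under this model the probability is exactly 0.5 when the trait level equals the
item difficulty, and a larger discrimination gives a steeper curve around that
point.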
IRT models can be used to estimate a person's latent trait based on their
responses to a set of items, along with the parameters of the items themselves.
These estimates can be used to compare the difficulty of different items, to
identify the items that are most diagnostic of a particular latent trait, and
to predict how a person is likely to perform on a particular item given their
latent trait.
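Trait estimation can be sketched as a maximum-likelihood search. The example
below assumes a two-parameter logistic model, item parameters that are already
known, and responses that are independent given the trait; the grid search, the
function names, and the item values are illustrative assumptions, not a
procedure described in these notes.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern, treating items as independent given theta."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = icc_2pl(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum-likelihood estimate of theta on the interval [lo, hi]."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical (discrimination, difficulty) pairs, ordered from easy to hard.
items = [(1.0, -1.5), (1.2, -0.5), (1.5, 0.0), (1.2, 0.5), (1.0, 1.5)]
low_scorer = [1, 0, 0, 0, 0]    # only the easiest item correct
high_scorer = [1, 1, 1, 1, 0]   # all but the hardest item correct

theta_low = estimate_theta(low_scorer, items)
theta_high = estimate_theta(high_scorer, items)
print("low scorer:", round(theta_low, 2), " high scorer:", round(theta_high, 2))
```

A response pattern that is all correct or all incorrect has no finite
maximum-likelihood estimate, so the grid search would simply return one end of
the interval in that case.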
IRT is often used in educational and psychological assessment to evaluate the
quality of tests and to design better tests. It is also used in other fields, such as
market research, where it can be used to understand how different product
attributes influence consumer behavior.
Item response theory (IRT) is a statistical model that is used to analyze and
interpret data from psychological tests and educational assessments. It is based
on a set of assumptions that are designed to ensure that the model accurately
reflects the underlying cognitive and psychological processes involved in test-
taking.
Here are some of the key assumptions of IRT:
1. Unidimensionality: IRT assumes that the items on a test measure a single
underlying trait or dimension. This means that the items are all related to
the same underlying construct, such as intelligence or personality.
2. Local independence: IRT assumes that, for test-takers at the same trait
level, responses to different items are statistically independent. This
means that, once the trait is taken into account, answering one item
correctly does not change the probability of answering any other item
correctly.
3. Monotonicity: IRT assumes that as an individual's trait level increases,
their probability of correctly answering an item also increases. This
means that as a test-taker becomes more knowledgeable or skilled in a
particular area, they should be more likely to get items related to that area
correct.
4. Item invariance (no differential item functioning): IRT assumes that, for
test-takers at the same trait level, the difficulty of an item does not
depend on characteristics such as gender or ethnicity. This means that
the items on a test should be fair and unbiased for all test-takers.
5. Continuous latent trait: IRT assumes that the trait being measured lies
on a continuum. The responses themselves may be binary (correct or
incorrect) or fall into ordered categories, depending on the model used.
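Local independence is what allows the probability of a whole response pattern
to be written as a product over items. As a sketch for items scored correct or
incorrect, let u_i (equal to 1 or 0) be the response to item i, \theta the
trait level, and P_i(\theta) the probability of a correct response to item i;
this notation is introduced here for illustration.

```latex
P(u_1, u_2, \ldots, u_n \mid \theta)
  = \prod_{i=1}^{n} P_i(\theta)^{\,u_i}\,\bigl[1 - P_i(\theta)\bigr]^{\,1 - u_i}
```

For example, if a person at a given trait level has probabilities 0.8 and 0.5
of answering two items correctly, the probability of getting the first right
and the second wrong is 0.8 × (1 − 0.5) = 0.4.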
In item response theory (IRT), there are several key concepts that are important
to understand:
1. Trait: A trait is a psychological characteristic or attribute that is being
measured by a test. Examples of traits that might be measured by a test
include intelligence, personality, or achievement.
2. Item: An item is a single question or task that is included on a test. Each
item is designed to measure a particular trait or set of traits.
3. Item difficulty: The item difficulty is a measure of how difficult an item
is to complete correctly. In IRT it is expressed on the same scale as the
trait, with higher numbers indicating more difficult items.
4. Item discrimination: The item discrimination is a measure of how well an
item is able to differentiate between test-takers with different levels of the
trait being measured. High discrimination means that the item is able to
distinguish between test-takers who have different levels of the trait,
while low discrimination means that the item is not very effective at
doing so.
5. Test information curve: The test information curve is a graph that shows
how the precision of trait estimates changes as a function of the trait
level. The curve is typically plotted with the trait level on the x-axis and
the precision of the estimate on the y-axis.
6. Item response function: The item response function (IRF) is a
mathematical function that describes the probability of a test-taker
correctly answering an item as a function of their trait level. The shape of
the IRF depends on the difficulty and discrimination of the item.
7. Latent trait: A latent trait is a trait that cannot be directly observed but can
be inferred from test scores. In IRT, the latent trait is the underlying
construct being measured by the test.
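The item response function and the test information curve can be connected in
a short sketch. It assumes a two-parameter logistic model, for which the
information of an item is a² · P · (1 − P); the function names and item
parameters are hypothetical and chosen for illustration.

```python
import math


def icc_2pl(theta, a, b):
    """Item response function of a two-parameter logistic item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Information of one 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of the item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Approximate standard error of the trait estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical (discrimination, difficulty) pairs centred on theta = 0.
items = [(0.8, -1.0), (1.5, 0.0), (2.0, 0.0), (0.8, 1.0)]
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    info = total_information(theta, items)
    se = standard_error(theta, items)
    print(f"theta = {theta:+.1f}  information = {info:.3f}  SE = {se:.3f}")
```

With these items the test is most precise near the middle of the trait scale,
where the highly discriminating items are located, and less precise at the
extremes.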