Introduction To Psychological Testing
DEFINITION OF TEST
Tests are enormously varied in their formats and applications. Nonetheless, most tests possess
these defining features:
- Scores or categories:
Every test furnishes one or more scores or provides evidence that a person belongs to one
category and not another.
X = T + e
where X is the observed score, T is the true score, and e is a positive or negative error
component (a small simulation after this list illustrates the model).
- Norms or standards: An examinee’s test score is interpreted by comparing it with the scores
obtained by others on the same test. Test developers typically provide norms—a summary of
test results for a large and representative group of subjects, referred to as the standardization
sample.
- Prediction of nontest behavior: The ultimate purpose of a test is to predict additional behaviors
other than those directly sampled by the test. The ability of a test to predict nontest behavior is
determined by an extensive body of validational research.
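To make the classical test theory model X = T + e concrete, here is a minimal Python sketch. It is illustrative only: the normal error distribution, its standard deviation, and the use of NumPy are assumptions, not part of the text.

```python
# Minimal sketch of the classical test theory model X = T + e.
# The error distribution below is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_score = 100                      # T: the examinee's true score
errors = rng.normal(0, 5, size=1000)  # e: random positive/negative error
observed = true_score + errors        # X = T + e over 1000 administrations

print(observed.mean())  # close to T, since random errors tend to cancel
```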
DIFFERENCE BETWEEN TESTS AND EXPERIMENTS
Although the terms test and experiment are sometimes used interchangeably, each has its own
context of usage. An experiment is a scientific method aimed at validating a hypothesis or
discovering new knowledge. It is a systematic research study in which the researcher directly
varies some factor(s), holds all other factors constant and observes the result of the variation.
On the other hand, a test typically does not have a hypothesis; instead, it is a procedure to
assess quality or performance and examine individual differences between people along
particular aspects.
HISTORICAL OVERVIEW
The history of psychological testing is a fascinating story and has abundant relevance to
present-day practices. Contemporary tests did not spring from a vacuum; they evolved slowly
from a host of precursors introduced over the last one hundred years.
Rudimentary forms of testing date back to 2200 B.C. in China. The Chinese emperors used
grueling written exams to select officials for civil service.
Modern psychological testing owes its inception to the era of brass instrument psychology that
flourished in Europe during the late 1800s. By testing sensory thresholds and reaction times,
pioneer test developers such as Sir Francis Galton demonstrated that it was possible to
measure the mind in an objective and replicable manner.
The British genius Francis Galton (1822–1911) invented the first battery of tests, a peculiar
assortment of sensory and motor measures.
American psychologist James McKeen Cattell (1860–1944) studied with Galton and then, in
1890, proclaimed the modern testing agenda in his classic paper titled “Mental Tests and
Measurements.”
In the late 1800s, a newfound humanism toward the mentally retarded, reflected in the
diagnostic and remedial work of French physicians Esquirol and Seguin, helped create the
necessity for early intelligence tests.
In 1905, Alfred Binet (1857–1911) and Theodore Simon invented the first modern, useful
intelligence test in Paris, France. Their simple 30-item measure
of mainly higher mental functions helped identify schoolchildren who could not profit from
regular instruction. Curiously, there was no method for scoring the test.
In 1908, Binet and Simon published a revised 58-item scale that incorporated the concept of
mental level. In 1911, a third revision of the Binet-Simon scales appeared. Each age level now
had exactly five tests; the scale extended into the adult range.
In 1912, Stern proposed dividing the mental age by the chronological age to obtain an
intelligence quotient. In 1916, Terman suggested multiplying the intelligence quotient by 100 to
remove fractions. Thus was born the concept of IQ.
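As a worked illustration of the ratio IQ just described (a sketch for illustration, not part of the original scales):

```python
# Stern's mental quotient, multiplied by 100 as Terman suggested:
# IQ = (mental age / chronological age) * 100
def ratio_iq(mental_age, chronological_age):
    return (mental_age / chronological_age) * 100

print(ratio_iq(10, 8))  # a child two years ahead: IQ = 125.0
print(ratio_iq(6, 8))   # a child two years behind: IQ = 75.0
```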
In 1910, Henry Goddard translated the 1908 Binet-Simon scale. In 1911, he tested more than a
thousand schoolchildren with the test, relying upon the original French norms. He was
disturbed to find that 3 percent of the sample was “feebleminded” and recommended
segregation from society for these children.
In 1916, Lewis Terman released the Stanford-Binet, a revision of the Binet scales. This well-
designed and carefully normed test placed intelligence testing on a firm footing once and for
all.
During WWI, Robert Yerkes headed a team of psychologists who produced the Army Alpha, a
verbally loaded group test for average and superior recruits, and the Army Beta, a nonverbal
group test for illiterates and non-English-speaking recruits.
The historical background of psychological testing also includes the development of group
tests, performance tests, aptitude tests, test batteries, multifactor tests, personality tests, rating
scales, self-rating inventories, and projective techniques.
Objectivity
A test must be objective in nature, i.e., it must be free from the subjective element so that there
is complete interpersonal agreement among experts regarding the meaning of the items and the
scoring of the test. Objectivity here relates to two aspects of the test—objectivity of the items
and objectivity of the scoring system. By objectivity of items, it is meant that the items should be
phrased in such a manner that they are interpreted in exactly the same way by all those who
take the test. By objectivity of scoring, it is meant that the scoring method of the test should be
a standard one so that complete uniformity can be maintained when the test is scored by
different experts at different times.
RELIABILITY
A test must also be reliable. Reliability refers both to the internal consistency of the test and to
the consistency of results obtained upon testing and retesting; the latter is an index of temporal
consistency. Reliability, thus, includes both internal consistency and temporal consistency. It
reflects how free test scores are from the flaws that cause measurement errors.
Meaning: Reliability refers to the consistency and dependability of a test’s scores across
repeated administrations or different conditions.
A reliable test yields similar results under consistent conditions, ensuring that the measurement
is stable and replicable.
Ensuring high reliability in psychological tests is essential for obtaining accurate and meaningful
results, which in turn supports informed decision-making in various applied settings.
Types of reliability:
● Test-Retest Reliability: This type assesses the stability of test scores over time. By
administering the same test to the same group of individuals on two different occasions
and correlating the scores, one can determine the temporal stability of the test. High
test-retest reliability indicates that the test produces consistent results over time.
● Inter-Rater Reliability: This form evaluates the degree of agreement between different
raters or observers assessing the same phenomenon. It is crucial in situations where
subjective judgment is involved, ensuring that different evaluators produce similar scores
or ratings.
● Internal Consistency Reliability: This type examines the consistency of results across
items within a test. Methods such as split-half reliability, where the test is divided into two
halves and the scores are correlated, and Cronbach’s alpha, which assesses the
average correlation among all items, are commonly used to evaluate internal
consistency (test-retest correlation and Cronbach’s alpha are both sketched in code below).
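The following minimal Python sketch, using hypothetical data and assuming NumPy is available, illustrates how test-retest reliability and Cronbach's alpha might be computed:

```python
# Minimal sketch (not a validated psychometric library) of estimating
# test-retest reliability and Cronbach's alpha with NumPy.
import numpy as np

def test_retest_reliability(scores_time1, scores_time2):
    """Pearson correlation between two administrations of the same test."""
    return np.corrcoef(scores_time1, scores_time2)[0, 1]

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees taking a 4-item test on two occasions.
time1 = np.array([12, 15, 9, 18, 14])
time2 = np.array([13, 14, 10, 17, 15])
items = np.array([[3, 4, 2, 3], [4, 4, 3, 4], [2, 3, 2, 2],
                  [5, 4, 4, 5], [3, 4, 3, 4]])

print(test_retest_reliability(time1, time2))  # high r => temporal stability
print(cronbach_alpha(items))                  # high alpha => internal consistency
```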
Extrinsic Factors
Important extrinsic factors affecting the reliability of a test may be enumerated as follows:
1) Group variability: When the group of examinees being tested is homogeneous in ability, the
reliability of the test scores is likely to be lowered. But when the examinees vary widely in their
range of ability, that is, when the group of examinees is heterogeneous, the reliability of the test
scores is likely to be high. The effect of variability on reliability can be examined by seeing what
happens when the variability is zero: if every examinee obtained the same score, there would
be no score differences to correlate, and the reliability coefficient would collapse.
2) Guessing by the examinees: Guessing in a test is an important source of unreliability. With
two-alternative response options, there is a 50% chance of answering an item correctly on the
basis of a guess. In multiple-choice items, the chances of getting the answer correct purely by
guessing are reduced. Guessing has two important effects upon the total test scores (a
standard correction formula is sketched after this list).
3) Momentary fluctuations in the examinee: Momentary fluctuations influence the test score,
sometimes by raising it and sometimes by lowering it. Accordingly, they tend to affect
reliability. A broken pencil, momentary distraction by the sudden sound of an aeroplane flying
overhead, anxiety regarding uncompleted homework, or giving a mistaken answer with no way
to change it are some of the factors which explain momentary fluctuations in the examinee.
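One standard way of offsetting chance success from guessing is the classical correction-for-guessing formula, corrected score = R − W / (k − 1), where R is the number of right answers, W the number of wrong answers, and k the options per item. The sketch below uses hypothetical scores:

```python
# Classical correction-for-guessing formula (a standard psychometric
# convention, sketched here with hypothetical values).
def corrected_score(right, wrong, options_per_item):
    """Subtract the expected number of lucky guesses from the raw score."""
    return right - wrong / (options_per_item - 1)

print(corrected_score(right=30, wrong=8, options_per_item=4))  # 30 - 8/3 ~ 27.3
```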
Intrinsic Factors:
1. Length of the test: A longer test tends to yield a higher reliability coefficient than a shorter
test. Lengthening the test, or averaging total test scores obtained from several repetitions of the
same test, tends to increase the reliability. It has been demonstrated that averaging the test
scores of several applications gives essentially the same result as increasing the length of the
test (the Spearman-Brown formula, sketched after this list, quantifies the effect of length).
2. Range of the total scores: If the obtained total scores on the test are too close to each other,
that is, if there is less variability among them, the reliability of the test is lowered. On the other
hand, if the total scores on the test vary widely, the reliability of the test increases.
3. Scorer reliability: Scorer reliability (also known as reader reliability) is also an important factor
which affects the reliability of the test. By scorer reliability is meant how closely two or more
scorers agree in scoring or rating the same set of responses. If they do not agree, the reliability
is likely to be lowered.
4. Discrimination value: When the test is composed of discriminating items, the item-total
correlation is likely to be high, and then the reliability is also likely to be high. But when the
items do not discriminate well between superior and inferior examinees, that is, when items
have poor discrimination values, the item-total correlation is affected, which ultimately
attenuates the reliability of the test.
5. Difficulty value of items: In general, items having indexes of difficulty at or close to 0.5 yield
higher reliability than items with extreme indexes of difficulty. In other words, when items are
too easy or too difficult, the test yields very poor reliability.
6. Homogeneity of items: This is an important factor in reliability. When the items measure
different functions and the intercorrelations of items are zero or near zero (that is, when the test
is heterogeneous), the reliability is zero or very low. When all items measure the same function
or trait and the inter-item correlation is high, the reliability of the test is also high.
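The effect of test length on reliability noted in point 1 is classically quantified by the Spearman-Brown prophecy formula; here is a minimal sketch with assumed example values:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by a factor n, given its current reliability r.
def spearman_brown(r, n):
    return (n * r) / (1 + (n - 1) * r)

# e.g., doubling a test whose reliability is 0.70:
print(spearman_brown(0.70, 2))  # ~0.82
```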
Validity
Validity is another prerequisite for a test. Validity determines how well a test measures
what it is intended to measure. A valid test should align with an independent standard or
external criterion. This criterion should serve as an accurate benchmark for assessing the trait
or ability being tested. Validity often depends on reliability—if a test produces inconsistent
results, it is unlikely to provide meaningful or accurate measurement.
Types of validity:
CONTENT VALIDITY
Content validity refers to the extent to which a psychological instrument, such as a test or
assessment, measures the intended construct accurately and comprehensively, ensuring that
the items included in the instrument effectively represent the content domain.
In the field of psychology, content validity is crucial for developing and using measurement tools
such as personality assessments, intelligence tests, and diagnostic instruments. Researchers
and practitioners must ensure that the items in these tools align with the theoretical framework
and content domain they aim to measure. This ensures accurate and meaningful interpretations
of results.
Content validity plays a pivotal role in the development and refinement of assessment
instruments: by guaranteeing that the items effectively capture the intended content domain, it
ensures accurate and precise measurement of psychological constructs. This means that
when assessing psychological constructs, the content of the measurement tool should
adequately represent all facets of the construct in question.
CRITERION-RELATED VALIDITY
Criterion validity indicates how well the scores or responses of a test converge with
criterion variables with which the test is supposed to converge. There are several
contexts and objectives for testing criterion validity. For instance, in order to save more
time, a psychologist might want to suggest a condensed version of a test to replace the
original, longer one. The condensed form's connection with the original test
demonstrates its criterion and concurrent validity. When a psychologist wants to assess
a self-report test for a mental illness, the test's concurrent validity can be evaluated by
comparing the test results with a concurrent clinical diagnosis.
There are two methods of assessing criterion validity. Concurrent validity examines the link
between test results and a criterion measured at the same time; researchers employ this kind
of validity when they wish to evaluate how well a test reflects present performance or status.
Predictive validity assesses a test's ability to forecast future outcomes. This is frequently
employed in domains where predicting future performance or behavior is the aim, such as
education, psychology, and hiring.
CONSTRUCT VALIDITY
Construct validity is the ability of a test to measure the theoretical notion or concept it is
intended to assess. This sort of validation is particularly useful when a notion or concept
cannot be measured directly. Construct validity, to put it simply, is the degree to which a test
lives up to its claims. It evaluates whether your hypothesis is supported by the behaviour of the
variables you are testing for. Construct validity is typically established by comparing the test to
other tests of comparable features and examining the correlation between the two
measurements.
Types of construct validity:
Convergent validity
The degree to which two measures of constructs you hypothesize are related is known as
convergent validity. By comparing test findings with those of another test that is intended to
assess the same construct, you can examine convergent validity.
Discriminant validity
Discriminant validity, by contrast, is demonstrated when two measures of constructs that ought
to be unrelated are in fact found to be unrelated. The results for convergent validity and
discriminant validity are obtained in the same manner.
Meaning of Validity
Another essential characteristic of a scientific tool is validity. The term 'validity' denotes
truthfulness or fidelity. Thus, validity is the degree to which a test measures what it is supposed
to measure. Validity is not the test's self-correlation but correlation with some independent
external criteria, which are regarded by experts as the best measure of the trait or ability the test
is designed to measure.
A number of different authors have defined validity in slightly different terms. Anastasi (1968,
99) has stated, "The validity of a test concerns what the test measures and how well it does so."
Lindquist (1951, 213) has defined the validity of a test as "the accuracy with which it measures
that which is intended to measure or as the degree to which it approaches infallibility in
measuring what it purports to measure." Kaplan and Saccuzzo (2001) have defined validity as
"the agreement between a test score or measure and the quantity it is believed to measure."
The definition of validity in the highly influential Standards for Educational and
Psychological Testing (AERA, APA, & NCME, 1999) can be restated as follows: "A test is valid
to the extent that inferences made from it are appropriate, meaningful, and useful." These
inferences imply that in establishing the validity of a test, the test should be correlated with
some hypothetical ideal independent measure or criteria. The correlation coefficient computed
between the test and ideal measures or criteria is called the validity coefficient. 'Independent
criteria' implies some measurement of the characteristic or the range of characteristics (external
to the test) asserted by the test to be the target of measure.
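A validity coefficient of this kind can be illustrated with a minimal sketch; the data below are entirely hypothetical, and NumPy is assumed:

```python
# Minimal sketch of a validity coefficient: the correlation between
# test scores and an independent external criterion (hypothetical data).
import numpy as np

test_scores = np.array([55, 62, 48, 70, 66, 59])      # e.g., aptitude test
criterion = np.array([2.8, 3.2, 2.5, 3.8, 3.4, 3.0])  # e.g., later GPA

validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(validity_coefficient)  # closer to 1.0 => stronger criterion validity
```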
In a broad sense, validity is connected with generalizability. When a test is valid, its
conclusions can be generalized to the general population. When the independent
criterion has been established and when both the test and the criterion are reliable, the
correlation between the test and the criterion can be used as evidence of the validity of the test
safely. Although validity defined the meaning of tests and measures, the term itself started
losing its meaning. But in 1985, the joint committee of the American Educational Research
Association (AERA), the American Psychological Association (APA), and the National Council
on Measurement in Education (NCME) published a very important booklet for psychological test
guidelines titled Standards for Educational and Psychological Testing, which was revised in
1999. This joint committee, by rejecting the numerous possible definitions of validity, suggested
that validity is nothing but evidence for the inferences drawn from a test score. This
evidence can be content-related, criterion-related, or construct-related. In this sense, validity
actually refers to evidence in favor of what can be said on the basis of the test scores and not
the tests themselves (Landy 1986).
Factors Affecting Validity
1. Length of the Test – The longer the test, the more valid and reliable it becomes.
Lengthening the test, or repeated administration of the same test, increases the reliability,
and since the validity of a homogeneous test depends upon its reliability, this also increases
the validity of the test.
2. Range of Ability – The range of ability of the samples used also influences the
validity. If the subjects' range of ability is so limited that a wide range of scores is not
possible, the validity coefficient will be low; the coefficient is enhanced when the subjects
have a wider range of ability, so that a wider range of scores is obtained.
3. Ambiguous Directions – If the directions of the test are ambiguous and differently
interpreted by different examinees, such items will likely encourage guessing on the part of
the examinees. Thus, it will lower the validity of the test.
Norms
Norms are standard reference points or benchmarks that are established based on data
collected from a representative group. These are used to interpret individual results or
performance. They interpret test scores by comparing them to the average performance of
the group. Common types of norms include age norms, grade norms, percentile norms,
and standard score norms. These benchmarks help in understanding individual test
results in context. Without norms, test scores lack meaningful interpretation.
Norms might be defined as the average performance on a particular test made by a
standardization sample. A standardization sample is one that is representative of the population
and takes the test for the express purpose of providing data for comparison and interpretation of
the test scores. Since psychological tests rarely provide absolute, ratio measures of
psychological attributes, raw scores by themselves are not useful for measuring such
attributes. The way we can measure attributes in a useful manner is by comparing one
person's performance with another's.
Norm-based Interpretation - When a person's test score is interpreted by comparing that score
with scores of several other people.
Defining the target population: A test is administered to a particular group. The composition
of this target group (normative group) is determined by the intended use of the test. Hence, the
first step is to define the composition of the target group. For Example, the Test of English as a
Foreign Language (TOEFL) is intended for students whose native tongue is not English but who
plan on studying abroad where the medium of instruction is English. Thus, for the TOEFL the
target population will consist of those students.
Selecting the sample from the target population: After defining the target population, the
next step is to select a representative sample. To achieve this, a cross-sectional representation
of the population is used, ensuring the sample reflects the diversity of the larger group. To
ensure this representative character of the sample, various sampling techniques are employed.
While random sampling is ideal for larger samples, it is often impractical, so cluster sampling or
its variations are typically employed as more feasible alternatives.
TYPES OF NORMS
Different norms have been classified corresponding to the four commonly derived scores. The
four types of norms commonly used in psychological and educational testing are age-equivalent
norms, grade-equivalent norms, percentile norms, and standard score norms.
Age-equivalent norms: These norms represent the average performance of individuals within
a specific age group on a given test or measurement. These norms are useful for assessing
traits that show systematic growth with age, such as height, weight, and cognitive abilities
during childhood. However, they have limitations, as growth rates vary across different ages and
traits. For example, the development of general intelligence is faster in early childhood but slows
down significantly after adolescence, making it difficult to maintain uniform standards across all
ages. Additionally, some traits, like visual acuity, do not exhibit progressive change, rendering
age norms ineffective for such measures.
Percentile norms: These norms are widely used in psychological and educational
assessments. They rank individuals based on the percentage of people in the standardization
sample who scored at or below a particular raw score. For example, if a student scores at the
70th percentile on a test, it means they performed better than 70% of the sample. Percentile
norms are easy to understand and interpret, but they can be misleading because the units are
not equal across the scale. Small raw score differences in the middle of the distribution can
result in large percentile changes, while large raw score differences at the extremes may result
in minor percentile shifts.
Standard Score Norms: These types of norms are based on standardized scores, such as z-
scores, which have a fixed mean and standard deviation. These norms are advantageous
because they maintain equal units of measurement across the entire scale, unlike percentile
norms. A z-score, for example, shows how many standard deviations a score is above or below
the mean, allowing for accurate comparisons across different tests or distributions. This makes
standard score norms particularly useful in psychological testing, where comparing performance
across diverse measures is often necessary.
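As an illustration of how percentile ranks and z-scores are derived from a standardization sample, here is a minimal sketch using hypothetical norm data (NumPy assumed):

```python
# Minimal sketch: deriving a percentile rank and a z-score for one raw
# score against a (hypothetical) standardization sample.
import numpy as np

norm_sample = np.array([40, 45, 50, 52, 55, 58, 60, 63, 67, 70])  # raw scores
raw = 60

percentile_rank = (norm_sample <= raw).mean() * 100   # % scoring at or below
z_score = (raw - norm_sample.mean()) / norm_sample.std(ddof=1)

print(percentile_rank)  # 70.0 => 70th percentile
print(z_score)          # standard deviations above/below the sample mean
```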
PRACTICALITY/USABILITY
A test must also be practicable/usable from the point of view of the time taken in its completion,
length, scoring, etc. In other words, the test should not be overly lengthy or complex, the
scoring method should not be difficult, and it should not require highly specialized training to
administer. In addition, the test should be cost-effective and efficient, ensuring it is accessible
and feasible for widespread use.
TYPES OF TESTS
1. Group tests:
Group tests are largely pencil-and-paper measures suitable to the testing of large groups of
persons at the same time.
Example: Multidimensional Aptitude Battery-II (MAB-II), Raven's Progressive Matrices, and
multilevel batteries such as the Cognitive Abilities Test (CogAT).
2. Individual tests: Individual tests are instruments that by their design and purpose must be
administered one on one.
Example: Stanford-Binet Intelligence Scale, Detroit Tests of Learning Aptitude-4 (DTLA-4),
Cognitive Assessment System-II (CAS-II).
An important advantage of individual tests is that the examiner can gauge the level of motivation
of the subject and assess the relevance of other factors (e.g., impulsiveness or anxiety) on the
test results.
3. Objective tests: Objective tests are those whose items are scored by competent examiners or
observers in such a way that no scope for subjective judgement or opinion exists and thus, the
scoring remains unambiguous. Tests having multiple choices, true-false and matching items are
usually called objective tests. In such tests, the problem as well as its answer is given along with
the distractor. The problem is known as the stem of the item. A distractor answer is one which is
similar to the correct answer but is not actually the correct one. Such tests are also known as
new-type tests or limited-answer tests.
Example: Minnesota Multiphasic Personality Inventory (MMPI)
4. Subjective tests: Subjective tests are tests whose items are scored by competent
examiners or observers in a way that leaves some scope for subjective judgement and
opinion. As a consequence, some elements of vagueness and ambiguity remain in their
scoring. These are also called essay tests. They intend to assess an examinee's ability to
organize a comprehensive answer, recall and select important information, and present it
logically and effectively. They are also called free-answer tests.
Example: Thematic Apperception Test (TAT)
5. Speed tests: Speed tests have severe time limits, but the items are comparatively easy and
of more or less the same degree of difficulty. Here, very few examinees are expected to make
errors. Such a test reveals how rapidly, i.e., with what speed, the examinees can respond
within a given time limit. Most clerical aptitude tests belong to this category.
Example: Clerical Aptitude Test
6. Non-verbal tests: Non-verbal tests are those that minimize, but do not altogether eliminate,
the role of language by using symbolic materials like pictures, figures, etc. Such tests use
language in the instructions but not in the items. Test items present the problems with the help
of figures and symbols and are commonly used with young children as an attempt to assess
the nonverbal aspects of intelligence, such as spatial perception.
Example: Raven's Progressive Matrices
7. Performance tests: Tests that require the examinee to perform a task rather than answer
questions are known as performance tests. Such tests prohibit the use of language in the
items. Occasionally, oral language is used to give instructions, or the instructions may be
given through gestures and pantomime. These tests are usually administered individually so
that the examiner can count the errors committed by the examinee and assess how long the
examinee takes to complete a given task. Hence, performance tests emphasize the
examinee's ability to perform a task rather than answer questions.
Example: Kohs Block Design Test
8. Non-language tests: Non-language tests are those which do not depend upon any form of
written, spoken, or read communication. Such tests remain completely independent of the
ability to use language in any way. Instructions are usually given through gestures or
pantomime, and the examinees respond by pointing at or manipulating objects such as
pictures, blocks, puzzles, etc. Such tests are usually administered to those persons or children
who cannot communicate in any form of ordinary language.
Example: Raven's Progressive Matrices
9. Neuropsychological tests: Neuropsychological tests are tests used in the assessment of
persons with known or suspected brain dysfunction.
Example: The Wisconsin Card Sorting Test (WCST)
10. Achievement tests: Achievement tests assess what persons have acquired in a given
area as a function of some training or learning.
On the basis of the criterion of Purpose or Objective
1.INTELLIGENCE TESTS
Intelligence tests were originally designed to sample a broad assortment of skills in order to
estimate the individual’s general intellectual level.
The Binet-Simon scales were successful in part because they incorporated heterogeneous
tasks, including word definitions, memory for designs, comprehension questions, and spatial
visualization tasks.
In general, the term intelligence test refers to a test that yields an overall summary score based
on results from a heterogeneous sample of items.
2.APTITUDE TESTS
Aptitude tests measure one or more clearly defined and relatively homogeneous segments of
ability. Such tests come in two varieties: single aptitude tests (A single aptitude test appraises
only one ability) and multiple aptitude test batteries (the multiple aptitude test battery provides a
profile of scores for a number of aptitudes).
Aptitude tests are often used to predict success in an occupation, training course, or educational
endeavor.
For example, the Seashore Measures of Musical Talents (Seashore, 1938), a series of tests
covering pitch, loudness, rhythm, time, timbre, and tonal memory, can be used to identify
children with potential talent in music.
The most common use of aptitude tests is to determine college admissions. SAT (Scholastic
Assessment Test) of the College Entrance Examination Board contains a Verbal section
stressing word knowledge and reading comprehension; a Mathematics section stressing
algebra, geometry, and insightful reasoning; and a Writing section. In effect, colleges that
require certain minimum scores on the SAT for admission are using the test to predict academic
success.
3.ACHIEVEMENT TESTS
Achievement tests measure a person’s degree of learning, success, or accomplishment in a
subject matter. The implicit assumption of most achievement tests is that the schools have
taught the sub- ject matter directly. The purpose of the test is then to determine how much of
the material the subject has absorbed or mastered. Achievement tests commonly have several
subtests, such as reading, mathematics, language, science, and social studies. The distinction
between aptitude and achieve- ment tests is more a matter of use than content
4.CREATIVITY TESTS
Creativity tests assess a subject’s ability to produce new ideas, insights, or artistic creations that
are accepted as being of social, aesthetic, or scientific value. Thus, measures of creativity
emphasize novelty and originality in the solution of fuzzy problems or the production of artistic
ideas.
5.PERSONALITY TESTS
Personality tests measure the traits, qualities, or behaviors that determine a person’s
individuality; this information helps predict future behavior. These tests come in several different
varieties, including checklists, inventories, and projective techniques such as sentence
completions and inkblots.
Example: An Adjective Checklist
6. INTEREST INVENTORY
Interest inventories measure an individual’s preference for certain activities or topics and
thereby help determine occupational choice. These tests are based on the explicit assumption
that interest patterns determine and, therefore, also predict job satisfaction.
Example: If the examinee has the same interests as successful and satisfied accountants, it is
thought likely that he or she would enjoy the work of an accountant. The assumption that
interest patterns predict job satisfaction is largely borne out by empirical studies.
7.BEHAVIORAL PROCEDURES
Many kinds of behavioral procedures are available for assessing the antecedents and
consequences of behavior, including checklists, rating scales, interviews, and structured
observations. These methods share a common assumption that behavior is best understood in
terms of clearly defined characteristics such as frequency, duration, antecedents, and
consequences. Behavioral procedures tend to be highly pragmatic in that they are usually
interwoven with treatment approaches.
Example: A structured behavioral interview, where a job candidate is asked detailed questions
about past situations to assess how they behaved in specific work-related scenarios, revealing
their problem-solving skills, decision-making abilities, and response to pressure; questions like
“Tell me about a time you had to deal with a difficult client” are typical examples of this type of
test.
8.NEUROPSYCHOLOGICAL TESTS
Neuropsychological tests are used in the assessment of persons with known brain dysfunction.
Neuropsychology is the study of brain-behavior relationships. Over the years,
neuropsychologists have discovered that certain tests and procedures are sensitive to the
effects of brain damage.
Neuropsychologists use these specialized tests and procedures to make inferences about the
locus, extent, and consequences of brain damage.
Example: Testing one's intelligence can give a clue to whether there is a problem in the brain-
behavior connection. The Wechsler scales are the tests most often used to determine the level
of intelligence.
Classification
Classification involves assigning a person to one category rather than another. This often leads
to different treatment of some kind, like getting access to a specific college or job.
Placement involves sorting people into different programs based on their needs or skills. For
instance, a university might use a maths placement test to decide whether students should take
calculus, algebra or remedial classes.
Screening uses quick tests to identify people who might have special characteristics or needs.
These tests may misclassify some people and should be followed up with more comprehensive
tests.
Certification involves a pass/fail test that confers certain privileges when passed, such as the
right to practice psychology or drive a car.
Selection is similar to certification, in that it involves a pass/fail test that confers privileges like
getting into a university or gaining employment.
Self-Knowledge
Feedback from psychological tests can sometimes lead to people changing their career paths or
other aspects of their lives. However, in most cases, people already know what the test results
will show. For example, a high-achieving college student will not be surprised to learn that they
have a high IQ.
Program Evaluation
Psychological tests can be used to evaluate the effectiveness of social and educational
programs. For example, tests can provide an objective way to assess whether programs like
Head Start are improving children’s scholastic performance. In general, these tests show that
children in Head Start make gains in IQ and academic achievement, but these benefits tend to
decrease over time.
Research
Psychological tests also play a role in both applied and theoretical behavioral research. For
example, researchers might use psychological tests to investigate if low-level lead absorption
causes behavioural issues in children.
These applications of psychological testing can overlap. For example, a test that helps with
psychiatric diagnosis can also give an individual a better understanding of themselves.
Additionally, the importance attached to psychological tests, and the respect accorded to their
validity, is evident in the debates and controversies that surround testing-based research.
Issue of Human Rights - Several human rights are recognised in the field of psychological
testing. One of them is the right to not be tested, that is, people who do not want to be subjected
to psychological testing cannot be forced to do so. Similarly, subjects of psychological testing
have the right to know the results of such tests, their interpretations, and any decisions that
may affect them. Other human rights, such as the right to know who will have access to the
data and the right to confidentiality, are also recognised.
Issues of Invasion of Privacy - Sometimes subjects of psychological testing feel that their
privacy has been invaded. This issue was studied by Dahlstrom. He suggested that the notion
of invasion of privacy emerges due to misunderstanding as psychological tests have fixed aims
and cannot invade a person's privacy. He also suggested that the concept of invasion of privacy
is very ambiguous. It only happens when certain information regarding a person is used
inappropriately. Because psychologists are bound by ethics and legalities, they don't reveal any
more information than is needed. The ethics code of APA endorses Confidentiality, which
dictates that personal information acquired by a psychologist is revealed to others only with the
consent of the person.
Issue of Divided Loyalties - It highlights how psychologists often experience conflicts between
their duty to their employer and their responsibility to individuals' welfare. For example, an
industrial psychologist may need to identify accident-prone employees for workplace safety but
must also respect individuals' rights and privacy. This conflict arises when a psychologist must
maintain test security while ensuring fairness in decision-making. If they disclose test details to
one person, others may use this information to manipulate results, compromising the test's
integrity. This creates a situation where the psychologist is caught between two opposing ethical
principles.
Responsibility of test constructors and test users - According to the latest standards for test use,
test constructors are required to provide a test manual that clearly outlines the proper
application of the test. This manual should include information on reliability, validity, and norms,
as well as detailed guidelines on scoring and administration procedures. On the other hand, test
users are responsible for knowing the reason for and the implications of using the test. Test
users should have sufficient knowledge of test construction, supporting research, and
psychometric properties.
IMPROVING RELIABILITY
1. Standardization of Procedures
Standardization involves administering and scoring the test in a consistent manner across
different situations and populations.
Implementation: Create a detailed manual that outlines every step of the test administration
process. This includes the instructions given to participants, the environmental conditions under
which the test should be conducted, and the specific ways in which responses should be
recorded and scored.
2. Clear and Precise Test Instructions
Implementation: Pilot the instructions with a small group to identify any potential
misunderstandings. Revise the instructions based on feedback to make them as clear as
possible. Consider using visual aids or demonstrations if necessary.
3. Training of Test Administrators
Proper training ensures that those who administer the test do so in a consistent and
standardized manner.
4. Use of Validated Instruments
Employing tools that have been scientifically validated and have shown high reliability in
previous research.
Implementation: Review the existing literature to identify the most reliable tools for the specific
construct being measured. Use these tools consistently across different studies and settings.
5. Pilot Testing
Pre-testing the psychological test on a small sample to identify potential issues before full-scale
administration.
Implementation: Analyze the pilot test results to detect any inconsistencies or problems with the
test items. Make necessary adjustments to improve the test's reliability.
6. Consistent Testing Conditions
Maintaining similar testing environments and conditions to minimize external factors that could
affect test results.
Implementation: Control variables such as lighting, noise levels, and seating arrangements.
Ensure that all participants are tested under similar conditions to reduce variability.
7. Regular Calibration and Maintenance
Ensuring that any instruments or tools used in testing are regularly calibrated and maintained for
accuracy.
Implementation: Schedule regular maintenance and calibration checks for all testing equipment.
Keep detailed records of these checks to ensure consistency.
8. Test-Retest Reliability
Implementation: Administer the same test to the same group of people at two different points in
time and calculate the correlation between the two sets of scores. High correlation indicates
high test-retest reliability.
9. Inter-Rater Reliability
Implementation: Develop clear scoring criteria and train raters thoroughly. Use multiple raters
and calculate the agreement between their scores.
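Agreement between raters is often quantified with Cohen's kappa, a standard statistic that corrects observed agreement for chance agreement. The sketch below uses hypothetical pass/fail ratings and assumes NumPy:

```python
# Minimal sketch of Cohen's kappa for two raters (hypothetical ratings):
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
import numpy as np

rater_a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # e.g., pass/fail ratings
rater_b = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

p_observed = (rater_a == rater_b).mean()
# Chance agreement from each rater's marginal proportions:
p1a, p1b = rater_a.mean(), rater_b.mean()
p_expected = p1a * p1b + (1 - p1a) * (1 - p1b)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(kappa)  # 1.0 = perfect agreement; 0 = chance-level agreement
```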