Introduction to Psychological Testing

DEFINITION OF TEST

A psychological test is a standardized procedure to measure quantitatively or qualitatively one or more than one aspect of a trait by means of a sample of verbal or non-verbal behavior.
A psychological test is an organised succession of stimuli designed to measure quantitatively or to evaluate qualitatively some mental process, trait or characteristic (Bean, 1953).
Anastasi & Urbina (1997) have defined a psychological test as essentially an objective and
standardized measure of a sample of behaviour.
A test is a standardized procedure for sampling behaviour and describing it with categories or
scores. Most tests have norms or standards by which the results can be used to predict other,
more important behaviors (Gregory, 2015).

Tests are enormously varied in their formats and applications. Nonetheless, most tests possess
these defining features:

- Standardized procedure: A test is considered standardized if the procedures for administering it are uniform from one examiner and setting to another. Standardization depends on the competence of the examiner and the directions for administration found in the instructional manual.

- Behavior sample: A psychological test is a limited sample of behavior. Practical constraints dictate that a test is only a sample of behavior, which is of interest only insofar as it permits the examiner to make inferences about the total domain of relevant behaviors.

- Scores or categories:
Every test furnishes one or more scores or provides evidence that a person belongs to one
category and not another.

Psychometricians often express this fundamental point with an equation:

X=T+e
where X is the observed score, T is the true score, and e is a positive or negative error
component.
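
To make the equation concrete, here is a minimal simulation sketch (the numbers are hypothetical, not from the text): a fixed true score T is perturbed by a random error e on each administration, producing varying observed scores X.

```python
import random

true_score = 50              # T: the examinee's (unobservable) true score
random.seed(1)               # fixed seed so the sketch is reproducible

# Each administration adds an error component e, positive or negative,
# so the observed score X = T + e fluctuates around the true score.
for administration in range(5):
    e = random.gauss(0, 3)   # error with mean 0 and SD 3 (arbitrary choice)
    print(f"X = {true_score + e:.1f}")
```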

- Norms or standards: An examinee’s test score is interpreted by comparing it with the scores
obtained by others on the same test. Test developers typically provide norms—a summary of
test results for a large and representative group of subjects, referred to as the standardization
sample.

- Prediction of nontest behavior: The ultimate purpose of a test is to predict additional behaviors
other than those directly sampled by the test. The ability of a test to predict nontest behavior is
determined by an extensive body of validational research.
DIFFERENCE BETWEEN TESTS AND EXPERIMENTS
Although the terms test and experiment are often used interchangeably, each has its own context of usage. An experiment is a scientific method aimed at validating a hypothesis or
discovering new knowledge. It is a systematic research study in which the researcher directly
varies some factor(s), holds all other factors constant and observes the result of the variation.
On the other hand, a test typically does not have a hypothesis; instead, it is a procedure to
assess quality or performance and examine individual differences between people along
particular aspects.

HISTORICAL OVERVIEW

The history of psychological testing is a fascinating story and has abundant relevance to
present-day practices. Contemporary tests did not spring from a vacuum; they evolved slowly
from a host of precursors introduced over the last one hundred years.
Rudimentary forms of testing date back to 2200 B.C. in China. The Chinese emperors used
grueling written exams to select officials for civil service.
Modern psychological testing owes its inception to the era of brass instrument psychology that
flourished in Europe during the late 1800s. By testing sensory thresholds and reaction times,
pioneer test developers such as Sir Francis Galton demonstrated that it was possible to
measure the mind in an objective and replicable manner.
The British genius Francis Galton (1822–1911) invented the first battery of tests, a peculiar
assortment of sensory and motor measures.
American psychologist James McKeen Cattell (1860–1944) studied with Galton and then, in
1890, proclaimed the modern testing agenda in his classic paper titled “Mental Tests and
Measurements.”
In the late 1800s, a newfound humanism toward the mentally retarded, reflected in the
diagnostic and remedial work of French physicians Esquirol and Seguin, helped create the
necessity for early intelligence tests.
Alfred Binet (1857–1911) invented the first modern intelligence test in 1905. In 1905, Binet and
Simon developed the first useful intelligence test in Paris, France. Their simple 30-item measure
of mainly higher mental functions helped identify schoolchildren who could not profit from
regular instruction. Curiously, there was no method for scoring the test.
In 1908, Binet and Simon published a revised 58-item scale that incorporated the concept of
mental level. In 1911, a third revision of the Binet-Simon scales appeared. Each age level now
had exactly five tests; the scale extended into the adult range.
In 1912, Stern proposed dividing the mental age by the chronological age to obtain an
intelligence quotient. In 1916, Terman suggested multiplying the intelligence quotient by 100 to
remove fractions. Thus was born the concept of IQ.
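
As a small illustration of the ratio-IQ arithmetic just described (the ages are hypothetical, chosen only for the example):

```python
def ratio_iq(mental_age, chronological_age):
    """Stern's intelligence quotient, multiplied by 100 as Terman proposed."""
    return 100 * mental_age / chronological_age

# A child with a mental age of 10 and a chronological age of 8
print(ratio_iq(10, 8))  # -> 125.0
```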
In 1910, Henry Goddard translated the 1908 Binet-Simon scale. In 1911, he tested more than a
thousand schoolchildren with the test, relying upon the original French norms. He was
disturbed to find that 3 percent of the sample was “feebleminded” and recommended
segregation from society for these children.
In 1916, Lewis Terman released the Stanford-Binet, a revision of the Binet scales. This well-
designed and carefully normed test placed intelligence testing on a firm footing once and for
all.
During WWI, Robert Yerkes headed a team of psychologists who produced the Army Alpha, a
verbally loaded group test for average and superior recruits, and the Army Beta, a nonverbal
group test for illiterates and non-English-speaking recruits.
The historical background of psychological testing also includes the development of group tests, performance tests, aptitude tests, test batteries, multifactor tests, personality tests, rating scales, self-rating inventories, and projective techniques.

CHARACTERISTICS OF A GOOD TEST

Objectivity
A test must be objective in nature, i.e., it must be free from the subjective element so that there is complete interpersonal agreement among experts regarding the meaning of the items and the scoring of the test. Objectivity here relates to two aspects of the test—objectivity of the items and objectivity of the scoring system. By objectivity of items, it is meant that the items should be phrased in such a manner that they are interpreted in exactly the same way by all those who take the test. By objectivity of scoring, it is meant that the scoring method of the test should be a standard one so that complete uniformity can be maintained when the test is scored by different experts at different times.

RELIABILITY
A test must also be reliable. The internal consistency of the test is one index of reliability, and consistency in results obtained upon testing and retesting is an index of temporal consistency. Reliability, thus, includes both internal consistency as well as temporal consistency. Reliability reflects how free test scores are from the flaws in standardization that cause measurement errors.

Meaning: Reliability refers to the consistency and dependability of a test’s scores across
repeated administrations or different conditions.

A reliable test yields similar results under consistent conditions, ensuring that the measurement
is stable and replicable.

Ensuring high reliability in psychological tests is essential for obtaining accurate and meaningful
results, which in turn supports informed decision-making in various applied settings.

Types of reliability:

● Test-Retest Reliability: This type assesses the stability of test scores over time. By
administering the same test to the same group of individuals on two different occasions
and correlating the scores, one can determine the temporal stability of the test. High
test-retest reliability indicates that the test produces consistent results over time.

● Inter-Rater Reliability: This form evaluates the degree of agreement between different
raters or observers assessing the same phenomenon. It is crucial in situations where
subjective judgment is involved, ensuring that different evaluators produce similar scores
or ratings.

● Internal Consistency Reliability: This type examines the consistency of results across items within a test. Methods such as split-half reliability, where the test is divided into two halves and the scores are correlated, and Cronbach’s alpha, which assesses the average correlation among all items, are commonly used to evaluate internal consistency; a minimal computation of Cronbach’s alpha is sketched just below.
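
A minimal sketch of the Cronbach’s alpha computation, using a small matrix of hypothetical item scores (the data and function name are illustrative, not from the text):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five examinees answering four items (hypothetical ratings)
data = [[3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3]]
print(round(cronbach_alpha(data), 3))
```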

Factors affecting reliability:

Extrinsic Factors
Important extrinsic factors affecting the reliability of a test may be enumerated as follows:

1) Group variability: When the group of examinees being tested is homogeneous in ability, the reliability of the test scores is likely to be lowered. But when the examinees vary widely in their range of ability, that is, the group of examinees is a heterogeneous one, the reliability of the test scores is likely to be high. The effect of variability on reliability can be examined by seeing what happens when the variability is zero.
2) Guessing by the examinees: Guessing in a test is an important source of unreliability. With two-alternative response options, there is a 50% chance of answering the items correctly on the basis of a guess. In multiple-choice items, the chances of getting the answer correct purely by guessing are reduced. Guessing has two important effects upon the total test scores.

3) Environmental conditions: As far as possible, the testing environment should be uniform. Arrangement should be such that light, sound, and other comforts are equal and uniform for all the examinees; otherwise, it will tend to lower the reliability of the test scores.

4) Momentary fluctuations in the examinee: Momentary fluctuations influence the test score
sometimes by raising the score and sometimes by lowering it. Accordingly, they tend to affect
reliability. A broken pencil, momentary distraction by the sudden sound of an aeroplane flying
above, anxiety regarding noncompletion of homework, mistake in giving the answer and
knowing no way to change it, are some of the factors which explain momentary fluctuations in
the examinee.

Intrinsic Factors:

1. Length of the test: A longer test tends to yield a higher reliability coefficient than a shorter test. Lengthening the test, or averaging total test scores obtained from several repetitions of the same test, tends to increase the reliability. It has been demonstrated that averaging the test scores of several applications essentially gives the same result as increasing the length of the test. (The Spearman-Brown sketch after this list shows the usual way to quantify this effect.)

2. Range of the total scores: If the obtained total scores on the test are too close to each other, that is, if there is less variability among them, the reliability of the test is lowered. On the other hand, if the total scores on the test vary widely, the reliability of the test increases.

3. Scorer reliability: Scorer reliability (also known as reader reliability) is also an important factor
which affects the reliability of the test. By scorer reliability is meant how closely two or more
scorers agree in scoring or rating the same set of responses. If they do not agree, the reliability
is likely to be lowered.

4. Discrimination value: When the test is composed of discriminating items, the item-total correlation is likely to be high and then, the reliability is also likely to be high. But when the items do not discriminate well between superior and inferior examinees, that is, when items have poor discrimination values, the item-total correlation is affected, which ultimately attenuates the reliability of the test.

5. Difficulty value of items: In general, items having indexes of difficulty at 0.5 or close to it, yield
higher reliability than items of extreme indexes of difficulty. In other words, when items are too
easy or too difficult, the test yields very poor reliability.

6. Homogeneity of items: Homogeneity is an important factor in reliability. When the items measure different functions and the intercorrelations of items are zero or near it (that is, when the test is heterogeneous), the reliability is zero or very low. When all items measure the same functions or traits and the inter-item correlation is high, the reliability of the test is also high.
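
The usual way to quantify the effect of lengthening a test (point 1 above) is the Spearman-Brown prophecy formula; the sketch below is a standard illustration, not taken from the text:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened k times,
    given its current reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose reliability is .70 raises it to about .82
print(round(spearman_brown(0.70, 2), 2))
```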

Validity
Validity is another prerequisite for a test. Validity determines how well a test measures
what it is intended to measure. A valid test should align with an independent standard or
external criterion. This criterion should serve as an accurate benchmark for assessing the trait
or ability being tested. Validity often depends on reliability—if a test produces inconsistent
results, it is unlikely to provide meaningful or accurate measurement.
Types of validity:

CONTENT VALIDITY

Content validity refers to the extent to which a psychological instrument, such as a test or
assessment, measures the intended construct accurately and comprehensively, ensuring that
the items included in the instrument effectively represent the content domain.
In the field of psychology, content validity is crucial for developing and using measurement tools
such as personality assessments, intelligence tests, and diagnostic instruments. Researchers
and practitioners must ensure that the items in these tools align with the theoretical framework
and content domain they aim to measure. This ensures accurate and meaningful interpretations
of results.

Content validity plays a pivotal role in ensuring the accurate and precise measurement of
psychological constructs through the development and refinement of assessment instruments,
guaranteeing that the items effectively capture the intended content domain. This means that
when assessing psychological constructs, the content of the measurement tool should
adequately represent all facets of the construct in question.

CRITERION-RELATED VALIDITY

Criterion validity indicates how well the scores or responses of a test converge with
criterion variables with which the test is supposed to converge. There are several
contexts and objectives for testing criterion validity. For instance, in order to save more
time, a psychologist might want to suggest a condensed version of a test to replace the
original, longer one. The condensed form's connection with the original test
demonstrates its criterion and concurrent validity. When a psychologist wants to assess
a self-report test for a mental illness, the test's concurrent validity can be evaluated by
comparing the test results with a concurrent clinical diagnosis.

There are two methods to assess criterion validity. Concurrent validity examines the link between test results and a criterion measured at the same time. When researchers wish to evaluate how well a test reflects present performance or status, they employ this kind of validity. Predictive validity assesses a test's ability to forecast future events from its findings. This is frequently employed in domains where predicting future performance or behavior is the aim, such as education, psychology, and hiring.

CONSTRUCT VALIDITY

Construct validity is the ability of a test to measure the theoretical concept or construct it is intended to assess. This sort of validation is particularly useful when a notion or concept cannot be measured directly. Construct validity, to put it simply, is the degree to which a test lives up to its claims. Construct validity evaluates whether the behaviour of the variables you are testing supports your hypothesis. It is typically assessed by comparing the test with other comparable tests and examining the correlation between the two measurements.
Types of construct validity:

Convergent validity
Convergent validity is the degree to which two measures of constructs that are hypothesized to be related are, in fact, related. By comparing test findings with those of another test that is intended to assess the same construct, you can examine convergent validity.

Discriminant validity
Conversely, discriminant validity is shown when two measures of constructs that ought to be unrelated are, in fact, unrelated. The results for convergent validity and discriminant validity are obtained in the same manner.

Meaning of Validity
Another essential characteristic of a scientific tool is validity. The term 'validity' denotes
truthfulness or fidelity. Thus, validity is the degree to which a test measures what it is supposed
to measure. Validity is not the test's self-correlation but correlation with some independent
external criteria, which are regarded by experts as the best measure of the trait or ability the test
is designed to measure.

A number of different authors have defined validity in slightly different terms. Anastasi (1968,
99) has stated, "The validity of a test concerns what the test measures and how well it does so."
Lindquist (1951, 213) has defined the validity of a test as "the accuracy with which it measures
that which is intended to measure or as the degree to which it approaches infallibility in
measuring what it purports to measure." Kaplan and Saccuzzo (2001) have defined validity as
"the agreement between a test score or measure and the quantity it is believed to measure."
The definition of validity in the highly influential Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) can be restated as follows: "A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful." This implies that in establishing the validity of a test, the test should be correlated with some hypothetical ideal independent measure or criterion. The correlation coefficient computed between the test and the ideal measure or criterion is called the validity coefficient. 'Independent criterion' implies some measurement of the characteristic or the range of characteristics (external to the test) asserted by the test to be the target of measure.
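
As a small illustration of computing a validity coefficient (all numbers below are hypothetical): it is simply the Pearson correlation between test scores and an independent external criterion.

```python
import numpy as np

# Hypothetical aptitude test scores and a later job-performance criterion
test_scores = np.array([52, 47, 61, 39, 58, 44, 65, 50])
criterion   = np.array([3.8, 3.1, 4.2, 2.7, 4.0, 3.0, 4.5, 3.4])

# The validity coefficient is the correlation between the test
# and the independent external criterion measure.
r = np.corrcoef(test_scores, criterion)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```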

In a broad sense, validity is connected with generalizability. When a test is valid, its conclusions can be generalized to the wider population. Once the independent criterion has been established, and when both the test and the criterion are reliable, the correlation between the test and the criterion can safely be used as evidence of the validity of the test. Although validity defined the meaning of tests and measures, the term itself started
losing its meaning. But in 1985, the joint committee of the American Educational Research
Association (AERA), the American Psychological Association (APA), and the National Council
on Measurement in Education (NCME) published a very important booklet for psychological test
guidelines titled Standards for Educational and Psychological Testing, which was revised in
1999. This joint committee, by rejecting the numerous possible definitions of validity, suggested
that validity is nothing but plain evidence for inferences drawn about a test score. This evidence can be content-related, criterion-related, or construct-related. In this sense, validity
actually refers to evidence in favor of what can be said on the basis of the test scores and not
the tests themselves (Landy 1986).

Factors Influencing Validity

Several factors influence the validity of a test :

1. Length of the Test – The longer the test, the more valid and reliable it becomes. The
lengthening of the test or repeated administration of the same test increases the reliability, and since the validity of a homogeneous test is dependent upon reliability, it also increases the validity of the test.

2. Range of Ability – The range of ability of the sample used also influences the validity. If the range of ability of the subjects is so limited that a wide range of scores is not possible, the validity coefficient will be low; the coefficient would be enhanced if the subjects have a wider range of ability so that a wider range of scores is obtained.

3. Ambiguous Directions – If the directions of the test are ambiguous and differently
interpreted by different examinees, such items will likely encourage guessing on the part of
the examinees. Thus, it will lower the validity of the test.

4. Socio-cultural Differences – A particular test developed in one culture may not be valid for another culture because of differences in socio-economic status, sex ratios, social norms, etc. Only in a genuinely cross-cultural test is the validity unaffected by such differences.

5. Addition of Inappropriate Items – When inappropriate items, whose difficulty values differ widely from the original items, are added to the test, they are likely to lower both the reliability and the validity of the test.

Difference between Reliability and Validity

RELIABILITY
1. Reliability refers to the consistency of a measure.
2. It examines whether the results can be reproduced under the same conditions.
3. Higher reliability requires items of equal difficulty and higher intercorrelations between the items.

VALIDITY
1. Validity refers to the accuracy of a measure.
2. It examines whether the results accurately represent what they are supposed to measure.
3. Higher validity requires items of different difficulty values and low intercorrelation between items.

Norms
Norms are standard reference points or benchmarks that are established based on data
collected from a representative group. These are used to interpret individual results or
performance. They interpret test scores by comparing them to the average performance of
the group. Common types of norms include age norms, grade norms, percentile norms,
and standard score norms. These benchmarks help in understanding individual test
results in context. Without norms, test scores lack meaningful interpretation.
Norms might be defined as the average performance on a particular test made by a standardization sample. A standardization sample is one that is representative of the population and takes the test for the express purpose of providing data for comparison and interpretation of the test scores. Since psychological tests rarely provide absolute, ratio measures of psychological attributes, raw scores by themselves are not very useful for measuring such attributes. The way we can measure attributes in a useful manner is by comparing one person's performance with another's.

Norm-based Interpretation - When a person's test score is interpreted by comparing that score
with scores of several other people.

STEPS IN DEVELOPING NORMS


For norms to serve as a useful comparative device, the following steps must be taken into
consideration:

Defining the target population: A test is administered to a particular group. The composition
of this target group (normative group) is determined by the intended use of the test. Hence, the
first step is to define the composition of the target group. For example, the Test of English as a Foreign Language (TOEFL) is intended for students whose native tongue is not English but who plan on studying abroad where the medium of instruction is English. Thus, for the TOEFL, the target population will consist of those students.

Selecting the sample from the target population: After defining the target population, the
next step is to select a representative sample. To achieve this, a cross-sectional representation
of the population is used, ensuring the sample reflects the diversity of the larger group. To
ensure this representative character of the sample, various sampling techniques are employed.
While random sampling is ideal for larger samples, it is often impractical, so cluster sampling or
its variations are typically employed as more feasible alternatives.

Standardizing conditions for proper implementation of the test: Standardizing the conditions for test administration is crucial to ensure valid and reliable comparisons of individual
scores to the test norms. This requires controlling factors such as sound, lighting, temperature,
and ventilation, keeping them consistent across all groups. Additionally, elements like test
timing, security, adherence to the test manual, and ensuring that examinees work on the correct
sections must be strictly followed. Without these standardized procedures, the norms cannot
effectively serve as a valid comparative tool.

TYPES OF NORMS
Different norms have been classified corresponding to the four commonly derived scores. The
four types of norms commonly used in psychological and educational testing are age-equivalent
norms, grade-equivalent norms, percentile norms, and standard score norms.

Age-equivalent norms: These norms represent the average performance of individuals within
a specific age group on a given test or measurement. These norms are useful for assessing
traits that show systematic growth with age, such as height, weight, and cognitive abilities
during childhood. However, they have limitations, as growth rates vary across different ages and
traits. For example, the development of general intelligence is faster in early childhood but slows
down significantly after adolescence, making it difficult to maintain uniform standards across all
ages. Additionally, some traits, like vision acuity, do not exhibit progressive change, rendering
age norms ineffective for such measures.

Grade-equivalent norms: These norms indicate the average performance of students in a particular grade. They are derived from the test scores of representative samples across
different grades. For instance, if the average score on an arithmetic test for sixth-grade students
is 30, this score becomes the grade norm for that grade level. While grade-equivalent norms are
helpful for educational assessments, they have limitations, such as the inability to compare
performance across different subjects because learning in areas like social studies may be
influenced by life experiences, unlike subjects such as arithmetic, which rely more on formal
instruction.

Percentile norms: These norms are widely used in psychological and educational
assessments. They rank individuals based on the percentage of people in the standardization
sample who scored at or below a particular raw score. For example, if a student scores at the
70th percentile on a test, it means they performed better than 70% of the sample. Percentile
norms are easy to understand and interpret, but they can be misleading because the units are
not equal across the scale. Small raw score differences in the middle of the distribution can
result in large percentile changes, while large raw score differences at the extremes may result
in minor percentile shifts.

Standard Score Norms: These types of norms are based on standardized scores, such as z-
scores, which have a fixed mean and standard deviation. These norms are advantageous
because they maintain equal units of measurement across the entire scale, unlike percentile
norms. A z-score, for example, shows how many standard deviations a score is above or below
the mean, allowing for accurate comparisons across different tests or distributions. This makes
standard score norms particularly useful in psychological testing, where comparing performance
across diverse measures is often necessary.
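
A minimal sketch of how a raw score is converted into the two kinds of derived scores just described, using a small hypothetical standardization sample:

```python
import statistics as stats

# Hypothetical standardization sample of raw scores
sample = [38, 42, 45, 47, 50, 52, 55, 58, 60, 63]
mean = stats.mean(sample)
sd = stats.stdev(sample)                 # sample standard deviation

raw = 58
z = (raw - mean) / sd                    # standard score: SDs above/below the mean
pct = 100 * sum(s <= raw for s in sample) / len(sample)  # percentile rank
print(f"z = {z:.2f}, percentile rank = {pct:.0f}")
```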

PRACTICALITY/USABILITY
A test must also be practicable/usable from the point of view of the time taken in its completion, length, scoring, etc. In other words, the test should not be lengthy or complex, the scoring method must not be difficult, and it should not require highly specialized training to administer. In addition, the test should be cost-effective and efficient, ensuring it is accessible and feasible for widespread use.

TYPES OF TESTS

Tests can be broadly classified as follows:


On the basis of administration

1. Group tests:
Group tests are largely pencil-and-paper measures suitable to the testing of large groups of
persons at the same time.
Example: Multidimensional Aptitude Battery-II (MAB-II), Raven’s Progressive Matrices; multilevel battery: the Cognitive Abilities Test (CogAT).

2. Individual tests: Individual tests are instruments that by their design and purpose must be
administered one on one.
Example: Stanford-Binet Intelligence Scale, Detroit Tests of Learning Aptitude-4 (DTLA-4), Cognitive Assessment System-II (CAS-II).

An important advantage of individual tests is that the examiner can gauge the level of motivation
of the subject and assess the relevance of other factors (e.g., impulsiveness or anxiety) on the
test results.

On the basis of the criterion of types of response and scoring

3. Objective tests: Objective tests are those whose items are scored by competent examiners or
observers in such a way that no scope for subjective judgement or opinion exists and thus, the
scoring remains unambiguous. Tests having multiple choices, true-false and matching items are
usually called objective tests. In such tests, the problem as well as its answer is given along with
the distractor. The problem is known as the stem of the item. A distractor answer is one which is
similar to the correct answer but is not actually the correct one. Such tests are also known as new-type tests or limited-answer tests.
Example: Minnesota Multiphasic Personality Inventory (MMPI)

4. Subjective tests: Subjective tests are tests whose items are scored by competent examiners or observers in a way in which there exists some scope for subjective judgement and opinion. As a consequence, some element of vagueness and ambiguity remains in their scoring. These are also called essay tests. They intend to assess an examinee's ability to organize a comprehensive answer, recall and select important information, and present the same logically and effectively. They are also called free-answer tests.
Example: Thematic Apperception Test (TAT)

On the basis of time


5. Power tests: A Power test is one which has a generous time limit so that most examinees are
able to attempt every item. Usually such tests have items which are generally arranged in increasing order of difficulty. Most of the intelligence tests and aptitude tests belong to the
category of power tests. In fact, power tests demonstrate how much knowledge or information
the examinees have.
Example: Raven’s Progressive Matrices (RPM)

6.Speed tests: Speed tests have severe time limits but the items are comparatively easy and
the difficulties involved therein are more or less of the same degree. Here, very few examinees
are supposed to make errors. It reveals how rapidly, i.e. with what speed the examinees can
respond within a given time limit. Most of the clerical aptitude tests belong to this very category.
Example: Clerical Aptitude Test

On the basis of content of items


7. Verbal test: A verbal test is one whose items emphasize reading, writing and oral expression as the primary mode of communication. Herein, instructions are printed or written. These are read by the examinees and, accordingly, items are answered. Verbal tests are also called paper-and-pencil tests because the examinee has to write on a piece of paper while answering test items.
Example: Jalota Group General Intelligence Test, Mehta Group Intelligence Test

8. Non-verbal tests: Non-verbal tests are those that reduce, but don't altogether eliminate, the role of language by using symbolic materials like pictures, figures, etc. Such tests use language in the instructions, but the items themselves don't use language. Test items present the problems with the help of figures and symbols and are commonly used with young children as an attempt to assess the nonverbal aspects of intelligence, such as spatial perception.
Example: Raven’s Progressive Matrices

9. Performance test: Tests that require the examinee to perform a task rather than answer some questions are known as performance tests. Such tests prohibit the use of language in items. Occasionally, oral language is used to give instructions, or the instructions may also be given through gestures and pantomime. These tests are usually administered individually so that the examiner can count the errors committed by the examinee and assess how long it takes him or her to complete a given task. Hence, performance tests emphasize the examinee's ability to perform a task rather than answer some questions.
Example: Kohs Block Design Test

10. Non-language tests: Non-language tests are those which don't depend upon any form of written, spoken or reading communication. Such tests remain completely independent of the ability to use language in any way. Instructions are usually given through gestures or pantomime, and the examinees respond by pointing at or manipulating objects such as pictures, blocks, puzzles, etc. Such tests are usually administered to those persons or children who can't communicate in any form of ordinary language.
Example: Raven’s Progressive Matrices
11. Neuropsychological test: Neuropsychological tests are tests which are used in the assessment of persons with known or suspected brain dysfunction.
Example: The Wisconsin Card Sorting Test (WCST)
On the basis of the criterion of Purpose or Objective
1.INTELLIGENCE TESTS
Intelligence tests were originally designed to sample a broad assortment of skills in order to
estimate the individual’s general intellectual level.
The Binet-Simon scales were successful in part because they incorporated heterogeneous
tasks, including word definitions, memory for designs, comprehension questions, and spatial
visualization tasks.
In general, the term intelligence test refers to a test that yields an overall summary score based
on results from a heterogeneous sample of items.

2.APTITUDE TESTS
Aptitude tests measure one or more clearly defined and relatively homogeneous segments of
ability. Such tests come in two varieties: single aptitude tests (A single aptitude test appraises
only one ability) and multiple aptitude test batteries (the multiple aptitude test battery provides a
profile of scores for a number of aptitudes).
Aptitude tests are often used to predict success in an occupation, training course, or educational
endeavor.
For example, the Seashore Measures of Musical Talents (Seashore, 1938), a series of tests
covering pitch, loudness, rhythm, time, timbre, and tonal memory, can be used to identify
children with potential talent in music.
The most common use of aptitude tests is in college admissions. The SAT (Scholastic Assessment Test) of the College Entrance Examination Board contains a Verbal section stressing word knowledge and reading comprehension; a Mathematics section stressing algebra, geometry, and insightful reasoning; and a Writing section. In effect, colleges that require certain minimum scores on the SAT for admission are using the test to predict academic success.

3.ACHIEVEMENT TESTS
Achievement tests measure a person’s degree of learning, success, or accomplishment in a
subject matter. The implicit assumption of most achievement tests is that the schools have
taught the sub- ject matter directly. The purpose of the test is then to determine how much of
the material the subject has absorbed or mastered. Achievement tests commonly have several
subtests, such as reading, mathematics, language, science, and social studies. The distinction
between aptitude and achieve- ment tests is more a matter of use than content

4.CREATIVITY TESTS
Creativity tests assess a subject’s ability to produce new ideas, insights, or artistic creations that
are accepted as being of social, aesthetic, or scientific value. Thus, measures of creativity
emphasize novelty and originality in the solution of fuzzy problems or production of artistic ideas.

5.PERSONALITY TESTS
Personality tests measure the traits, qualities, or behaviors that determine a person’s
individuality; this information helps predict future behavior. These tests come in several different
varieties, including checklists, inventories, and projective techniques such as sentence
completions and inkblots.
Example: An Adjective Checklist

6. INTEREST INVENTORY
Interest inventories measure an individual’s preference for certain activities or topics and
thereby help determine occupational choice. These tests are based on the explicit assumption
that interest patterns determine and, therefore, also predict job satisfaction.
Example: If the examinee has the same interests as successful and satisfied accountants, it is
thought likely that he or she would enjoy the work of an accountant. The assumption that
interest patterns predict job satisfaction is largely borne out by empirical studies.

7.BEHAVIORAL PROCEDURES
Many kinds of behavioral procedures are available for assessing the antecedents and
consequences of behavior, including checklists, rating scales, interviews, and structured
observations. These methods share a common assumption that behavior is best understood in
terms of clearly defined characteristics such as frequency, duration, antecedents, and
consequences. Behavioral procedures tend to be highly pragmatic in that they are usually
interwoven with treatment approaches.
Example: A structured behavioral interview, where a job candidate is asked detailed questions
about past situations to assess how they behaved in specific work-related scenarios, revealing
their problem-solving skills, decision-making abilities, and response to pressure; questions like
“Tell me about a time you had to deal with a difficult client” are typical examples of this type of
test.

8.NEUROPSYCHOLOGICAL TESTS
Neuropsychological tests are used in the assessment of persons with known or suspected brain dysfunction. Neuropsychology is the study of brain-behavior relationships. Over the years, neuropsychologists have discovered that certain tests and procedures are sensitive to the effects of brain damage. Neuropsychologists use these specialized tests and procedures to make inferences about the locus, extent, and consequences of brain damage.
Example: Testing one’s intelligence can give a clue to whether there is a problem in the brain-behavior connection. The Wechsler scales are the tests most often used to determine the level of intelligence.

USES OF PSYCHOLOGICAL TESTS


Psychological tests are used for a variety of purposes, mainly to make decisions about people.
However, in addition to simple decision making, there are five major uses of tests.

Classification
Classification involves assigning a person to one category rather than another. This often leads
to different treatment of some kind, like getting access to a specific college or job.
Placement involves sorting people into different programs based on their needs or skills. For
instance, a university might use a maths placement test to decide whether students should take
calculus, algebra or remedial classes.
Screening uses quick tests to identify people who might have special characteristics or needs.
These tests may misclassify some people and should be followed up with more comprehensive
tests.
Certification involves a pass/fail test that confers certain privileges when passed, such as the
right to practice psychology or drive a car.
Selection is similar to certification, in that it involves a pass/fail test that confers privileges like
getting into a university or gaining employment.

Diagnosis and Treatment Planning


Psychological tests can help determine the nature and source of a person's abnormal behaviour
and classify it within a diagnostic system. Diagnosis is more than just assigning a label and
should also convey information about a person's strengths, weaknesses, and best choices for
treatment. For example, it's more helpful to know that a child has a reading comprehension
problem and needs help with phonics, than just knowing that they have a learning disability.
Tests like the MMPI can help to increase the efficiency of psychiatric diagnoses.

Self-Knowledge
Feedback from psychological tests can sometimes lead to people changing their career paths or
other aspects of their lives. However, in most cases, people already know what the test results
will show. For example, a high-achieving college student will not be surprised to learn that they
have a high IQ.

Program Evaluation
Psychological tests can be used to evaluate the effectiveness of social and educational
programs. For example, tests can provide an objective way to assess whether programs like
Head Start are improving children’s scholastic performance. In general, these tests show that
children in Head Start make gains in IQ and academic achievement, but these benefits tend to
decrease over time.

Research
Psychological tests also play a role in both applied and theoretical behavioral research. For
example, researchers might use psychological tests to investigate if low-level lead absorption
causes behavioural issues in children.
These applications of psychological testing can overlap. For example, a test that helps with
psychiatric diagnosis can also give an individual a better understanding of themselves.
Additionally, the perceived importance and validity of psychological tests are evident in the debates and arguments that surround testing-based research.

MERITS AND DEMERITS


Merits of Psychological Testing

1. Provides an Objective and Systematic Assessment


Psychological tests offer a structured way to measure mental abilities, personality traits, and
emotional states. Unlike informal judgments or personal opinions, these tests follow
standardized procedures, making them more objective and reliable.
For instance, in hiring, an aptitude test offers a more consistent evaluation of a candidate’s skills
than a subjective interview.

2. Ensures Reliability and Validity


A well-developed psychological test is both reliable and valid. Reliability means that if a person takes the test multiple times under similar conditions, the results should be consistent. Validity ensures that the test accurately measures what it claims to assess.
For example, an intelligence test should measure problem-solving skills rather than just memory
or general knowledge.

3. Helps in Predicting Future Outcomes


Psychological tests can provide insights into a person’s potential, helping predict future success
in academics, careers, or even mental health outcomes.
- Aptitude tests guide students in choosing careers that align with their skills.
- IQ tests can estimate how well a person might perform in academic settings.
- Personality tests can help employers understand how a candidate may fit into a team.

4. Plays a Key Role in Diagnosis and Treatment


In the field of mental health, psychological tests help professionals diagnose conditions like
anxiety, depression, learning disabilities, and personality disorders. These tests provide a
scientific approach to identifying problems rather than relying solely on observations.
For example, a child struggling in school might take a test to determine if they have dyslexia or
ADHD, leading to proper intervention.

5. Saves Time and Improves Efficiency


Psychological tests allow for quick assessments, especially when dealing with large groups.
Instead of spending hours on interviews or observations, structured tests provide rapid results
that can guide decision-making. This is why businesses use psychometric tests during
recruitment to efficiently shortlist candidates based on skills and personality traits.

6. Facilitates Comparisons Across Individuals and Groups


Because psychological tests are standardized, they allow for comparisons between different
individuals or groups. This is particularly useful in education, research, and corporate settings.
For example:
- Schools use standardized tests to compare students’ academic performance.
- Companies compare job applicants’ test scores to identify the best candidates.
- Researchers analyze personality traits across different cultures to study behavioral trends.

Demerits of Psychological Testing

1. Cultural and Linguistic Bias


Many psychological tests are designed within a specific cultural or linguistic framework. When
used in different cultures or languages, these tests may not be as fair or accurate.
For example, an intelligence test developed in the United States might contain references
unfamiliar to someone from a rural area in another country, affecting their performance.
Similarly, if a test is not available in a person’s native language, their understanding may be
affected, leading to inaccurate results.

2. Cannot Capture the Full Complexity of Human Behavior


Although psychological tests provide useful insights, they cannot fully capture all aspects of
human personality, emotions, and behavior.
For example, intelligence tests focus on logical reasoning and problem-solving but may overlook
creativity, adaptability, or emotional intelligence—traits that are equally important in real life.

3. Risk of Manipulated Responses


Certain psychological tests, particularly self-report assessments, rely on individuals providing
honest answers. However, people may alter their responses to appear more favorable,
especially in job interviews or clinical settings.
For instance:
- A job applicant might exaggerate leadership skills on a personality test.
- A patient might underreport symptoms of anxiety to avoid stigma.
This can result in misleading conclusions and incorrect decisions.

4. Test Anxiety Can Affect Results


Many individuals experience anxiety when taking tests, which can negatively impact their
performance. This is particularly concerning in tests that measure intelligence, aptitude, or
personality. A highly capable student may score poorly on an IQ test due to nervousness, not
because they lack intelligence. This reduces the test’s effectiveness in accurately assessing
potential.

5. Ethical Concerns and Potential for Misuse


Psychological test results contain sensitive information, and if misused, they can lead to ethical
and legal issues.
- Employment Discrimination: Employers may unfairly reject candidates based on test scores
rather than considering their overall abilities.
- Unfair Labeling: A low IQ score might lead to a child being labeled as "slow," affecting their
confidence and future opportunities.
- Privacy Issues: If test results are leaked or misused, individuals may face discrimination or
stigma.
Ethical guidelines must be followed to prevent the misuse of psychological tests.

6. Overemphasis on Numerical Scores


Psychological tests often assign scores to intelligence, personality, or aptitude, but numbers
alone cannot fully define a person’s abilities.
For example, an IQ score might suggest a person is of average intelligence, but it does not
consider creativity, emotional intelligence, or problem-solving abilities in real-life situations.
Similarly, a personality test may categorize someone as introverted, but human behavior is
complex and can change depending on the situation.

ETHICAL ISSUES IN PSYCHOLOGICAL TESTING


To ensure that psychological tests are used and applied appropriately, the American Psychological Association has adopted a set of rules and standards which have undergone
continual review and refinement. The Ethical Principles of Psychologists and Code of Conduct
(APA, 1992) has a preamble and six general principles to guide psychologists towards the
highest ideals in their profession. In addition, it also provides eight ethical standards with
enforceable rules. Some of the major ethical issues concerning psychological testing have been
mentioned as follows.

Issue of Human Rights - Several human rights are recognised in the field of psychological testing. One of them is the right not to be tested; that is, people who do not want to be subjected to psychological testing cannot be forced to do so. Similarly, subjects of psychological testing have the right to know the results of such tests, their interpretations, and any decision that may affect them. Other human rights, such as the right to know who will have access to the data and the right to confidentiality, are also recognised.

Issue of Labelling - An individual is given a label or is diagnosed as having a particular psychiatric disorder on the basis of psychological testing. Such labelling can have certain
negative outcomes. It can stigmatize a person and affect access to help. It can also cause a
person to become passive and have a lowered incentive towards altering the negative
conditions surrounding him/her. These impacts can make treatment difficult. Therefore, a
person has a right not to be labelled.

Issues of Invasion of Privacy - Sometimes subjects of psychological testing feel that their
privacy has been invaded. This issue was studied by Dahlstrom. He suggested that the notion
of invasion of privacy emerges due to misunderstanding as psychological tests have fixed aims
and cannot invade a person's privacy. He also suggested that the concept of invasion of privacy
is very ambiguous. It only happens when certain information regarding a person is used
inappropriately. Because psychologists are bound by ethics and legalities, they don't reveal any
more information than is needed. The ethics code of APA endorses Confidentiality, which
dictates that personal information acquired by a psychologist is revealed to others only with the
consent of the person.

Issue of Divided Loyalties - It highlights how psychologists often experience conflicts between
their duty to their employer and their responsibility to individuals' welfare. For example, an
industrial psychologist may need to identify accident-prone employees for workplace safety but
must also respect individuals' rights and privacy. This conflict arises when a psychologist must
maintain test security while ensuring fairness in decision-making. If they disclose test details to
one person, others may use this information to manipulate results, compromising the test's
integrity. This creates a situation where the psychologist is caught between two opposing ethical
principles.

Responsibility of test constructors and test users - According to the latest standards for test use,
test constructors are required to provide a test manual that clearly outlines the proper
application of the test. This manual should include information on reliability, validity, and norms,
as well as detailed guidelines on scoring and administration procedures. On the other hand, test
users are responsible for knowing the reason for and the implications of using the test. Test
users should have sufficient knowledge of test construction, supporting research, and
psychometric properties.

EXTRA MATERIAL (NOT TO BE WRITTEN)

IMPROVING RELIABILITY

Improving reliability in psychological testing is crucial to ensure consistent and accurate measurement of psychological constructs. Here are a few ways to enhance reliability:

1. Standardization of Procedures

Standardization involves administering and scoring the test in a consistent manner across
different situations and populations.

Implementation: Create a detailed manual that outlines every step of the test administration
process. This includes the instructions given to participants, the environmental conditions under
which the test should be conducted, and the specific ways in which responses should be
recorded and scored.
2. Clear and Precise Test Instructions

Ensuring that test instructions are unambiguous and easy to understand.

Implementation: Pilot the instructions with a small group to identify any potential
misunderstandings. Revise the instructions based on feedback to make them as clear as
possible. Consider using visual aids or demonstrations if necessary.

3. Training of Test Administrators

Proper training ensures that those who administer the test do so in a consistent and
standardized manner.

Implementation: Develop comprehensive training programs that include both theoretical knowledge and practical demonstrations. Conduct regular refresher courses to maintain high standards.

4. Use of Reliable Measurement Tools

Employing tools that have been scientifically validated and have shown high reliability in
previous research.

Implementation: Review the existing literature to identify the most reliable tools for the specific
construct being measured. Use these tools consistently across different studies and settings.

5. Conducting Pilot Studies

Pre-testing the psychological test on a small sample to identify potential issues before full-scale
administration.

Implementation: Analyze the pilot test results to detect any inconsistencies or problems with the
test items. Make necessary adjustments to improve the test's reliability.

6. Consistency in Testing Conditions

Maintaining similar testing environments and conditions to minimize external factors that could
affect test results.

Implementation: Control variables such as lighting, noise levels, and seating arrangements.
Ensure that all participants are tested under similar conditions to reduce variability.
7. Regular Calibration and Maintenance

Ensuring that any instruments or tools used in testing are regularly calibrated and maintained for
accuracy.

Implementation: Schedule regular maintenance and calibration checks for all testing equipment.
Keep detailed records of these checks to ensure consistency.

8. Test-Retest Reliability

Assessing the consistency of test results over time.

Implementation: Administer the same test to the same group of people at two different points in
time and calculate the correlation between the two sets of scores. High correlation indicates
high test-retest reliability.

9. Inter-Rater Reliability

Ensuring that different raters or scorers provide consistent ratings.

Implementation: Develop clear scoring criteria and train raters thoroughly. Use multiple raters
and calculate the agreement between their scores.
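
One standard way to quantify agreement between two raters assigning categorical scores is Cohen's kappa (the statistic is not named in the text, and the data below are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes."""
    n = len(rater_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Two raters classifying ten responses as "pass"/"fail" (hypothetical)
a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
print(round(cohens_kappa(a, b), 2))  # -> 0.58
```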

10. Internal Consistency

Measuring the consistency of results across items within a test.

Implementation: Calculate internal consistency metrics such as Cronbach's alpha. Modify or remove items that do not correlate well with the overall test score.

Validity has five important properties:


1. Validity is a relative term. A test is not valid in general. It is valid for a particular purpose.
For example, a test of statistical ability will be valid only to measure statistical ability
because it is used only for measuring that ability. It will be futile for other uses, such as
measurement of the knowledge of geography, history, etc. It is quite clear from this
explanation that technically one validates not a measuring instrument but some uses to
which the test is applied.
2. Validity is not a fixed characteristic of the test because validation is not a fixed process
but a continuous process. With the emergence of new concepts and the formation of
new meanings, the old contents of the test become meaningless. Hence, they must be
altered drastically in the light of the new meanings. Hence, the validity of a test
calculated at the initial stage becomes less trustworthy, and hence, the test constructor
must calculate a new validity of the test in the light of new meanings assigned.
3. Validity, as with reliability, is a matter of degree and not an all-or-none characteristic. A
test evolved for the measurement of a particular trait or ability cannot be said to be
absolutely valid or invalid.
4. Validity is a unitary construct. In the two most recent revisions (1985 and 1999) of the Standards for Educational and Psychological Testing by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), the assumption that there are different types of validity has been dropped. Instead, validity has been treated as a unitary construct based on different types of evidence.
5. Validity is a general judgment of evaluation. That is, a judgment of the degree to which
the interpretation and uses of the test scores are supported by evidence for them and in
terms of consequences of the interpretations and uses. The Standards for Educational
and Psychological Testing have described five sources of evidence for determining the
validity of a specific use or interpretation. The sources of evidence are (a) test content,
(b) response processes, (c) internal structure, (d) relations to other variables, and (e) the
consequences of testing. It suggests that the validity may involve consideration of the
content being measured, the response processes of the test-takers, the relation of the
individual items to the test scores, the relation of the performance to other measures, as
well as the consequences of the use of the test.
