Types of reliability
'Reliability' of any research is the degree to which it gives a consistent score across repeated
measurements. It can thus be viewed as 'repeatability' or 'consistency'. In summary:
Inter-rater: Different people, same test.
Test-retest: Same people, different times.
Parallel-forms: Different people, same time, different test.
Internal consistency: Different questions, same construct.
(1) Inter-Rater Reliability (across different people; also called inter-observer reliability or
inter-coder reliability) - when multiple people are giving assessments of some kind or are the
subjects of some test, then similar people should produce the same resulting scores. It can be
used to calibrate people, for example those being used as observers in an experiment.
Example: In a test scenario, an IQ test applied to several
people with a true score of 120 should result in a score of
120 for everyone. In practice, there will usually be
some variation between people.
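One way to put a number on agreement between two observers is sketched below. It uses percent agreement and Cohen's kappa, which are common statistics for this purpose but are not named in these notes, so treat the choice as an assumption. The function names and the observation codes are hypothetical and for illustration only.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each used their own category frequencies independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers for the same eight cases.
observer_1 = ["on", "on", "off", "off", "on", "off", "on", "off"]
observer_2 = ["on", "on", "off", "on", "on", "off", "off", "off"]

print(percent_agreement(observer_1, observer_2))  # 0.75
print(cohen_kappa(observer_1, observer_2))        # 0.5
```

A low value here would suggest the observers need further calibration before the experiment.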
(2) Test-Retest Reliability (across time) - an assessment or test of a person should give the
same results whenever you apply the test. This method is particularly used in experiments
that use a no-treatment control group that is measured pre-test and post-test.
Example: In the development of national school tests, a class of
children is given several tests that are intended to
assess the same abilities. A week and a month later, they
are given the same tests. With allowances for learning, the
variation between the test and retest results is used to assess
which tests have better test-retest reliability.
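The comparison in this example can be sketched as a correlation between first and second sittings of each test. The Pearson correlation is assumed here as the measure of agreement; it is unaffected by a uniform rise in scores, which makes some allowance for learning. The scores and test names below are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (spread_x * spread_y)


# Hypothetical scores for the same five children on two candidate tests.
test_a_first = [52, 61, 70, 48, 66]
test_a_retest = [55, 63, 74, 50, 69]
test_b_first = [52, 61, 70, 48, 66]
test_b_retest = [60, 50, 58, 62, 55]

reliability = {
    "test A": pearson_r(test_a_first, test_a_retest),
    "test B": pearson_r(test_b_first, test_b_retest),
}
# The test whose retest scores track the first sitting most closely.
best = max(reliability, key=reliability.get)
print(reliability, best)
```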
(3) Parallel-Forms Reliability (evaluates
different questions and question sets that seek
to assess the same construct) - evaluation may be
done in combination with other methods, such
as Split-half, which divides items that measure
the same construct into two tests and applies
them to the same group of people.
Example: An experimenter develops a large set of questions. They
split these into two and administer them each to a
randomly-selected half of a target sample.
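The random split described in this example can be sketched as below. The question labels, participant labels, helper name and seed are all hypothetical; a fixed seed is used only so the illustration is repeatable.

```python
import random


def split_in_two(items, rng):
    """Shuffle a copy of the items and cut it into two halves."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


rng = random.Random(0)  # seeded only to make the illustration repeatable

questions = [f"Q{i}" for i in range(1, 21)]      # hypothetical question pool
participants = [f"P{i}" for i in range(1, 11)]   # hypothetical target sample

# Two forms built from the same pool, each given to a random half of the sample.
form_a, form_b = split_in_two(questions, rng)
group_a, group_b = split_in_two(participants, rng)

print("Form A:", form_a, "administered to", group_a)
print("Form B:", form_b, "administered to", group_b)
```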
(4) Internal Consistency Reliability (evaluates
individual questions in comparison with one
another for their ability to give consistently
appropriate results) - when asking questions in
research, the purpose is to assess the response
against a given construct or idea. Different
questions that test the same construct should
give consistent results.
(a) Average inter-item correlation compares correlations between all pairs of questions
that test the same construct by calculating the mean of all paired correlations.
Average item-total correlation calculates a total score across all the items, correlates
each item with that total, then averages these correlations.
(b) Split-half correlation divides items that measure the same construct into two tests,
which are applied to the same group of people, then calculates the correlation
between the two total scores.
Cronbach's alpha calculates an equivalent to the average of all possible split-half
correlations and is calculated thus:
alpha = (N . r-bar) / (1 + (N-1) . r-bar)
where N is the number of components (items),
and r-bar is the average of all Pearson correlation coefficients between pairs of components.
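The internal consistency measures above can be sketched in a few lines. The code assumes responses are stored as one row per respondent and one column per item, and it splits the items into odd and even positions for the split-half step, which is only one of many possible splits. The function names and the response data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (spread_x * spread_y)


def average_inter_item_correlation(responses):
    """Mean of the Pearson correlations over every pair of items (r-bar)."""
    items = list(zip(*responses))  # turn rows of respondents into columns of items
    pairs = list(combinations(items, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


def standardized_alpha(n_items, r_bar):
    """alpha = (N . r-bar) / (1 + (N-1) . r-bar), as given in the notes."""
    return (n_items * r_bar) / (1 + (n_items - 1) * r_bar)


def split_half_correlation(responses):
    """Correlate total scores on two halves (odd- vs even-positioned items)."""
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return pearson_r(first_half, second_half)


# Hypothetical answers from five respondents to four items on one construct.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

r_bar = average_inter_item_correlation(responses)
alpha = standardized_alpha(len(responses[0]), r_bar)
print("average inter-item correlation:", r_bar)
print("split-half correlation:", split_half_correlation(responses))
print("alpha:", alpha)

# Worked arithmetic for the formula: N = 4, r-bar = 0.5 gives 2 / 2.5.
print(standardized_alpha(4, 0.5))  # 0.8
```

Because the formula uses the average correlation rather than item variances, this corresponds to the standardized form of alpha; adding more items with the same r-bar raises alpha.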