Survey Data Quality Methods for ISSP and DATIS
Ioannis Andreadis
Aristotle University of Thessaloniki
WAPOR 77th and WAPOR Asia Pacific 7th Joint Annual Conference
28-31 July 2024 – Sungkyunkwan University, Seoul, Republic of Korea
Outline
 Methods applicable to both:
 pilot surveys (for detecting issues in questionnaires) and
 main survey data (for uncovering inattentive respondents)
 Focusing on the findings from the implementation of the ISSP 2025 Pilot Survey and the first phase of the ISSP 2024 Main Survey in Greece
Methods for pilot surveys
 Focusing on item non-response rates
 A spike in non-responses for a particular item is a flag for potential problems (e.g. participants’ confusion, privacy concerns, or technical glitches).
 Combining item non-response rates with participant feedback can provide insightful information.
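The per-item check described above amounts to a rate computation plus a threshold. The following is an illustrative Python sketch of the idea (the actual DATIS/ISSP tooling is in R; the column names, the use of None as the missing marker, and the 10% spike threshold are all assumptions made for this sketch):

```python
# Flag items whose non-response rate spikes above a chosen threshold.
# Missing answers are represented here as None; any sentinel would work.
def item_nonresponse_rates(responses, items):
    """responses: list of dicts (one per respondent); items: column names."""
    n = len(responses)
    return {item: sum(r.get(item) is None for r in responses) / n
            for item in items}

def flag_spikes(rates, threshold=0.10):
    return [item for item, rate in rates.items() if rate > threshold]

# Toy data: Q2 is skipped by 2 of 4 respondents, Q1 by 1, Q3 by none.
data = [
    {"Q1": 1, "Q2": None, "Q3": 2},
    {"Q1": 2, "Q2": None, "Q3": 1},
    {"Q1": None, "Q2": 3, "Q3": 2},
    {"Q1": 1, "Q2": 4, "Q3": 3},
]
rates = item_nonresponse_rates(data, ["Q1", "Q2", "Q3"])
print(flag_spikes(rates))  # Q1 (25%) and Q2 (50%) exceed the 10% default
```

In a pilot, the flagged items are then reviewed together with the participant feedback discussed in the following slides.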
Evaluation of the findings
 Sometimes increased non-response rates correspond to errors in the coding of the question (e.g. missing answer options).
 In other cases, there are no obvious reasons. This is where participants’ feedback can be very handy.
Respondents’ feedback
 At the end of our questionnaires, we always include a large text field where survey participants may leave feedback about the questionnaire and their experience.
 One of the participants’ comments was the following: “I am self-employed and for some of the questions it was difficult for me, if not impossible, to select answers”.
Reviewing the questionnaire from another perspective
 This comment motivated me to go through the questionnaire again and try to answer it as a self-employed respondent. By doing so, I identified some items that do not fit self-employed respondents very well, e.g. the item “A lack of sufficient digital and computer skills harms my chances of being promoted”. The idea of a promotion would not make much sense for most self-employed workers.
Additional analysis
 To identify the items that may not fit self-employed workers very well:
 I calculated the rate of “No answer” and “Can’t choose” responses for all items, separately for employees and the self-employed.
 The items with the largest differences (>2.5%) between self-employed (red dots) and employees (grey dots) are displayed in the next slide.
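The comparison above boils down to a per-group non-response rate and a difference filter. A minimal Python sketch of that logic (the group labels, item names, and toy data are invented for illustration; only the 2.5% cut-off mirrors the slide):

```python
def group_nonresponse_rate(responses, item, group):
    """Share of "No answer"/"Can't choose" responses to `item` within `group`."""
    rows = [r for r in responses if r["status"] == group]
    missing = sum(r[item] in ("No answer", "Can't choose") for r in rows)
    return missing / len(rows)

def items_with_large_gap(responses, items, cutoff=0.025):
    """Items where self-employed non-response exceeds employees' by > cutoff."""
    gaps = {}
    for item in items:
        gap = (group_nonresponse_rate(responses, item, "self-employed")
               - group_nonresponse_rate(responses, item, "employee"))
        if gap > cutoff:
            gaps[item] = gap
    return gaps

# Toy data: the promotion item troubles self-employed respondents only.
data = (
    [{"status": "employee", "promo": "Agree", "secure": "Agree"}] * 20
    + [{"status": "self-employed", "promo": "Can't choose", "secure": "Agree"}] * 2
    + [{"status": "self-employed", "promo": "Agree", "secure": "Agree"}] * 18
)
print(items_with_large_gap(data, ["promo", "secure"]))  # {'promo': 0.1}
```

The items returned by such a filter are the ones worth rewording or routing around for self-employed respondents.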
Items with large differences in non-response rates
[Chart: non-response rates between 2% and 8% for the items “My opportunities for advancement are high”, “I am willing to work harder than I have to help ...”, “I would turn down another job that offered more pay ...”, and “A lack of digital skills harms my chances of being promoted”, shown for self-employed vs. employees]
Reasons behind web survey dropouts (pilot and main)
 While item non-response clearly indicates that something is wrong with a specific item, dropouts may result from more general problems: survey length, low interest, survey flow, or external distractions.
 On the other hand, some items may share similar issues: complicated questions, unclear instructions, extensive cognitive effort, overly personal or sensitive content, and technical issues.
Findings from the first phase of the ISSP 2024 main survey
 Although we cannot know whether a dropout resulted from a more general problem of the survey or from a problem related to a specific item, we need to track the points with the most frequent dropouts and try to identify the problem.
 In the next two slides, I present the two most frequent dropout points (after the first pages about users’ consent and eligibility) of the first phase of the ISSP 2024 main survey.
Arrays with a lot of text → dropouts
[Two screenshot slides showing the array (grid) questions with a lot of text at which these dropouts occurred]
For main surveys
The R package “Survey Data Quality” is being developed at:
https://github.com/andreadis-ioannis/SurveyDataQuality
 In addition to the data quality functions, the R package contains a data file with responses from students to the ISSP 2020 questionnaire, which is used as an example in the help files of the package.
 In the following slides, I apply the methods to students’ responses to the ISSP 2025 pilot questionnaire.
Item non-response (skipping)
 The focus now is on the respondents instead of the items.
 We can calculate the ratio of missing answers for each respondent.
 Function: flag_missing(data, Q1.Q1a.:Q7, ratio=0.5)
 In our example, three students have not answered more than 50% of the checked items.
Midpoint responses in Likert-type scale items (e.g., “neither/nor”)
 Respondents may choose midpoint responses when they do not process a question with the required cognitive effort.
 Function: flag_midpoints(data, Q8.Q8a.:Q8.Q8f., midpoint=3, ratio=0.5)
 The checked items: a. My job is secure, b. My income is high, c. My opportunities for advancement are high, d. My job is interesting, e. In my job I can decide the times or days of work, f. In my job I can work remotely, e.g., from home.
 In our example, one student has selected the “Neither … nor” answer in more than 50% of these items.
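The same midpoint check can be sketched in Python; again this is an illustrative stand-in for the R function, assuming 5-point items coded 1–5 with 3 as the midpoint:

```python
def flag_midpoints(rows, items, midpoint=3, ratio=0.5):
    """Flag respondents who picked the scale midpoint in more than `ratio` of items."""
    flagged = []
    for i, row in enumerate(rows):
        hits = sum(row.get(item) == midpoint for item in items)
        if hits / len(items) > ratio:
            flagged.append(i)
    return flagged

# Six items (Q8a..Q8f); the second respondent picks "neither/nor" five times.
rows = [
    {"Q8a": 1, "Q8b": 3, "Q8c": 5, "Q8d": 2, "Q8e": 3, "Q8f": 4},
    {"Q8a": 3, "Q8b": 3, "Q8c": 3, "Q8d": 3, "Q8e": 3, "Q8f": 1},
]
print(flag_midpoints(rows, ["Q8a", "Q8b", "Q8c", "Q8d", "Q8e", "Q8f"]))  # [1]
```

A flagged respondent is not necessarily inattentive; midpoint answers can be sincere, so this flag is best combined with the timing checks below.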
Item response times
 We can compare the time spent on questionnaire items with the minimum time needed to read and answer an attitudinal question, given the length of the question text.
 Function: flag_times(data, "question-chars.csv", 0.4)
 The CSV file contains the minimum time needed to read and answer each question.
 In our example, six students have spent less than the time needed to scan the question texts in more than 40% of the checked items.
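The mechanics of such a timing check can be illustrated in Python. The 50 ms-per-character reading-speed rule and the question texts below are assumptions for the sketch, not the values used by the package (which reads its minimum times from the CSV file):

```python
# Assumed lower bound: roughly 50 ms of reading time per character of text.
MS_PER_CHAR = 50

def min_read_time(question_text):
    return len(question_text) * MS_PER_CHAR / 1000  # seconds

def flag_times(times, min_times, ratio=0.4):
    """Flag respondents who undershoot the minimum time on > `ratio` of items."""
    flagged = []
    for i, row in enumerate(times):
        too_fast = sum(t < m for t, m in zip(row, min_times))
        if too_fast / len(min_times) > ratio:
            flagged.append(i)
    return flagged

q_texts = [
    "Thinking about your current job, is it secure?",
    "All things considered, how satisfied are you with your income?",
]
min_times = [min_read_time(q) for q in q_texts]  # a few seconds per item
# Respondent 0 undershoots both items; respondent 1 undershoots neither.
print(flag_times([[0.5, 0.4], [4.0, 5.0]], min_times))  # [0]
```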
Example of a speedy respondent
 Thinking about the impact that the use of machines, computer programs or artificial intelligence (AI) will have on jobs of people. Which of the following comes closest to your view?
 ● Many more jobs will be created than lost
 ● More jobs will be created than lost
 ● A similar number of jobs will be created and lost
 ● More jobs will be lost than created
 ● Many more jobs will be lost than created
 ● Can’t choose
 (MRT circa 10 seconds; one student spent only 3 seconds!)
Item vs total interview times
 Item response times are better than the total time used to complete the survey (e.g. a lot of extremely short item response times plus a long break gives a “normal” survey duration).
 In addition, be very careful with questionnaires that have many conditional questions (see ISSP 2025).
 In this case we need to work with each group separately because …
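The point about breaks masking speeding is simple arithmetic; a toy Python illustration (all numbers invented):

```python
# 30 items answered at a speedy 2 seconds each, plus one 8-minute break
# taken on a single page.
item_times = [2.0] * 30
item_times[10] += 8 * 60  # the break inflates one page's timestamp

total = sum(item_times)             # 540 s = 9 min: looks like a normal duration
speedy_items = sum(t <= 3 for t in item_times)
print(total, speedy_items)          # 29 of the 30 items were actually rushed
```

Only the item-level times reveal that nearly every question was rushed, which is why the total duration alone is a weak quality indicator.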
Mean Interview Time by Working Status
Working Status works as a filter question: each group is presented a different subset of the questionnaire, and these subsets do not have the same number of questions.
Working status                                                    Mean interview time
I am currently in paid work                                       707
I am currently not in paid work but I had paid work in the past   373
I have never had paid work                                        285
But even when all respondents are presented the same questions, the total interview duration may be misleading...
Total interview time vs item response times for Workers
[Chart: cumulative response time (0–400) against question order (0–30) for three cases: two speedy respondents (cases 11 and 14) and a non-speedy respondent]
Conclusions
 For main surveys:
 Use the R package Survey Data Quality
 Use “item response times” instead of interview duration. If this is not possible, go with the most granular information you can have (e.g. use page timestamps)
 For pilot surveys:
 Combining item non-response rates with participant feedback can provide insightful information
Thank you! Questions? Comments?
Project site: https://www.datis.gr/
Twitter/X: @andreadis_i, @Datis_project
This project is carried out within the framework of the
National Recovery and Resilience Plan Greece 2.0, funded by
the European Union – NextGenerationEU (Implementation
body: HFRI).