
Evaluating User Interfaces

Lecture slides modified from Eileen Kraemer’s HCI teaching material


Department of Computer Science
University of Georgia
Outline
• The Role of Evaluation
• Usage Data: Observations, Monitoring, User’s
Opinions
• Interpretive Evaluation
• Predictive Evaluation
The Role of Evaluation
In the HCI Design model:
• the design should be user-centred and involve users as
much as possible
• the design should integrate knowledge and expertise
from different disciplines
• the design should be highly iterative so that testing can
be done to check that the design does indeed meet user
requirements
The star life cycle

[Diagram: the star life cycle, with Evaluation at the centre connected to
Task analysis/functional analysis, Requirements specification,
Conceptual design/formal design, Prototyping, and Implementation.]
Evaluation
• Evaluation
• tests usability and functionality of system
• occurs in laboratory, field and/or in collaboration with
users
• evaluates both design and implementation
• should be considered at all stages in the design life cycle
Evaluation
• Concerned with gathering data about the usability
of
• a design or product
• by a specific group of users
• for a particular activity
• in a specified environment or work context
• Ranges from informal feedback … to controlled lab experiments
Goals of Evaluation
• assess extent of system functionality

• assess effect of interface on user

• identify specific problems


What do you want to know? Why?
• What do users want?
• What problems do they experience?

• Formative -- early and often; closely coupled with design;
guides the design process
• Summative -- judgments about the finished product;
near end; have we done well?
Reasons for doing evaluations
• Understanding the real world
• How employed in the workplace?
• Better fit with work environment?

• Comparing designs
• compare with competitors or among design options

• Engineering towards a target


• x% of novice users should be able to print correctly on first try

• Checking conformance to a standard


• screen legibility, etc.
When and how do you do
evaluation?
• Early to
• Predict usability of product or aspect of product
• Check design team’s understanding of user requirements
• Test out ideas quickly and informally

• Later to
• identify user difficulties / fine tune
• improve an upgrade of product
Case Study: 1984 Olympic
Messaging System
• Voice mail for 10,000 athletes in LA -> was successful
• Kiosks placed around Olympic village -- 12 languages
• Approach to design (user-centered design)
• printed scenarios of UI prepared; comments obtained from designers,
management, prospective users -> functions altered, dropped
• produced brief user guides, tested on Olympians, families & friends, 200+
iterations before final form decided
• early simulations constructed, tested with users --> need 'undo'
• toured Olympic village sites, early demos, interviews with people
involved in Olympics, ex-Olympian on the design team -> early prototype
-> more iterations and testing
Case Study: 1984 Olympic
Messaging System
• Approach to design (continued)
• "Hallway" method -- put prototype in hallway, collect opinions on height
and layout from people who walk past
• “Try to destroy it” method -- CS students invited to test robustness by
trying to “crash” it
• Principles of User-Centered Design:
• focus on users & tasks early in design process
• measure reactions using prototype manuals, interfaces, simulations
• design iteratively
• usability factors must evolve together
Case Study: Air Traffic Control
• UK, 1991
• Original system -- data in variety of formats
• analog and digital dials
• CCTV, paper, books
• some line of sight, others on desks or ceiling mountings outside
view
• Goal: integrated display system, as much info as practical
on common displays
• Major concern: safety
Air Traffic Control, continued
• Evaluate controller’s task
• want key info sources on one workstation (windspeed, direction, time,
runway use, visual range, meteorological data, maps, special procedures)
• Develop first-cut design (London City airport, then Heathrow)
• Establish user-systems design group
• Concept testing / user feedback
• modify info requirements
• different layouts for different controllers and tasks
• greater use of color for exceptional situations and different lighting conditions
• ability to make own pages for specific local conditions
• simple editing facilities for rapid updates
ATC, continued
• Produce upgraded prototype
• "Road Show" to five airports
• Develop system specification
• Build and Install system
• Heathrow, 1989
• other airports, 1991
• Establish new needs
Case Study: Forte Travelodge
• System goal: more efficient central room booking
• IBM Usability Evaluation Centre, London
• Evaluation goals:
• identify and eliminate problems before going live
• avoid business difficulties during implementation
• ensure system easy to use by inexperienced staff
• develop improved training material and documentation
The Usability Lab
• Similar to TV studio: microphones, audio, video, one-way mirror
Particular aspects of interest
• System navigation, speed of use
• screen design: ease of use, clarity, efficiency
• effectiveness of onscreen help and error messages
• complexity of keyboard for computer novices
• effectiveness of training program
• clarity and ease-of-use of documentation
Procedure
• Developed set of 15 common scenarios, enacted by
cross-section of staff
• eight half-day sessions, several scenarios per session
• emphasize that evaluation is of system not staff
• video cameras operated by remote control
• debriefing sessions after each testing period, get info
about problems and feelings about system and document
these
Results:
• Operators and staff had received useful training
• 62 usability failures identified
• Priority given to:
• speed of navigation through system
• problems with titles and screen formats
• operators unable to find key points in doc
• need to redesign telephone headsets
• uncomfortable furniture
• New system: higher productivity, low turnover, faster
booking, greater customer satisfaction
Evaluation Methods
• Observing and monitoring usage
• field or lab
• observer takes notes / video
• keystroke logging / interaction logging

• Collecting users' opinions
• interviews / surveys

• Experiments and benchmarking
• semi-scientific approach (can't control all variables, size of sample)
Evaluation Methods

• Interpretive Evaluation
• informal, try not to disturb user; user participation common
• includes participatory evaluation, contextual evaluation
• Predictive Evaluation
• predict problems users will encounter without actually testing
the system with the users
• keystroke analysis or expert review based on specification,
mock-up, low-level prototype
• Pilot Study for all types!! -- small study before main study to work
out problems with experiment itself
• Human subjects concerns
Usage Data: Observations, Monitoring, User's Opinions
• Observing users
• Verbal protocols
• Software logging
• Users' opinions: Interviews and Questionnaires
Direct Observation

• Difficulties:
• people “see what they want to see”
• "Hawthorne effect" -- users aware that performance is monitored,
altering behavior and performance levels
• single pass / record of observation usually incomplete

• Useful: early, looking for informal feedback, want to know the
kinds of things that users do, what they like, what they don't
• Know exactly what you’re looking for -> checklist/count
• Want permanent record: video, audio, or interaction logging
Eurochange System
• Machine that exchanges one form of European currency
for another and also dispenses currency for credit/debit
cards -- like an ATM machine
• Intended for installation in airports and railway stations
• Prototype machine installed in Oxford Street
• Your goal: find out how long average transaction takes;
note any problems with user’s experience
• Problems you might experience?
New school multimedia system
• Being tried out by groups of 13-year-olds
• Don't interfere with children's activities – note the
kinds of things they do and the problems they encounter …
• What difficulties might you encounter?
Indirect Observation: Video
recording
• Solves some difficulties of direct observation
• Can be synchronized with keystroke logging or interaction logging
• Problems:
• effort required to synchronize multiple data sources
• time required to analyze
• users aware they're being filmed
• set up and leave for several days, they get used to it
Analyzing video data
• Task-based analysis
• determine how users tackled tasks, where major difficulties lie,
what can be done

• Performance-based analysis
• obtain clearly defined performance measures from the data
collected (frequency of task completion, task timing, use of
commands, frequency of errors, time for cognitive tasks)
• classification of errors
• repeatability of study
• time (5:1) -- tools can help
Verbal protocols
• User’s spoken observations, provides info on:
• what user planned to do
• user’s identification of menu names or icons for controlling the
system
• reactions when things go wrong, tone of voice, subjective feelings
about activity
• "Think aloud protocol" -- user says out loud what he is thinking while
working on a task or problem-solving
• Post-Event protocols -- users view videos of their actions and provide
commentary on what they were trying to do
Think Aloud
• user observed performing task
• user asked to describe what he is doing and why, what he
thinks is happening, etc.

• Advantages
• simplicity - requires little expertise
• can provide useful insight
• can show how system is actually used
• Disadvantages
• subjective
• selective
• act of describing may alter task performance
Software Logging
• Researcher need not be present
• part of data analysis process automated
• Time-stamped keypresses
• Interaction logging -- recording made in real time and can
be replayed in real time so evaluator can see interaction
as it happened
• Neal & Simons playback system -- researcher adds own
comments to timestamped log
• Remaining problems: expense, volume
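
Interaction logging of this kind is straightforward to implement. Below is a minimal Python sketch of a timestamped event logger; the event names, fields, and file format are illustrative assumptions, not the Neal & Simons playback system:

import json
import time

class InteractionLogger:
    """Append timestamped UI events to a file for later replay or analysis."""

    def __init__(self, path):
        self.log = open(path, "a")

    def record(self, event, **details):
        # One JSON object per line; the timestamp allows real-time replay
        # and synchronization with a video recording.
        entry = {"t": time.time(), "event": event, **details}
        self.log.write(json.dumps(entry) + "\n")

    def close(self):
        self.log.close()

# Usage: call record() from the interface's event handlers.
logger = InteractionLogger("session.log")
logger.record("keypress", key="s", modifiers=["ctrl"])
logger.record("menu_select", menu="File", item="Save As")
logger.close()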
Protocol analysis
• paper and pencil – cheap, limited to writing speed
• audio – good for think aloud, difficult to match with other protocols
• video – accurate and realistic, needs special equipment, obtrusive
• computer logging – automatic and unobtrusive, large amounts of data
difficult to analyze
• user notebooks – coarse and subjective, useful insights, good for
longitudinal studies

• Mixed use in practice.


• audio/video transcription difficult and requires skill.
• Some automatic support tools available
eye tracking
• head or desk mounted equipment tracks the position of
the eye
• eye movement reflects the amount of cognitive
processing a display requires
• measurements include
• fixations: eye maintains stable position. Number and duration
indicate level of difficulty with display
• saccades: rapid eye movement from one point of interest to
another
• scan paths: moving straight to a target with a short fixation at the
target is optimal
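
Fixations and saccades can be separated automatically from raw gaze samples; one common family of techniques thresholds gaze velocity. A minimal Python sketch (the 100 deg/s threshold and the sample format are assumptions for illustration, not values from these slides):

def classify_gaze(samples, threshold_deg_per_s=100.0):
    """Label gaze samples as fixation or saccade by velocity thresholding.

    samples: list of (t_seconds, x_degrees, y_degrees) gaze positions.
    """
    labels = ["fixation"]  # first sample has no velocity estimate
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        velocity = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / (t1 - t0)
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

# Three samples 10 ms apart: small drift, then a large jump (a saccade).
print(classify_gaze([(0.00, 1.0, 1.0), (0.01, 1.1, 1.0), (0.02, 6.0, 4.0)]))
# -> ['fixation', 'fixation', 'saccade']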
physiological measurements

• emotional response linked to physical changes


• these may help determine a user's reaction to an interface
• measurements include:
• heart activity, including blood pressure, volume and pulse.
• activity of sweat glands: Galvanic Skin Response (GSR)
• electrical activity in muscle: electromyogram (EMG)
• electrical activity in brain: electroencephalogram (EEG)

• some difficulty in interpreting these physiological


responses - more research needed
Interviews and Questionnaires
• Structured interviews
• predetermined questions, asked in a set way
• no exploration of individual attitudes
• structure useful in comparing responses, claiming statistics
• Flexible interviews
• some set topics, no set sequence
• interviewer can follow replies
• less formal, for requirements gathering
Interviews, continued
• Semistructured interview
• set of questions available for interviewer to draw on if
interviewee digresses or doesn’t say much
• Prompted interview
• draw out more information from interviewee
• based on screen design or prototype
• or "… and what do you mean by …"
Example: semi-structured using
checklist
• Why do you do this? (To get the user’s goal.)
• How do you do it? (To get the subtasks -- ask recursively for each
subtask)
• Why not do it this way instead? (Mention alternative -- in order to get
rationale for choice of method actually used.)
• What are the preconditions for doing this?
• What are the results of doing this?
• May we see your work product?
• Do errors ever occur when doing this?
• How do you discover and correct these errors?
Variations on interviews
• Card sorting
• users asked to group or classify cards to answer
questions, answers recorded on data collection sheet
• Twenty questions
• interviewer asks only yes/no questions
Interviews -- summary
• Focus is on style of presentation and flexibility of data
gathering
• More structured -> easier to analyze
• Less structured -> richer information
• Good idea: transcribe interviews to permit detailed
examination (also true for verbal protocols)
Questionnaires and surveys
• Focus is on preparation of unambiguous questions
• Again, pilot study important
• closed questions:
• respondent selects from set of alternative replies
• usually some form of rating scale
• open questions:
• respondent free to provide own answer
Closed question - simple
checklist

Can you use the following text editing commands?

Yes No Maybe

DUPLICATE [ ] [ ] [ ]
PASTE [ ] [ ] [ ]
Closed question -- six-point
scale
Rate the usefulness of the DUPLICATE command on the
following scale:

very useful |____|____|____|____|____|____| of no use
Closed question - Likert scale
Computers can simplify complex problems

|____|_____|_____|_____|_____|_____|_____|
strongly agree | agree | slightly agree | neutral | slightly disagree | disagree | strongly disagree
Closed question - semantic
differential
Rate the Beauxarts drawing package on the following dimensions:

_____| extremely | quite | slightly | neutral | slightly | quite | extremely|_____


easy | | | | | | | | difficult
clear | | | | | | | | confusing
fun | | | | | | | | boring
Closed question - ranked order
Place the following commands in order of usefulness (use a
scale of 1 to 4 where 1 is the most useful)

___ PASTE
___ DUPLICATE
___ GROUP
___ CLEAR
Questionnaires
• Responses converted to numerical values
• Statistical analysis performed (mean, std_dev; SPSS
often used if more statistical detail required) -- see the sketch below
• Increase chances of respondents completing and
returning:
• short
• small fee or token
• send copy of report
• stamped, self-addressed envelope
• Pre- / post- questionnaires
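
As an illustration of this scoring step, Likert items might be coded 1-7 (1 = strongly disagree … 7 = strongly agree) and summarized per item. A minimal Python sketch; the coding direction and the responses are invented for the example:

from statistics import mean, stdev

# Hypothetical responses to one Likert item, coded 1-7.
responses = {
    "Computers can simplify complex problems": [7, 6, 6, 4, 5, 7, 3],
}

for item, scores in responses.items():
    print(f"{item}: mean = {mean(scores):.2f}, std_dev = {stdev(scores):.2f}")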
Questionnaire on User Interaction Satisfaction
(QUIS)

OVERALL REACTIONS TO THE SOFTWARE


terrible wonderful
0123456789
difficult easy
0123456789
frustrating satisfying
0123456789
SCREEN
· Characters on the computer screen
hard to read easy to read
0123456789
· Highlighting on the screen simplifies task
not at all very much
0123456789
· Organization of information on screen
confusing very clear
0123456789
Example: Eurochange questionnaire
• Eurochange.pdf
• Identify strengths and weaknesses.
• How could this be improved?
Questionnaires
• Need careful design
• what information is required?
• how are answers to be analyzed?

• Styles of question
• general
• open-ended
• scalar
• multi-choice
• ranked
How to write a good survey
• Write a short questionnaire
• what is essential to know? what would be useful to know? what
would be unnecessary?
• Use simple words
• Don’t: "What is the frequency of your automotive travel to your
parents' residence in the last 30 days?"
• Do: "About how many times have you driven to your parent's
h
home iin th
the llastt 30 d
days?"
?"
How to write a good survey
• Relax your grammar
• if the questions sound too formal.
• For example, the word "who" is appropriate in many instances
when "whom" is technically correct.
• Assure a common understanding
• Write questions that everyone will understand in the same way.
Don't assume that everyone has the same understanding of the
facts or a common basis of knowledge. Identify even commonly
used abbreviations to be certain that everyone understands.
How to write a good survey
• Start with interesting questions
• Start the survey with questions that are likely to sound interesting and
attract the respondents' attention.
• Save the questions that might be difficult or threatening for later.
• Voicing questions in the third person can be less threatening than
questions voiced in the second person.
• Don't write leading questions
• Leading questions demand a specific response. For example: the
question "Which day of the month is best for the newly established
company-wide monthly meeting?" leads respondents to pick a date
without first determining if they even want another meeting.
How to write a good survey
• Avoid double negatives
• Respondents can easily be confused deciphering the
meaning of a question that uses two negative words.
• Balance rating scales
• When the question requires respondents to use a
rating scale, balance the scale so that there is room for
both extremes.
How to write a good survey
• Don't make the list of choices too long
• If the list of answer categories is long and unfamiliar, it
is difficult for respondents to evaluate all of them. Keep
the list of choices short.
• Avoid difficult concepts
• Some questions involve concepts that are difficult for
many people to understand.
How to write a good survey
• Avoid difficult recall questions
• People's memories are increasingly unreliable as you ask them to recall
events farther and farther back in time. You will get more accurate
information from people if you ask about the recent past (past month)
versus the more distant past (last year).

• Use Closed-ended questions rather than Open-ended ones


• Closed-ended are useful because the respondents know clearly the
purpose of the question and are limited to a set of choices where one
answer is right for them. Easier to analyze.
• An open-ended question is a written response. For example: "If you do
not want a company picnic, please explain why." Can provide new
ideas/info.
How to write a good survey
• Put your questions in a logical order
• The issues raised in one question can influence how people think
about subsequent questions.
• It is good to ask a general question and then ask more specific
questions.
• Pre-test your survey
• First test to a small number of people.
• Then brainstorm with them to see if they had problems answering
any questions. Have them explain what the question meant to them.
How to write a good survey
• Name your survey
• If you send it out by email, it may be mistaken for “spam”. Also want to
pique the interest of the recipients.
• Here are examples of survey names that might be successful in getting
attention:
• Memo From the Chief Executive Officer
• Evaluation of Services of the Benefits Office
• Your Opinion About Financial Services
• Free T-shirt
• Win a Trip to Paris
• Please Respond By Friday
• Free Subscription
• Win a notebook computer
• … but some of these look like spam to me; proceed with caution.
How to write a good survey
• Cover memo or introduction
• If sending by US mail or email, may still need to motivate recipient
to complete it.
• A good cover memo or introduction should be short and
includes:
• Purpose of the survey
• Why it is important to hear from the respondent
• What may be done with the results and what possible impacts may occur
with the results.
• Address identification
• Person to contact for questions about the survey.
• Due date for response
Interpretive Evaluation
• Contextual inquiry
• Cooperative and participative evaluation
• Ethnography

• rather than emphasizing statement of goals, objective
tests, research reports, instead emphasizes usefulness of
findings to the people concerned
• good for feasibility study, design feedback, post-implementation review
Interpretive Evaluation
• Experimental: Formal and objective
• Interpretive: More subjective
• Concerned with humans, so no objective reality
• Sociological, anthropological approach

• Users involved, as opposed to predictive


approaches
Beliefs
• Sees limitations in scientific hypothesis testing in
closed environment
• Lab is not real world
• Can’t control all variables
• Context is neglected
• Artificial, short tasks
Contextual Inquiry
• Users and researchers participate to identify and
understand usability problems within the normal
working environment of the user.
• Makes use of the contextual interview.
• Recommendations to evaluator:
• Get as close to work as possible
• Uncover work practice hidden in words
• Create interpretations with customers
• Let customers expand the scope of the discussion
Contextual Inquiry
• Users and researchers participate to identify and
understand usability problems within the normal working
environment of the user
• Differences from other methods include:
• work context -- larger tasks
• time context -- longer times
• motivational context -- more user control
• social context -- social support included that is normally lacking in
experiments
Why use contextual inquiry?
• Usability issues located that go undetected in
laboratory testing.
• Line counting in word processing
• unpacking and setting up equipment
• Issues identified by users or by user/evaluator
Contextual interview: topics of
interest
• Structure and language used in work
• individual and group actions and intentions
• culture affecting the work
• explicit and implicit aspects of the work
Cooperative evaluation
• A technique to improve a user interface
specification by detecting the possible usability
problems in an early prototype or partial
simulation
• low cost, little training needed
• think aloud protocols collected during evaluation
Cooperative Evaluation
• Typical user(s) recruited
• representative tasks selected
• user verbalizes problems/ evaluator makes notes
• debriefing sessions held
• Summarize and report back to design team
Participative Evaluation
• More open than cooperative evaluation
• subject to greater control by users
• cooperative prototyping, facilitated by
• focus groups
• designers work with users to prepare prototypes
• stable prototypes provided, users evaluate
• tight feedback loop with designers
Ethnography
• Standard practice in anthropology
• Researchers strive to immerse themselves in the situation
they want to learn about
• Goal: understand the 'real' work situation
• typically applies video - videos viewed, reviewed, logged,
analyzed; collections made, often placed in databases,
retrieved, visualized …
Ethnography
• “a holistic interpretation of a group’s culture”
• Blomberg et al. (1993) highlight four main principles that
guide much ethnographic work:
1. Ethnography is grounded in fieldwork - people are studied in
their natural settings.
2. To understand the influence of context on people’s activities one
must take a holistic perspective.
3. Ethnographers build up a descriptive account of how people
behave, not how they ought to behave.
4. Importance is given to understanding things from the
point-of-view of those studied.
Types of Findings
• Can be both
• Qualitative
• Observe trends, habits, patterns, …

• Quantitative
• How often was something done, what per cent of the time did
something occur, how many different …
Predictive Evaluation
• Predict aspects of usage rather than observe and
measure
• doesn’t involve users
• cheaper
Why Predictive Evaluation
• User testing is expensive and time consuming, and
requires a prototype
• Predictive techniques use expertise of human-computer
interaction specialists (in person or via heuristics or
models they develop) to identify usability problems
without testing or (in some cases) prototypes
Predictive Evaluation Methods
• Inspection Methods
• Standards inspections
• Consistency inspection
• Heuristic evaluation
• Walkthroughs

• Modeling: The keystroke level model


Standards inspections
• Standards experts inspect the interface for
compliance with specified standards
• e.g., visibility of screen objects
• relatively little task knowledge required
Consistency inspections
• Teams of designers inspect a set of interfaces for
a family of products
• usually one designer from each project
Usage simulations
• Aka - “expert review”, “expert simulation”

• Experts simulate behavior of less-experienced
users, try to anticipate usability problems
• more efficient than user trials
• prescriptive feedback
Usage Simulation (Expert
Review)
• Pretend you are a novice user; identify usability problems
• Requires
• Expertise in HCI
• Expertise in the application area
• Ability to role play the novice
• Objectivity (not a developer)
• Problems
• Bias of experts: use more than one
• Hard to find experts
• Real novices do the most unexpected things!
Heuristic evaluation
• Proposed by Nielsen and Molich.

• usability criteria (heuristics) are identified


• design examined by experts to see if these are violated

• Example heuristics
• system behaviour is predictable
• system behaviour is consistent
• feedback is provided

• Heuristic evaluation `debugs' design.


Sample heuristics
• Use simple and natural dialogue
• speak the user's language
• minimize user memory load
• be consistent
• provide feedback
• provide clearly marked exits
• provide shortcuts
• provide good error messages
• prevent errors
Walkthroughs
• Goal - detect problems early on; remove them
• construct carefully designed tasks from a system
specification or screen mockup
• walk through the activities required, predict how users
would likely behave, determine problems they will
encounter
Walkthroughs
• Structured form of usage simulation
• Identify task, context, and user population
• Walk through task, predicting user behavior
• Variations:
• Cognitive walkthrough: simulate cognitive processing
of user
• Pluralistic walkthrough: multiple types of experts
Cognitive Walkthrough
Proposed by Polson et al.
• evaluates design on how well it supports user in learning
task
• usually performed by expert in cognitive psychology
• expert ‘walks though’ design to identify potential
problems using psychological principles
• forms used to guide analysis
Cognitive Walkthrough (ctd)
• For each task walkthrough considers
• what impact will interaction have on user?
• what cognitive processes are required?
• what learning problems may occur?

• Analysis focuses on goals and knowledge: does


the design lead the user to generate the correct
goals?
Modeling: keystroke level model
• Goal: calculate task performance times for
experienced users

• Requires
• specification of system functionality
• task analysis, breakdown of each task into its
components
Keystroke-level modeling
• Time to execute sum of:
• Tk - keystroking (0.35 sec)
• Tp - pointing (1.10)
• Td - drawing (problem-dependent)
• Tm - mental (1.35)
• Th - homing (0.4)
• Tr - system response (1.2)
Keystroke Modeling Example

Save a file in application using mouse and pull-down menu
1. Initial homing to mouse: T_H = 0.4
2. Move cursor to file menu: T_M + T_P = 1.35 + 1.10 = 2.45
3. Select "save as" in file menu (click, move, click):
T_K + T_M + T_P + T_K = 0.35 + 1.35 + 1.10 + 0.35 = 3.15
4. Application prompts for file name (T_R = 1.2); user types 8 characters
plus return: T_R + T_M + 8*T_K + T_M + T_K
= 1.2 + 1.35 + 0.35*8 + 1.35 + 0.35 = 7.05
Total = 0.4 + 2.45 + 3.15 + 7.05 = 13.05 sec
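
The same arithmetic is easy to mechanize. A minimal Python sketch of the calculation above; the operator codes K, P, M, H, R correspond to the times listed on the previous slide, and the step breakdown mirrors the worked example:

# Operator times (seconds) from the keystroke-level model slide.
KLM = {"K": 0.35, "P": 1.10, "M": 1.35, "H": 0.40, "R": 1.20}

def klm_time(operators):
    """Total execution time for a string of operator codes, e.g. 'MK'."""
    return sum(KLM[op] for op in operators)

steps = [
    ("home hand on mouse", "H"),
    ("move cursor to file menu", "MP"),
    ("select 'save as' (click, move, click)", "KMPK"),
    ("respond, type 8-character name, return", "RM" + "K" * 8 + "MK"),
]
total = 0.0
for description, ops in steps:
    t = klm_time(ops)
    total += t
    print(f"{description}: {t:.2f} s")
print(f"total: {total:.2f} s")  # 13.05 s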
Choosing an Evaluation Method

when in process: design vs. implementation
style of evaluation: laboratory vs. field
how objective: subjective vs. objective
type of measures: qualitative vs. quantitative
level of information: high level vs. low level
level of interference: obtrusive vs. unobtrusive
resources available: time, subjects, equipment, expertise
Example: Star Workstation,
text selection
• Goal: evaluate methods for selecting text, using
1-3 mouse buttons
• Operations:
• Point (between characters; target of move, copy, or insert)
• Select text (character, word, sentence, paragraph, document)
• Extend selection to include more text
Selection Schemes

[Flattened table: selection schemes A-G. Every scheme assigns Point to
Button 1; the schemes differ in how selection of Character (C), Word (W),
Sentence (S), Paragraph (P), and Document (D), drawthrough (Drwthru),
and Adjust (extend selection) are distributed across Buttons 1-3.]
Methodology
• Between-subjects paradigm
• six groups, 4 subjects per group
• in each group: 2 experienced w/mouse, 2 not
• each subject first trained in use of mouse and in editing
techniques in Star w.p. system
• Assigned scheme taught
• Each subject performs 10 text-editing tasks, 6 times each
Results: selection time
Time:

Scheme A: 12.25 s
Scheme B: 15.19 s
Scheme C: 13.41 s
Scheme D: 13.44 s
Scheme E: 12.85 s
Scheme F: 9.89 s
Results: Selection Errors
• Average: 1 selection error per four tasks
• 65% of errors were drawthrough errors, same
across all selection schemes
• 20% of errors were "too many clicks"; schemes with less clicking better
• 15% of errors were "click wrong mouse button"; schemes with fewer
buttons better
Selection scheme: test 2
• Results of test 1 lead to conclusion to avoid:
• drawthroughs
• three buttons
• multiple clicking
• Scheme "G" introduced -- avoids drawthrough, uses only 2 buttons
• New test, but test groups were 3:1 experienced w/mouse to not
Results of test 2
• Mean selection time: 7.96s for scheme G,
frequency of "too many clicks" stayed about the same
• Conclusion: scheme G acceptable
• selection time shorter
• advantage of quick selection balances moderate error
rate of multi-clicking
Experimental design - concerns
• What to change? What to keep constant? What
to measure?
• Hypothesis, stated in a way that can be tested.
• Statistical tests: which ones, why?
Selecting subjects - avoiding
bias
• Age bias -- Cover target age range
• Gender bias -- equal numbers of male/female
• Experience bias -- similar level of experience with
computers
• etc.
Experimental Designs
• Independent subject design
• single group of subjects allocated randomly to each of the
experimental conditions (see the sketch after this list)

• Matched subject design


• subjects matched in pairs, pairs allocated randomly to each of the
experimental conditions

• Repeated measures design


• all subjects appear in all experimental conditions
• Concerns: order of tasks, learning effects

• Single subject design


• in-depth experiments on just one subject
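
As an illustration of the independent subject design, random allocation of subjects to conditions can be sketched as below; the subject IDs and group sizes are invented, and the six conditions simply echo the six selection schemes of the Star study:

import random

subjects = [f"S{i:02d}" for i in range(1, 25)]  # 24 hypothetical subjects
conditions = ["A", "B", "C", "D", "E", "F"]     # six experimental conditions

random.shuffle(subjects)                        # randomize assignment order
group_size = len(subjects) // len(conditions)   # 4 subjects per condition
allocation = {
    cond: subjects[i * group_size:(i + 1) * group_size]
    for i, cond in enumerate(conditions)
}
for cond, group in allocation.items():
    print(cond, group)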
Critical review of experimental
procedure
• User preparation
• adequate instructions and training?
• Impact of variables
• how do changes in independent variables affect users
• Structure of the tasks
• were tasks complex enough, did users know aim?
• Time taken
• fatigue or boredom?
Critical review of experimental
results
• Size of effect
• statistically significant? Practically significant?
• Alternative interpretations
• other possible causes for results found?
• Consistency between dependent variables
• task completion and error scores versus user preferences and
learning scores
• Generalization of results
• to other tasks, users, working environments?
