Data & Analytics Club - Data Visualization Workshop
Data Visualization Nikhil Srivastava, 2015
Nikhil Srivastava
Wharton Data & Analytics Club
hoster@wharton.upenn.edu
About this Lecture
• Shortened version of longer course
– Slides, demos, extra material
– Code samples and libraries
– Sample projects
• Questions
About You
Outline
• What is Data Visualization? (introduction)
• Thinking and Seeing (foundation & theory)
• From Data to Graphics (building blocks)
• Principles and Guidelines (design & critique)
• Building Visualizations (construction)
• Advanced
What is Data Visualization? (introduction)
Data Visualization
Information Visualization
Scientific Visualization
Infographics
Statistical Graphics
Informative Art
Art
Science
Statistics
Journalism
Design
Visual Analytics
Business
City State Population
Baton Rouge Louisiana 191,741
Birmingham Alabama 220,927
Broken Arrow Oklahoma 58,018
Eugene Oregon 115,890
Glendale Arizona 245,868
Huntsville Alabama 55,741
Lafayette Louisiana 87,737
Mobile Alabama 98,147
Montgomery Alabama 126,250
New Orleans Louisiana 322,172
Norman Oklahoma 101,590
Peoria Arizona 167,868
Portland Oregon 514,108
Salem Oregon 147,631
Scottsdale Arizona 134,335
Shreveport Louisiana 68,756
Surprise Arizona 90,548
Tempe Arizona 143,369
Tulsa Oklahoma 392,138
• Which is the most populous city in the list?
• Which state in the list has the most cities?
• Which state in the list has the largest average city?
• What is the population of Montgomery, Alabama?
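The questions above can also be answered in code, which shows how much work the raw table leaves to the reader. This is a minimal sketch in plain Python: the rows are copied from the City / State / Population table, and the function names are illustrative assumptions, not part of the lecture material.

```python
from collections import defaultdict

# Rows copied from the City / State / Population table in the slides.
CITIES = [
    ("Baton Rouge", "Louisiana", 191_741),
    ("Birmingham", "Alabama", 220_927),
    ("Broken Arrow", "Oklahoma", 58_018),
    ("Eugene", "Oregon", 115_890),
    ("Glendale", "Arizona", 245_868),
    ("Huntsville", "Alabama", 55_741),
    ("Lafayette", "Louisiana", 87_737),
    ("Mobile", "Alabama", 98_147),
    ("Montgomery", "Alabama", 126_250),
    ("New Orleans", "Louisiana", 322_172),
    ("Norman", "Oklahoma", 101_590),
    ("Peoria", "Arizona", 167_868),
    ("Portland", "Oregon", 514_108),
    ("Salem", "Oregon", 147_631),
    ("Scottsdale", "Arizona", 134_335),
    ("Shreveport", "Louisiana", 68_756),
    ("Surprise", "Arizona", 90_548),
    ("Tempe", "Arizona", 143_369),
    ("Tulsa", "Oklahoma", 392_138),
]


def populations_by_state(rows):
    """Group the population figures by state."""
    groups = defaultdict(list)
    for _city, state, population in rows:
        groups[state].append(population)
    return dict(groups)


def most_populous_city(rows):
    """Return the name of the city with the largest population."""
    return max(rows, key=lambda row: row[2])[0]


def state_with_most_cities(rows):
    """Return the state that appears most often in the list."""
    groups = populations_by_state(rows)
    return max(groups, key=lambda state: len(groups[state]))


def state_with_largest_average_city(rows):
    """Return the state whose listed cities have the highest mean population."""
    groups = populations_by_state(rows)
    return max(groups, key=lambda state: sum(groups[state]) / len(groups[state]))


def population_of(rows, city, state):
    """Exact lookup: the one question a plain table answers well."""
    for row_city, row_state, population in rows:
        if (row_city, row_state) == (city, state):
            return population
    raise KeyError((city, state))


if __name__ == "__main__":
    print(most_populous_city(CITIES))
    print(state_with_most_cities(CITIES))
    print(state_with_largest_average_city(CITIES))
    print(population_of(CITIES, "Montgomery", "Alabama"))
```

The first three questions need a scan or an aggregation over the whole table, while the last is a single lookup; a chart tends to help with the former and a table with the latter.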
Data Visualization is:
• Useful
– Answers user questions
– Reduces user workload
(by design, not by default)
Anscombe’s quartet (1973)
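A short sketch of why the quartet is a standard example: the four datasets share nearly identical summary statistics yet look very different when plotted. The values below are the commonly published figures for Anscombe's quartet; the helper names are assumptions made for illustration.

```python
from statistics import mean

# Commonly published values for Anscombe's quartet.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """Mean of x, mean of y, and correlation, each rounded to two decimals."""
    return (round(mean(xs), 2), round(mean(ys), 2), round(pearson_r(xs, ys), 2))


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, stats)
```

The rounded summaries come out the same for all four sets, so only a plot reveals the curve in II, the outlier in III, and the single influential point in IV.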
Data Visualization is:
• Useful
– Understand structure and patterns
– Resolve ambiguity
– Locate outliers
Data Visualization is:
• Important
– Design decisions affect interpretation
Crimean War Deaths
Florence Nightingale, 1858 (re-colorized)
Gapminder Foundation
Data Visualization is:
• Powerful
– Communicate, teach, inspire
Our Focus
purpose: communicate | explore, analyze
data type: numerical, categorical | text, maps, graphs, networks
method: static representation | animation, interactivity
Thinking and Seeing (foundation & theory)
The Hardware
The Software: Visual Perception
“Bottom-up”
• Sensory input
• Low-level features: orientation, shape, color, movement
• Rapid, parallel, automatic
“Top-down”
• High-level concepts: objects, symbols
• Involves working memory
• Slow, sequential, conscious
Task: Counting
How many 3’s?
1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
Slow, sequential, conscious
1281768756138976546984506985604982826762
9809858458224509856358945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
Rapid, parallel, automatic
1281768756138976546984506985604982826762
9809858458224509856358945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
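The two labelled grids are identical as plain text, so the contrast presumably came from visual emphasis on the 3's in the original slide (an assumption; the styling did not survive extraction). A minimal sketch of that idea, using ANSI terminal colors as a stand-in for the slide's styling:

```python
# ANSI escape codes; assumes a terminal that renders them.
RED = "\033[31m"
RESET = "\033[0m"

# Digit rows copied from the labelled grids above.
DIGIT_ROWS = [
    "1281768756138976546984506985604982826762",
    "9809858458224509856358945098450980943585",
    "9091030209905959595772564675050678904567",
    "8845789809821677654876364908560912949686",
]


def count_digit(rows, target):
    """Serial search: scan every character, as a reader must without highlighting."""
    return sum(row.count(target) for row in rows)


def highlight(rows, target, color=RED):
    """Wrap each occurrence of target in a color code so it can pop out pre-attentively."""
    return [row.replace(target, f"{color}{target}{RESET}") for row in rows]


if __name__ == "__main__":
    print("\n".join(DIGIT_ROWS))
    print()
    print("\n".join(highlight(DIGIT_ROWS, "3")))
    print("count:", count_digit(DIGIT_ROWS, "3"))
```

Printed in a color terminal, the second block lets the 3's be counted at a glance, while the first requires reading digit by digit.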
Task: (Distracted) Search
Which side has the red circle?
Slow, sequential, conscious
Rapid, parallel, automatic
(n=7)
(n=5)
(n=3)
Lessons for Visualization
• Use “pre-attentive” attributes when possible
– Color, shape, orientation (depth, motion)
– Faster, higher bandwidth
• Caveats
– Beware limits of working memory (fewer than about 7 items)
– Be careful mixing attributes
Example: Inefficient Attributes
Example: Too Many Attributes
From Data to Graphics (building blocks)
What kind of data do we have?
How can we represent the data visually?
How can we organize this into a visualization?
Visual Encoding
Data Types
CATEGORICAL              ORDINAL               NUMERICAL: Interval   NUMERICAL: Ratio
Male / Female            Small / Med / Large   Latitude/Longitude    Length
Asia / Africa / Europe   Low / High            Compass direction     Count
True / False             Yes / Maybe / No      Time (event)          Time (duration)
=                        =                     =                     =
                         < >                   < >                   < >
                                               -                     + -
                                                                     * /
Bin/Categorize
Difference/Normalize
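The two labels above presumably name conversions between data types: binning turns a numerical value into an ordinal category, and differencing turns interval values (event times) into ratio values (durations). A minimal sketch under that reading; the thresholds and function names are illustrative assumptions.

```python
from datetime import datetime


def bin_population(population, small_max=100_000, medium_max=250_000):
    """Numerical -> ordinal. The cut-offs are arbitrary, chosen only for illustration."""
    if population < small_max:
        return "Small"
    if population < medium_max:
        return "Med"
    return "Large"


def durations_seconds(event_times):
    """Interval -> ratio: gaps between consecutive event times, in seconds."""
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(event_times, event_times[1:])
    ]


if __name__ == "__main__":
    print(bin_population(58_018), bin_population(191_741), bin_population(514_108))
    events = [
        datetime(2015, 1, 1, 9, 0),
        datetime(2015, 1, 1, 9, 30),
        datetime(2015, 1, 1, 11, 0),
    ]
    print(durations_seconds(events))
```

An event time has no meaningful zero, so ratios of timestamps say nothing; the differences do have a true zero, which is what makes statements like "three times as long" valid.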
Data Types (Advanced)
• Networks/Graphs
– Hierarchies/Trees
• Text
• Maps: points, regions, routes
Visual Encodings
Marks: point, line, area, volume
Channels: position, size, shape, color, angle/tilt
Channel Effectiveness
“Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”
Athi River Machakos 139,380
Awasi Kisumu 93,369
Kangundo-Tala Machakos 218,557
Karuri Kiambu 129,934
Kiambu Kiambu 88,869
Kikuyu Kiambu 233,231
Kisumu Kisumu 409,928
Kitale Trans-Nzoia 106,187
Kitui Kitui 155,896
Limuru Kiambu 104,282
Machakos Machakos 150,041
Molo Nakuru 107,806
Mwingi Kitui 83,803
Naivasha Nakuru 181,966
Nakuru Nakuru 307,990
Nandi Hills Trans-Nzoia 73,626
type           mark    channel    data represented
Scatter Plot   point   position   2 quantitative
type            mark    channel           data represented
Scatter + Hue   point   position, color   2 quantitative, 1 categorical
type                        mark    channel          data represented
Scatter + Size (“Bubble”)   point   position, size   3 quantitative
Scatter Plot – Applications
RELATIONSHIP, GROUPING, OUTLIERS
Scatter Plot – Dangers
OCCLUSION (DENSITY), OCCLUSION (OVERLAP), 3-D
type         mark   channel                  data represented
Line Chart   line   position (orientation)   2 quantitative
type         mark   channel         data represented
Area Chart   area   size (length)   2 quantitative
Line Chart – Applications
PATTERN OVER TIME, COMPARISON
Line Chart – Dangers
Y SCALING
X SCALING
OVERLOAD
type        mark   channel         data represented
Bar Chart   line   size (length)   1 categorical, 1 quantitative
type        mark   channel         data represented
Histogram   line   size (length)   1 ordinal/quantitative, 1 quantitative (count)
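A histogram's second variable is derived rather than given: the counts come from binning the first. The sketch below shows that step with the city populations from the table earlier in the deck; the bin width and the text rendering are illustrative assumptions.

```python
from collections import Counter

# Populations copied from the City / State / Population table earlier in the deck.
POPULATIONS = [
    191_741, 220_927, 58_018, 115_890, 245_868, 55_741, 87_737, 98_147,
    126_250, 322_172, 101_590, 167_868, 514_108, 147_631, 134_335, 68_756,
    90_548, 143_369, 392_138,
]


def histogram(values, bin_width):
    """Return (bin_start, count) pairs, including empty bins between min and max."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    first, last = min(counts), max(counts)
    return [(start, counts.get(start, 0))
            for start in range(first, last + bin_width, bin_width)]


def render(bins, bin_width):
    """Draw each bin as a row of '#' marks: bar length encodes the count."""
    return [f"{start:>7,} to {start + bin_width - 1:>7,} | {'#' * count}"
            for start, count in bins]


if __name__ == "__main__":
    bins = histogram(POPULATIONS, 100_000)
    print("\n".join(render(bins, 100_000)))
```

Keeping the empty bin in the output matters: dropping it would hide the gap between the cluster of smaller cities and the largest one.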
Bar Chart – Applications
COMPARE CATEGORIES, DISTRIBUTION
Bar Chart – Dangers
TOO MANY CATEGORIES
POORLY SORTED CATEGORIES
ZERO AXIS
type        mark   channel        data represented
Pie Chart   area   size (angle)   1 quantitative
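The type / mark / channel / data rows shown for each chart can be read as a small lookup table. The sketch below encodes them as a data structure and picks candidate charts for a given mix of variables. The class and function names are hypothetical, and the histogram is left out because its "ordinal/quantitative" entry does not map to a single type.

```python
from dataclasses import dataclass

Q = "quantitative"
C = "categorical"


@dataclass(frozen=True)
class ChartSpec:
    """One row of the type / mark / channel / data represented tables."""
    name: str
    mark: str
    channels: tuple
    data: tuple


# Values transcribed from the slides.
CATALOG = [
    ChartSpec("Scatter Plot", "point", ("position",), (Q, Q)),
    ChartSpec("Scatter + Hue", "point", ("position", "color"), (Q, Q, C)),
    ChartSpec("Scatter + Size (Bubble)", "point", ("position", "size"), (Q, Q, Q)),
    ChartSpec("Line Chart", "line", ("position",), (Q, Q)),
    ChartSpec("Area Chart", "area", ("size",), (Q, Q)),
    ChartSpec("Bar Chart", "line", ("size",), (C, Q)),
    ChartSpec("Pie Chart", "area", ("size",), (Q,)),
]


def candidates(data_types):
    """Names of charts whose data signature matches, ignoring variable order."""
    wanted = sorted(data_types)
    return [spec.name for spec in CATALOG if sorted(spec.data) == wanted]


if __name__ == "__main__":
    print(candidates([Q, Q]))
    print(candidates([C, Q]))
```

A lookup like this only narrows the options; choosing among the candidates still depends on the applications and dangers listed for each chart type.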
Pie Chart – Dangers
AREA/ANGLE SCALE, SIMILAR AREAS, OVERLOAD
Multi-Series: Bar
“GROUPED” BAR CHART
“STACKED” BAR CHART
Multi-Series: Line
MULTIPLE LINE
STACKED AREA CHART
Normalization
NORMALIZED BAR, NORMALIZED AREA
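A normalized bar or area chart plots each series as a share of the category total rather than as an absolute value. A minimal sketch of that transformation, with hypothetical series names and numbers:

```python
def normalize_stacks(series):
    """Convert absolute values to shares that sum to 1 within each category.

    series maps a series name to a list of values, one per category.
    Assumes every list has the same length and every category total is non-zero.
    """
    names = list(series)
    n_categories = len(series[names[0]])
    totals = [sum(series[name][i] for name in names) for i in range(n_categories)]
    return {
        name: [series[name][i] / totals[i] for i in range(n_categories)]
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical data: two series across two categories.
    raw = {"Series A": [30, 10], "Series B": [70, 30]}
    print(normalize_stacks(raw))
```

In this example Series A falls in absolute terms (30 to 10) while its share barely moves (0.30 to 0.25), which is the kind of difference a normalized chart makes visible and a stacked chart of raw values hides.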
Principles and Guidelines (design & critique)
From Science to Art
• Design principles*
• Style guidelines*
*dependent on context and objective (and author)
Design Principles
• Integrity
– Tell the truth with data
• Effectiveness
– Achieve visualization objectives
• Aesthetics
– Be compelling, vivid, beautiful
Integrity
Lie Ratio = (size of effect in graphic) / (size of effect in data)
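A worked example of the ratio, using a bar chart whose axis does not start at zero. The numbers are hypothetical, and "size of effect" is taken here as relative change between two values, which is one common reading of the definition and is an assumption.

```python
def effect_size(first, second):
    """Relative change from first to second (assumes first is non-zero)."""
    return abs(second - first) / abs(first)


def bar_height(value, axis_baseline):
    """Drawn height of a bar when the axis starts at axis_baseline."""
    return value - axis_baseline


def lie_ratio(graphic_first, graphic_second, data_first, data_second):
    """Size of effect in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


if __name__ == "__main__":
    before, after = 100, 110  # hypothetical data: a 10% increase

    # Axis truncated at 95: bars are drawn 5 and 15 units tall.
    truncated = lie_ratio(bar_height(before, 95), bar_height(after, 95), before, after)

    # Axis starting at zero: bar heights equal the data values.
    honest = lie_ratio(bar_height(before, 0), bar_height(after, 0), before, after)

    print(truncated, honest)
```

With the truncated axis the second bar is drawn three times as tall as the first for a 10% change in the data, giving a ratio of about 20; with a zero baseline the ratio is 1.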
“show data variation, not design variation”
Effectiveness*
Data/Ink Ratio = (ink representing data) / (total ink)
*Tufte
Avoid “chart junk”
Effectiveness (Few)
Practical Guidelines
• Avoid 3-D charts
• Focus on substance over graphics
• Avoid separate legends and keys
• Use faint grids/guidelines
• Avoid unnecessary textures and colors
A Note on Color
• To label
• To emphasize
• To liven or decorate
Color as a Channel
             Categorical      Quantitative
Hue          Good (6-8 max)   Poor
Value        Poor             Good
Saturation   Poor             Okay
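The table suggests two kinds of palette: distinct hues for categories and a light-to-dark ramp for quantities. The sketch below builds both with Python's standard `colorsys` module, treating HLS lightness as a stand-in for "value". The function names, default parameters, and the hard cap of 8 hues are illustrative assumptions.

```python
import colorsys


def to_hex(red, green, blue):
    """Convert RGB components in the range 0..1 to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )


def categorical_palette(n, lightness=0.5, saturation=0.65):
    """Evenly spaced hues at fixed lightness; capped at 8 per the table's guidance."""
    if not 1 <= n <= 8:
        raise ValueError("use between 1 and 8 hues for categorical data")
    return [to_hex(*colorsys.hls_to_rgb(i / n, lightness, saturation)) for i in range(n)]


def sequential_ramp(n, hue=0.6, saturation=0.65, lightest=0.9, darkest=0.25):
    """Single hue, varying lightness from light to dark, for ordered quantities."""
    if n < 1:
        raise ValueError("need at least one step")
    step = (lightest - darkest) / (n - 1) if n > 1 else 0.0
    return [
        to_hex(*colorsys.hls_to_rgb(hue, lightest - i * step, saturation))
        for i in range(n)
    ]


if __name__ == "__main__":
    print(categorical_palette(5))
    print(sequential_ramp(5))
```

Hand-built palettes like these are only a starting point; evenly spaced hues are not evenly spaced perceptually, which is one reason to prefer tested palettes.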
Bad Color
Good Color
More Color Guidelines
• Use color only when necessary
• Saturated colors for small areas, labels
• Less saturated colors for large areas, backgrounds
• Use tools like ColorBrewer
Building Visualizations (construction)
What Tools to Use?
Clean
Restructure
Explore
Analyze
DATA
Visualization Goals
Visualization Tools
Tools: Excel, Tableau, Plotly, Python, R, Matlab, Ubiq/Silk, Google Charts, Highcharts, d3
Axes: “How hard is it to learn?” vs. “How powerful & flexible is it?”
Annotation: “I’ll have to write code”
Cheat Sheets
• For Hackathon participants
• Otherwise, email me
Advanced
Small Multiples
Treemap (Hierarchical Data)
Strengths: nested relationships
Concerns: order, aspect ratio
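To make the "order, aspect ratio" concern concrete, here is a sketch of the simplest treemap layout, slice-and-dice, for a single level of a hierarchy. This is one well-known layout, swapped in for illustration; the slides do not say which algorithm their example uses, and the item labels and sizes are hypothetical.

```python
def slice_and_dice(items, x, y, width, height, split_horizontally=True):
    """Divide a rectangle among (label, size) items in proportion to size.

    Returns a list of (label, x, y, width, height) tuples in input order.
    """
    total = sum(size for _label, size in items)
    rects = []
    offset = 0.0
    for label, size in items:
        share = size / total
        if split_horizontally:
            rects.append((label, x + offset, y, width * share, height))
            offset += width * share
        else:
            rects.append((label, x, y + offset, width, height * share))
            offset += height * share
    return rects


def aspect_ratio(rect):
    """Long side divided by short side; 1.0 is a square."""
    _label, _x, _y, w, h = rect
    return max(w / h, h / w)


if __name__ == "__main__":
    # Hypothetical sizes for three sibling nodes.
    layout = slice_and_dice([("A", 6), ("B", 3), ("C", 1)], 0, 0, 100, 50)
    for rect in layout:
        print(rect, round(aspect_ratio(rect), 2))
```

Areas stay proportional to size, but the smallest item becomes a thin sliver, and reordering the input moves every rectangle. Both effects grow with the number of items, which is why other layouts trade strict ordering for squarer cells.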
Multi-Level Pie Chart (Hierarchical Data)
Strengths: nested relationships
Concerns: readability
Heat Map
(Table/Field Data)
Strengths: pattern/outlier detection
Concerns: ordering, clustering, color
Choropleth (Region Data)
Strengths: geography
Concerns: region size, color
Cartogram
(Region Data)
Strengths: geographic pattern
Concerns: base map knowledge
The Ebb and Flow of Movies
NY Times, 2008
Streamgraph
“Data Visualization” Wikipedia Page
Wordle
Word Cloud
Twitter Networks
PJ Lamberson, 2012
Blogs/Reference
• Infosthetics.com
• Visualizing.org
• FlowingData.com
Nikhil Srivastava
nsri@wharton.upenn.edu

Data & Analytics Club - Data Visualization Workshop

  • 1.
    Data Visualization NikhilSrivastava, 2015 Nikhil Srivastava Wharton Data & Analytics Club
  • 2.
    Data Visualization NikhilSrivastava, 2015 hoster@wharton.upenn.edu
  • 3.
    Data Visualization NikhilSrivastava, 2015 About this Lecture • Shortened version of longer course – Slides, demos, extra material – Code samples and libraries – Sample projects • Questions
  • 4.
    Data Visualization NikhilSrivastava, 2015 About You
  • 5.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction Outline
  • 6.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction
  • 7.
    Data Visualization NikhilSrivastava, 2015 Data Visualization Information Visualization Scientific Visualization Infographics Statistical Graphics Informative Art Art Science Statistics JournalismDesign Visual Analytics Business
  • 8.
    Data Visualization NikhilSrivastava, 2015 City State Population Baton Rouge Louisiana 191,741 Birmingham Alabama 220,927 Broken Arrow Oklahoma 58,018 Eugene Oregon 115,890 Glendale Arizona 245,868 Huntsville Alabama 55,741 Lafayette Louisiana 87,737 Mobile Alabama 98,147 Montgomery Alabama 126,250 New Orleans Louisiana 322,172 Norman Oklahoma 101,590 Peoria Arizona 167,868 Portland Oregon 514,108 Salem Oregon 147,631 Scottsdale Arizona 134,335 Shreveport Louisiana 68,756 Surprise Arizona 90,548 Tempe Arizona 143,369 Tulsa Oklahoma 392,138
  • 9.
    Data Visualization NikhilSrivastava, 2015 • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city? City State Population Baton Rouge Louisiana 191,741 Birmingham Alabama 220,927 Broken Arrow Oklahoma 58,018 Eugene Oregon 115,890 Glendale Arizona 245,868 Huntsville Alabama 55,741 Lafayette Louisiana 87,737 Mobile Alabama 98,147 Montgomery Alabama 126,250 New Orleans Louisiana 322,172 Norman Oklahoma 101,590 Peoria Arizona 167,868 Portland Oregon 514,108 Salem Oregon 147,631 Scottsdale Arizona 134,335 Shreveport Louisiana 68,756 Surprise Arizona 90,548 Tempe Arizona 143,369 Tulsa Oklahoma 392,138
  • 10.
    Data Visualization NikhilSrivastava, 2015
  • 11.
    Data Visualization NikhilSrivastava, 2015 • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city?
  • 12.
    Data Visualization NikhilSrivastava, 2015 • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city? • What is the population of Montgomery, Alabama?
  • 13.
    Data Visualization NikhilSrivastava, 2015 Data Visualization is: • Useful – Answers user questions – Reduces user workload (by design, not by default)
  • 14.
    Data Visualization NikhilSrivastava, 2015 Anscombe’s quartet (1973)
  • 15.
    Data Visualization NikhilSrivastava, 2015 Anscombe’s quartet (1973)
  • 16.
    Data Visualization NikhilSrivastava, 2015 Data Visualization is: • Useful – Understand structure and patterns – Resolve ambiguity – Locate outliers
  • 17.
    Data Visualization NikhilSrivastava, 2015
  • 18.
    Data Visualization NikhilSrivastava, 2015 Data Visualization is: • Important – Design decisions affect interpretation
  • 19.
    Data Visualization NikhilSrivastava, 2015 Crimean War Deaths Florence Nightingale, 1858 (re-colorized)
  • 20.
    Data Visualization NikhilSrivastava, 2015 Gapminder Foundation
  • 21.
    Data Visualization NikhilSrivastava, 2015 Data Visualization is: • Powerful – Communicate, teach, inspire
  • 22.
    Data Visualization NikhilSrivastava, 2015 purpose communicate explore, analyze data type numerical, categorical text, maps, graphs, networks method static representation animation, interactivity Our Focus
  • 23.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction
  • 24.
    Data Visualization NikhilSrivastava, 2015 The Hardware
  • 25.
    Data Visualization NikhilSrivastava, 2015 The Software • High-level concepts: objects, symbols • Involves working memory • Slower, serial, conscious • Sensory input • Low-level features: orientation, shape, color, movement • Rapid, parallel, automatic Visual Perception “Bottom-up”
  • 26.
    Data Visualization NikhilSrivastava, 2015 The Software • High-level concepts: objects, symbols • Involves working memory • Slow, sequential, conscious • Sensory input • Low-level features: orientation, shape, color, movement • Rapid, parallel, automatic “Bottom-up” “Top-down” Visual Perception
  • 27.
    Data Visualization NikhilSrivastava, 2015 Task: Counting How many 3’s? 1281768756138976546984506985604982826762 9809858458224509856458945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686
  • 28.
    Data Visualization NikhilSrivastava, 2015 Task: Counting How many 3’s? 1281768756138976546984506985604982826762 9809858458224509856358945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686 1281768756138976546984506985604982826762 9809858458224509856358945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686
  • 29.
    Data Visualization NikhilSrivastava, 2015 Task: Counting Slow, sequential, conscious Rapid, parallel, automatic 1281768756138976546984506985604982826762 9809858458224509856358945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686 1281768756138976546984506985604982826762 9809858458224509856358945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686
  • 30.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Which side has the red circle?
  • 31.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Which side has the red circle?
  • 32.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Which side has the red circle?
  • 33.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Which side has the red circle?
  • 34.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Slow, sequential, conscious Rapid, parallel, automatic
  • 35.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search
  • 36.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search
  • 37.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search
  • 38.
    Data Visualization NikhilSrivastava, 2015 Task: (Distracted) Search Slow, sequential, conscious Rapid, parallel, automatic (n=7) (n=5) (n=3)
  • 39.
    Data Visualization NikhilSrivastava, 2015 Lessons for Visualization • Use “pre-attentive” attributes when possible – Color, shape, orientation (depth, motion) – Faster, higher bandwidth • Caveats – Beware limits of working memory (<7) – Be careful mixing attributes
  • 40.
    Data Visualization NikhilSrivastava, 2015 Example: Inefficient Attributes
  • 41.
    Data Visualization NikhilSrivastava, 2015 Example: Too Many Attributes
  • 42.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction
  • 43.
    Data Visualization NikhilSrivastava, 2015 What kind of data do we have? How can we represent the data visually? How can we organize this into a visualization? Visual Encoding
  • 44.
    Data Visualization NikhilSrivastava, 2015 Data Types CATEGORICAL ORDINAL NUMERICAL Interval Ratio Male / Female Asia / Africa / Europe True / False Small / Med / Large Low / High Yes / Maybe / No Latitude/Longitude Compass direction Time (event) Length Count Time (duration) = = = = < > < > < > - + - * /
  • 45.
    Data Visualization NikhilSrivastava, 2015 Data Types CATEGORICAL ORDINAL NUMERICAL Interval Ratio Male / Female Asia / Africa / Europe True / False Small / Med / Large Low / High Yes / Maybe / No Latitude/Longitude Compass direction Time (event) Length Count Time (duration) Bin/Categorize Difference/Normalize
  • 46.
    Data Visualization NikhilSrivastava, 2015 Data Types (Advanced) • Networks/Graphs – Hierarchies/Trees • Text • Maps: points, regions, routes
  • 47.
    Data Visualization NikhilSrivastava, 2015 What kind of data do we have? How can we represent the data visually? How can we organize this into a visualization? Visual Encoding
  • 48.
    Data Visualization NikhilSrivastava, 2015 Visual Encodings Marks point line area volume Channels position size shape color angle/tilt
  • 49.
    Data Visualization NikhilSrivastava, 2015 Channel Effectiveness
  • 50.
    Data Visualization NikhilSrivastava, 2015 Channel Effectiveness “Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”
  • 51.
    Data Visualization NikhilSrivastava, 2015 What kind of data do we have? How can we represent the data visually? How can we organize this into a visualization? Athi River Machakos 139,380 Awasi Kisumu 93,369 Kangundo-Tala Machakos 218,557 Karuri Kiambu 129,934 Kiambu Kiambu 88,869 Kikuyu Kiambu 233,231 Kisumu Kisumu 409,928 Kitale Trans-Nzoia 106,187 Kitui Kitui 155,896 Limuru Kiambu 104,282 Machakos Machakos 150,041 Molo Nakuru 107,806 Mwingi Kitui 83,803 Naivasha Nakuru 181,966 Nakuru Nakuru 307,990 Nandi Hills Trans-Nzoia 73,626
  • 52.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Scatter Plot point position 2 quantitative
  • 53.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Scatter + Hue point position, color 2 quantitative, 1 categorical
  • 54.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Scatter + Size (“Bubble”) point position, size 3 quantitative
  • 55.
    Data Visualization NikhilSrivastava, 2015 Scatter Plot – Applications RELATIONSHIP GROUPING OUTLIERS
  • 56.
    Data Visualization NikhilSrivastava, 2015 Scatter Plot – Dangers OCCLUSION (DENSITY) OCCLUSION (OVERLAP) 3-D
  • 57.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Line Chart line position (orientation) 2 quantitative
  • 58.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Area Chart area size (length) 2 quantitative
  • 59.
    Data Visualization NikhilSrivastava, 2015 Line Chart – Applications PATTERN OVER TIME COMPARISON
  • 60.
    Data Visualization NikhilSrivastava, 2015 Line Chart – Dangers Y SCALING X SCALING OVERLOAD
  • 61.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Bar Chart line size (length) 1 categorical, 1 quantitative
  • 62.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Histogram line size (length) 1 ordinal/quantitative, 1 quantitative (count)
  • 63.
    Data Visualization NikhilSrivastava, 2015 Bar Chart – Applications COMPARE CATEGORIES DISTRIBUTION
  • 64.
    Data Visualization NikhilSrivastava, 2015 Bar Chart – Dangers TOO MANY CATEGORIES POORLY SORTED CATEGORIES ZERO AXIS
  • 65.
    Data Visualization NikhilSrivastava, 2015 type mark channel data represented Pie Chart area size (angle) 1 quantitative
  • 66.
    Data Visualization NikhilSrivastava, 2015 Pie Chart – Dangers AREA/ANGLE SCALE SIMILAR AREAS OVERLOAD
  • 67.
    Data Visualization NikhilSrivastava, 2015 Multi-Series: Bar “GROUPED” BAR CHART “STACKED” BAR CHART
  • 68.
    Data Visualization NikhilSrivastava, 2015 Multi-Series: Line MULTIPLE LINE STACKED AREA CHART
  • 69.
    Data Visualization NikhilSrivastava, 2015 Normalization NORMALIZED BAR NORMALIZED AREA
  • 70.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction
  • 71.
    Data Visualization NikhilSrivastava, 2015 From Science to Art • Design principles* • Style guidelines* *dependent on context and objective (and author)
  • 72.
    Data Visualization NikhilSrivastava, 2015 Design Principles
  • 73.
    Data Visualization NikhilSrivastava, 2015 Design Principles • Integrity – Tell the truth with data • Effectiveness – Achieve visualization objectives • Aesthetics – Be compelling, vivid, beautiful
  • 74.
    Data Visualization NikhilSrivastava, 2015 Integrity Lie Ratio = size of effect in graphic size of effect in data
  • 75.
    Data Visualization NikhilSrivastava, 2015 Integrity
  • 76.
    Data Visualization NikhilSrivastava, 2015 Integrity “show data variation, not design variation”
  • 77.
    Data Visualization NikhilSrivastava, 2015 Effectiveness* Data/Ink Ratio = ink representing data total ink *Tufte
  • 78.
    Data Visualization NikhilSrivastava, 2015 Effectiveness* *Tufte avoid “chart junk”
  • 79.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 80.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 81.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 82.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 83.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 84.
    Data Visualization NikhilSrivastava, 2015 Avoid Chart Junk
  • 85.
    Data Visualization NikhilSrivastava, 2015 Effectiveness (Few)
  • 86.
    Data Visualization NikhilSrivastava, 2015 Practical Guidelines • Avoid 3-D charts • Focus on substance over graphics • Avoid separate legends and keys • Use faint grids/guidelines • Avoid unnecessary textures and colors
  • 87.
    Data Visualization NikhilSrivastava, 2015 A Note on Color • To label • To emphasize • To liven or decorate
  • 88.
    Data Visualization NikhilSrivastava, 2015 Color as a Channel Categorical Quantitative Hue Good (6-8 max) Poor Value Poor Good Saturation Poor Okay
  • 89.
    Data Visualization NikhilSrivastava, 2015 Bad Color
  • 90.
    Data Visualization NikhilSrivastava, 2015 Good Color
  • 91.
    Data Visualization NikhilSrivastava, 2015 More Color Guidelines • Use color only when necessary • Saturated colors for small areas, labels • Less saturated colors for large areas, backgrounds • Use tools like ColorBrewer
  • 92.
    Data Visualization NikhilSrivastava, 2015 • What is Data Visualization? • Thinking and Seeing • From Data to Graphics • Principles and Guidelines • Building Visualizations • Advanced introduction foundation & theory building blocks design & critique construction
  • 93.
    What Tools to Use?
    City            County        Population
    Athi River      Machakos      139,380
    Awasi           Kisumu        93,369
    Kangundo-Tala   Machakos      218,557
    Karuri          Kiambu        129,934
    Kiambu          Kiambu        88,869
    Kikuyu          Kiambu        233,231
    Kisumu          Kisumu        409,928
    Kitale          Trans-Nzoia   106,187
    Kitui           Kitui         155,896
    Limuru          Kiambu        104,282
    Machakos        Machakos      150,041
    Molo            Nakuru        107,806
    Mwingi          Kitui         83,803
    Naivasha        Nakuru        181,966
    Nakuru          Nakuru        307,990
    Nandi Hills     Trans-Nzoia   73,626
    Diagram labels: Clean, Restructure, Explore, Analyze; DATA; Visualization Goals
  • 94.
    Visualization Tools: Excel, Tableau, Plotly, Python, R, Matlab, Ubiq/Silk. Axes: “How hard is it to learn?” vs. “How powerful & flexible is it?” Region label: “I’ll have to write code”.
  • 95.
    Visualization Tools: Excel, Tableau, Plotly, Python, R, Matlab, Ubiq/Silk, Google Charts, Highcharts, d3. Axes: “How hard is it to learn?” vs. “How powerful & flexible is it?” Region label: “I’ll have to write code”.
  • 96.
    Cheat Sheets
    – For Hackathon participants
    – Otherwise, email me
  • 97.
    – What is Data Visualization?
    – Thinking and Seeing
    – From Data to Graphics
    – Principles and Guidelines
    – Building Visualizations
    – Advanced
    Section labels: introduction, foundation & theory, building blocks, design & critique, construction
  • 98.
    Small Multiples
  • 99.
    Treemap (Hierarchical Data). Strengths: nested relationships. Concerns: order, aspect ratio.
  • 100.
    Multi-Level Pie Chart (Hierarchical Data). Strengths: nested relationships. Concerns: readability.
  • 101.
    Heat Map (Table/Field Data). Strengths: pattern/outlier detection. Concerns: ordering, clustering, color.
  • 102.
    Choropleth (Region Data). Strengths: geography. Concerns: region size, color.
  • 103.
    Cartogram (Region Data). Strengths: geographic pattern. Concerns: base map knowledge.
  • 104.
    Streamgraph: “The Ebb and Flow of Movies” (NY Times, 2008)
  • 105.
    Word Cloud: “Data Visualization” Wikipedia page (Wordle)
  • 107.
    Twitter Networks (PJ Lamberson, 2012)
  • 108.
    Blogs/Reference
    – Infosthetics.com
    – Visualizing.org
    – FlowingData.com
  • 109.
    Nikhil Srivastava, nsri@wharton.upenn.edu
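
For anyone taking the "I'll have to write code" route, here is a minimal sketch of how the three questions asked of the city table (most populous city, county with the most cities, county with the largest average city) could be answered in Python. The rows are copied from the "What Tools to Use?" slide; the function names are illustrative assumptions and do not come from the lecture.

```python
from collections import defaultdict

# (city, county, population) rows copied from the "What Tools to Use?" slide.
CITIES = [
    ("Athi River", "Machakos", 139380),
    ("Awasi", "Kisumu", 93369),
    ("Kangundo-Tala", "Machakos", 218557),
    ("Karuri", "Kiambu", 129934),
    ("Kiambu", "Kiambu", 88869),
    ("Kikuyu", "Kiambu", 233231),
    ("Kisumu", "Kisumu", 409928),
    ("Kitale", "Trans-Nzoia", 106187),
    ("Kitui", "Kitui", 155896),
    ("Limuru", "Kiambu", 104282),
    ("Machakos", "Machakos", 150041),
    ("Molo", "Nakuru", 107806),
    ("Mwingi", "Kitui", 83803),
    ("Naivasha", "Nakuru", 181966),
    ("Nakuru", "Nakuru", 307990),
    ("Nandi Hills", "Trans-Nzoia", 73626),
]


def most_populous_city(rows):
    """Return the (city, county, population) row with the largest population."""
    return max(rows, key=lambda row: row[2])


def populations_by_county(rows):
    """Restructure the flat table into {county: [population, ...]}."""
    groups = defaultdict(list)
    for _city, county, population in rows:
        groups[county].append(population)
    return dict(groups)


def county_with_most_cities(rows):
    """Return the county that appears most often in the table."""
    groups = populations_by_county(rows)
    return max(groups, key=lambda county: len(groups[county]))


def county_with_largest_average_city(rows):
    """Return the county whose listed cities have the highest mean population."""
    groups = populations_by_county(rows)
    return max(groups, key=lambda county: sum(groups[county]) / len(groups[county]))


if __name__ == "__main__":
    print("Most populous city:", most_populous_city(CITIES)[0])
    print("County with most cities:", county_with_most_cities(CITIES))
    print("County with largest average city:", county_with_largest_average_city(CITIES))
```

Each question needs its own restructuring step (a sort, a count, a grouped mean), which is the same cognitive work a reader does by eye on the raw table and which a well-chosen chart can make nearly immediate.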

Editor's Notes

  • #7 Alright, let’s get started – what is data visualization?
  • #8 It’s difficult to define precisely: as a field, DV has many related and overlapping goals and descriptions. It is often used interchangeably with different terms, and it falls under many different disciplines.
  • #9 Better than a definition is an example. Let’s take a look at this table of Kenyan cities showing city name, county name, and city population. Take a moment to understand the structure of this data, because I’m about to ask you a few questions on it.
  • #11 Now, let’s answer the same questions by using the visualization. What are the cognitive steps required? How easy or difficult is the process?
  • #13 Now let’s ask an additional question we didn’t ask before.
  • #14 We’ve learned that data visualization can be useful in telling us things about a set of data, making it easier to find information and answer questions. We’ve also learned that this usefulness depends both on the design of the visualization and the specific information we are looking for.
  • #15 Let’s take a look at another example. This is a data set called Anscombe’s Quartet, named after the statistician who devised it. It consists of four separate sets of data, each of which is a list of eleven pairs of numbers, so each set has eleven paired X and Y values. To make this a bit more concrete, you can imagine that each data set describes eleven people, where X represents their height and Y represents their weight. The interesting thing is that all four of these data sets have nearly identical summary statistics. The X values have the same average and standard deviation in every set, and so do the Y values (to two decimal places). Furthermore, the correlation between X and Y is the same for all sets. And except for the last one (which has a bunch of 8s), there’s not much we can do to distinguish them or describe them meaningfully by just looking at the numbers in the table. Now let’s see what happens when we plot them.
  • #16 Here we’ve visualized the data in what’s known as a scatter plot. Each dot represents one of the eleven pairs, located on the horizontal axis by X value and on the vertical axis by Y value. By visualizing the data, we see patterns, outliers, and relationships that were impossible to detect in the table.
  • #17 So we’ve learned that DV is important. It can help us resolve ambiguous data, locate outliers, and generally understand the structure and pattern of a data set.
  • #20 Infographic of Twitter activity in Africa in late 2013 produced by Portland Communications.
  • #21 Interactive tool from the Gapminder Foundation animating the health and wealth of world countries over time. This screenshot shows the historical path of Kenya from 1800 to 2013. Note the number of data types (life expectancy, GDP, population per country and year) and variety of visual encodings (x- and y-position, size, color, time).
  • #51 Readings in Information Visualization: Using Vision to Think (Ben Shneiderman et al., 1999)
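
The claim in note #15 about Anscombe’s Quartet can be checked directly. The sketch below is a rough illustration, not part of the lecture: the data values are the commonly published quartet (eleven pairs per set) rather than anything taken from the slides, and the helper names `pearson` and `summarize` are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, stdev

# Commonly published Anscombe values (assumed here; not copied from the slides).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals, as one would read them off a table."""
    return {
        "mean_x": round(mean(xs), 2),
        "sd_x": round(stdev(xs), 2),
        "mean_y": round(mean(ys), 2),
        "sd_y": round(stdev(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, summarize(xs, ys))
```

At two decimal places the four summaries are indistinguishable, which is why the differences only become visible once the points are drawn as scatter plots.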