DATA VISUALIZATION WITH R
PRINCIPLES & PRACTICE
HELLO!
I am Elijah Appiah from Ghana.
I am an economist by profession.
I love everything data, so I love R!
You can reach me: eappiah.uew@gmail.com
Lesson Goals
Provide a compact introduction that allows readers to learn about visualization techniques.
Emphasize the strong connections between visualizations and insight.
Datasets
mtcars: base R
wage1: wooldridge package
diamonds: ggplot2 package
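A minimal sketch for loading the three datasets in an R session. It assumes ggplot2 is installed; wage1 is loaded only if the wooldridge package is available, so the script still runs without it.

```r
# ggplot2 provides the plotting functions and the diamonds dataset
library(ggplot2)

# mtcars ships with base R (the datasets package): 32 cars, 11 variables
str(mtcars)

# diamonds is exported by ggplot2: 53,940 diamonds, 10 variables
str(diamonds)

# wage1 comes from the wooldridge package; load it only if that package is installed
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("wage1", package = "wooldridge")
  str(wage1)
}
```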
Variables
Categorical
  Nominal – names, labels, or categories with no natural order, e.g. gender, countries
  Ordinal – categories with a natural order, e.g. Likert scales
Numeric
  Discrete – counts, e.g. number of cylinders of a vehicle
  Continuous – measurements that can take any value within an interval, e.g. height, weight
Variables (cont.)
Discrete – represents counts, e.g. number of students or number of blue marbles in a jar. Categories such as grade levels or gender are also discrete, and ggplot2 draws them on discrete scales.
Continuous – represents measurable amounts, e.g. height, weight, temperature, distance.
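In R these variable types correspond to data types, and ggplot2 chooses a discrete or a continuous scale based on the type. The sketch below uses made-up values for the two factors (illustrative only, not from any dataset) and mtcars for the numeric variables.

```r
library(ggplot2)

# Nominal: categories with no natural order -> factor() (illustrative values)
country <- factor(c("Ghana", "Togo", "Ghana"))

# Ordinal: categories with an order, e.g. a Likert item -> ordered factor (illustrative values)
likert <- factor(c("Agree", "Neutral", "Strongly agree"),
                 levels = c("Strongly disagree", "Disagree", "Neutral",
                            "Agree", "Strongly agree"),
                 ordered = TRUE)

# Discrete numeric: counts, e.g. number of cylinders (stored as numbers in mtcars)
table(mtcars$cyl)

# Continuous numeric: measured amounts, e.g. weight in 1000 lbs
summary(mtcars$wt)

# ggplot2 treats numbers as continuous, so wrap cyl in factor()
# to get one bar per cylinder count on a discrete axis
p <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
print(p)
```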
GGPLOT2
GRAMMAR OF GRAPHICS PLOTS
GGPLOT2 LAYERS
DATA – dataset to be visualized
AESTHETICS – scales onto which data is mapped
GEOMETRIES – visual elements for the data
FACETS – create multiple plots
STATISTICS – data representations to aid understanding
COORDINATES – the space on which data is plotted
THEMES – plot appearance (all non-data ink)
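To see how the seven layers stack up in one call, here is a sketch that touches each of them on mtcars. The particular choices (weight vs. mileage, faceting by cylinders, a linear smoother, the y-axis zoom, theme_minimal()) are illustrative assumptions, not part of the layer diagram itself.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                      # DATA
            mapping = aes(x = wt, y = mpg)) +   # AESTHETICS
  geom_point() +                                # GEOMETRIES
  facet_wrap(~ cyl) +                           # FACETS: one panel per cylinder count
  stat_smooth(method = "lm", formula = y ~ x) + # STATISTICS: linear fit in each panel
  coord_cartesian(ylim = c(10, 35)) +           # COORDINATES: zoom without dropping data
  theme_minimal()                               # THEMES: non-data ink only
print(p)
```

Layers are combined with +; the order of facets, coordinates and themes in the chain does not change the result, while geoms and stats are drawn in the order they are added.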
GGPLOT2 LAYERS
DATA – dataset to be visualized
AESTHETICS – scales onto which data is mapped
GEOMETRIES – visual elements for the data
GGPLOT2
The package is ggplot2.
The function is ggplot().
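Installing the package is a one-time step, while loading it is needed in every new R session. A minimal sketch:

```r
# Install ggplot2 from CRAN only if it is not already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package so ggplot() and the geom_*() functions are available
library(ggplot2)
```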
Layer: DATA
ggplot(data = df)
Here df is the data frame to plot. On its own, this call draws a blank canvas with a grey background.
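A sketch of the data-only layer, using mtcars in place of the placeholder df. Printing it shows the empty grey panel because no aesthetics or geoms have been added yet.

```r
library(ggplot2)

# DATA layer only: ggplot() stores the data frame but has nothing to draw yet
p <- ggplot(data = mtcars)
print(p)  # an empty grey panel (the default theme_grey background)
```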
Layer: AESTHETICS
The aesthetic attributes include x, y, colour (or color), shape, size, fill, alpha, etc.
Aesthetics are mapped with the aes() function, which is passed to ggplot() through its mapping argument.
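A sketch that maps several of these attributes to mtcars columns at once. The column choices are illustrative assumptions; the comment on alpha highlights the difference between mapping an aesthetic inside aes() and setting it to a constant outside aes().

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt,              # position on the x-axis
                        y = mpg,             # position on the y-axis
                        colour = hp,         # continuous colour gradient
                        shape = factor(cyl), # shapes need a discrete variable
                        size = qsec)) +      # point size
  geom_point(alpha = 0.7)  # alpha given outside aes() is a fixed setting, not a mapping
print(p)
```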
Layer: AESTHETICS
The aesthetic attributes go inside aes(). Because data and mapping are the first two arguments of ggplot(), their names can be dropped, so these three calls are equivalent:
ggplot(data = df, mapping = aes())
ggplot(data = df, aes())
ggplot(df, aes())
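One way to check that the three forms above really are interchangeable: build each version with mtcars and a point geom, then compare the data ggplot2 actually draws. This is a sketch; layer_data() is used here only as a convenient way to inspect the built plot.

```r
library(ggplot2)

# data and mapping are the first two arguments of ggplot(),
# so these three calls describe the same plot
p1 <- ggplot(data = mtcars, mapping = aes(x = mpg, y = hp)) + geom_point()
p2 <- ggplot(data = mtcars, aes(x = mpg, y = hp)) + geom_point()
p3 <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()

# Compare the computed layer data of each version
all.equal(layer_data(p1), layer_data(p2))
all.equal(layer_data(p1), layer_data(p3))
```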
Layer: AESTHETICS
ggplot(mtcars, aes(x = mpg))
Layer: AESTHETICS
ggplot(mtcars, aes(x = mpg, y = hp))
Layer: GEOMETRIES
The visual elements of plots are defined by geoms.
They are specified as geom_*(), where * denotes the specific type of plot to create:
A bar plot uses geom_bar()
A histogram uses geom_histogram()
A scatter plot uses geom_point()
Don't worry, we will go into the details soon.
Layer: GEOMETRIES
The geometric objects (or geoms) are added to the ggplot() call with the + operator.
Example:
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()
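The two examples above as a runnable script. One small addition not in the slides: geom_histogram() defaults to 30 bins and prints a message asking for a better bin width, so this sketch sets binwidth = 2.5 explicitly (an arbitrary value chosen for mpg).

```r
library(ggplot2)

# Histogram of fuel efficiency; the geom is added to ggplot() with +
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5)  # explicit bin width instead of the default 30 bins

# Scatter plot of horsepower against fuel efficiency
p_point <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()

print(p_hist)
print(p_point)
```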
Layer: GEOMETRIES
ggplot(mtcars, aes(x = mpg)) + geom_histogram()
Layer: GEOMETRIES
ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()
Now, let’s practice
VARIABLES & PLOTS - GEOMS
ONE VARIABLE
Discrete
  Bar plot – geom_bar()
Continuous
  Histogram – geom_histogram()
  Density plot – geom_density()
  Dot plot – geom_dotplot()
  Frequency polygon – geom_freqpoly()
VARIABLES & PLOTS - GEOMS
geom_bar() – display the distribution of a discrete variable.
geom_histogram() – bin and count a continuous variable, display with bars.
geom_density() – smoothed density estimate.
geom_dotplot() – stack individual points into a dot plot.
geom_freqpoly() – bin and count a continuous variable, display with lines.
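A sketch of the one-variable geoms listed above, using the lesson's datasets. The bin widths are illustrative choices, and using mtcars for the dot plot (rather than diamonds) is an assumption made because a dot plot draws one dot per observation.

```r
library(ggplot2)

# Discrete: one bar per cylinder count (factor() makes cyl discrete)
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Continuous: the distribution of diamond carat, three ways
p_hist <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 0.1)
p_dens <- ggplot(diamonds, aes(x = carat)) + geom_density()
p_freq <- ggplot(diamonds, aes(x = carat)) + geom_freqpoly(binwidth = 0.1)

# Dot plots draw one dot per observation, so use the 32-row mtcars instead
p_dot <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5)

print(p_hist)
```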
Now, let’s practice
VARIABLES & PLOTS - GEOMS
TWO VARIABLES
Both continuous
  Scatter plot – geom_point()
  Quantile plot – geom_quantile()
  Rug plot – geom_rug()
  Text labels – geom_text()
One continuous, one discrete
  Bar plot – geom_col() or geom_bar(stat = "identity")
  Box plot – geom_boxplot()
  Violin plot – geom_violin()
VARIABLES & PLOTS - GEOMS
geom_point() – scatterplot.
geom_quantile() – smoothed quantile regression.
geom_rug() – marginal rug plots.
geom_text() – text labels.
geom_col()/geom_bar(stat = "identity") – bar chart of precomputed summaries.
geom_boxplot() – boxplots.
geom_violin() – show density of values in each group.
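A sketch of the two-variable geoms above on mtcars. The variable pairings and label settings are illustrative. geom_quantile() relies on the quantreg package, so this sketch only builds that plot when quantreg is installed; the geom_col() example first computes group means with base R's aggregate().

```r
library(ggplot2)

# Both continuous: scatter plot with marginal rugs and car-name labels
p_point <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_rug() +
  geom_text(aes(label = rownames(mtcars)), size = 3, vjust = -0.8,
            check_overlap = TRUE)  # skip labels that would overlap

# Quantile regression lines (25th, 50th, 75th percentiles by default)
if (requireNamespace("quantreg", quietly = TRUE)) {
  p_quant <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_quantile()
}

# One continuous (mpg), one discrete (cylinders as a factor)
p_box    <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()
p_violin <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_violin()

# geom_col() draws values computed beforehand, here mean mpg per cylinder group
mpg_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p_col <- ggplot(mpg_means, aes(x = factor(cyl), y = mpg)) + geom_col()
# ...which matches geom_bar() with stat = "identity"
p_bar <- ggplot(mpg_means, aes(x = factor(cyl), y = mpg)) +
  geom_bar(stat = "identity")

print(p_point)
```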
Now, let’s practice
VARIABLES & PLOTS - GEOMS
TWO VARIABLES
At least one discrete
  Count plot – geom_count()
  Jitter plot – geom_jitter()
Show distribution (continuous)
  Hexagonal heatmap – geom_hex()
  Heatmap – geom_bin2d()
  Density plot – geom_density2d()
VARIABLES & PLOTS - GEOMS
geom_count() – count number of points at distinct locations.
geom_jitter() – randomly jitter overlapping points.
geom_hex() – bin into hexagons and count.
geom_bin2d() – bin into rectangles and count.
geom_density2d() – smoothed 2d density estimate.
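A sketch of these geoms. The count and jitter plots use mtcars, where many cars share the same cylinder/gear values; the binned heatmap uses diamonds, whose roughly 54,000 points would overplot as raw points. The 2d density example swaps in base R's faithful dataset (an assumption, not one of the lesson's datasets). geom_hex() needs the hexbin package, so it is only built when hexbin is installed; bins = 50 is an arbitrary choice.

```r
library(ggplot2)

# At least one discrete variable: many cars sit at the same cylinder/gear pair
p_count <- ggplot(mtcars, aes(x = factor(cyl), y = factor(gear))) +
  geom_count()  # point size shows how many cars share each location
p_jitter <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.2, height = 0)  # spread points sideways, keep mpg exact

# Continuous distribution: binning avoids overplotting
p_bin2d <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 50)  # rectangles filled by count
p_dens2d <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  geom_density2d()       # contour lines of a 2d kernel density estimate

# Hexagonal bins need the hexbin package
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_hex <- ggplot(diamonds, aes(x = carat, y = price)) + geom_hex(bins = 50)
}

print(p_bin2d)
```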
Now, let’s practice
VARIABLES & PLOTS - GEOMS
TWO VARIABLES
One time, one continuous
  Line plot – geom_line()
  Area plot – geom_area()
  Step plot – geom_step()
Display uncertainty
  geom_crossbar()
  geom_errorbar()
  geom_linerange()
  geom_pointrange()
VARIABLES & PLOTS - GEOMS
geom_line() – line plot.
geom_area() – area plot.
geom_step() – step plot.
geom_crossbar() – vertical bar with center.
geom_errorbar() – error bars.
geom_linerange() – vertical line.
geom_pointrange() – vertical line with center.
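A sketch of the time-series and uncertainty geoms. It assumes the economics dataset that ships with ggplot2 (monthly US data with date and unemploy columns) for the time plots. For the uncertainty geoms it computes, as an illustrative choice, the mean mpg per cylinder group in mtcars with plus/minus one standard error as the interval.

```r
library(ggplot2)

# One time, one continuous: unemployment over time
p_line <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()
p_area <- ggplot(economics, aes(x = date, y = unemploy)) + geom_area()
# Use only the last 24 months so the individual steps are visible
p_step <- ggplot(tail(economics, 24), aes(x = date, y = unemploy)) + geom_step()

# Uncertainty: mean mpg per cylinder group, plus/minus one standard error
groups <- split(mtcars$mpg, mtcars$cyl)  # list named "4", "6", "8"
summ <- data.frame(
  cyl  = factor(names(groups)),
  mean = sapply(groups, mean),
  se   = sapply(groups, function(v) sd(v) / sqrt(length(v)))
)
summ$lower <- summ$mean - summ$se
summ$upper <- summ$mean + summ$se

# Every uncertainty geom reads the interval from ymin and ymax
base <- ggplot(summ, aes(x = cyl, y = mean, ymin = lower, ymax = upper))
p_crossbar   <- base + geom_crossbar(width = 0.4)
p_errorbar   <- base + geom_errorbar(width = 0.2)
p_linerange  <- base + geom_linerange()
p_pointrange <- base + geom_pointrange()

print(p_pointrange)
```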
Now, let’s practice
VARIABLES & PLOTS - GEOMS
TWO VARIABLES
geom_map() – for map data
THREE VARIABLES
geom_contour() – contours.
geom_tile() – tile the plane with rectangles.
geom_raster() – equal-sized tiles (a fast version of geom_tile()).
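A sketch of the three-variable geoms. It assumes the faithfuld dataset from ggplot2, a regular grid of waiting and eruptions values with an estimated density for each grid cell, so the third variable maps to z for contours and to fill for tiles.

```r
library(ggplot2)

# Contours: the third variable goes in the z aesthetic
p_contour <- ggplot(faithfuld, aes(x = waiting, y = eruptions, z = density)) +
  geom_contour()

# Tiles: the third variable goes in the fill aesthetic
p_tile <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_tile()

# Same picture as geom_tile(); faster because every tile has the same size
p_raster <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster()

print(p_raster)
```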
THANKS!
Any questions?
You can find me at: eappiah.uew@gmail.com