A grammar for graphics
A taxonomy for understanding data graphics:
● Visual cues
● Coordinate systems
● Scale
The ggplot2 package can be used to create data graphics.
Other systems for two-dimensional data graphics in R include base graphics and the lattice system.
We use ggplot2 because it provides a grammar for describing and specifying graphics.
The grammar of ggplot2
● Different functions produce different kinds of visual representation.
● Geoms are the geometric objects that represent the data in the plot: do you need bars, points, or lines? ggplot2 offers many different geoms; some common ones include:
● geom_point() for scatter plots, dot plots, etc.
● geom_boxplot() for, well, boxplots!
● geom_line() for trend lines, time series, etc.
● geom_col() for columns (bars whose heights come from a variable)
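As a minimal sketch of the four geoms listed above, the following uses the mpg and economics datasets that ship with ggplot2. These datasets and the object names are assumptions chosen for illustration; they are not the data used elsewhere in these slides.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for your own data here.
p_point <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()                       # scatter plot

p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()                     # one box per vehicle class

# geom_col() takes bar heights directly from a y variable,
# so the data are summarised first.
mean_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)
p_col <- ggplot(mean_hwy, aes(x = class, y = hwy)) +
  geom_col()

# economics (also in ggplot2) is a monthly time series.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```

Printing any of these objects (for example, typing p_box at the console) draws the plot.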
The grammar of ggplot2
# Stacked bar: a single bar, split by workshop
ggplot(mydata100, aes(x = factor(""), fill = workshop)) +
  geom_bar()

# The same bar in polar coordinates: a pie chart
ggplot(mydata100, aes(x = factor(""), fill = workshop)) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")
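The mydata100 data frame is not defined in these slides, so here is a runnable sketch of the same bar-to-pie idea. It assumes the built-in mpg data, with class standing in for workshop.

```r
library(ggplot2)

# A single stacked bar: the constant x puts every row into one bar,
# and fill splits that bar by vehicle class.
bar <- ggplot(mpg, aes(x = factor(""), fill = class)) +
  geom_bar()

# Wrapping the y axis around a circle turns the stacked bar into a pie chart.
pie <- bar +
  coord_polar(theta = "y") +
  scale_x_discrete("")   # blank out the x-axis title
```

The only change between the two plots is the coordinate system; the data, aesthetics, and geom are identical.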
The grammar of ggplot2
● Aesthetics (aes()): In everyday language, aesthetics describe how something looks (color, size, etc.). In ggplot2, aesthetics are not a fixed look; they are the roles that the variables play in each graph.
● A variable may control where points appear, the color or shape of a point, the height of a bar, and so on.
● An aesthetic maps a variable to a visual cue:
● aes(y = gdp, x = educ)
● aes(label = country, color = net_users)
● A constant given outside aes() is a fixed setting rather than a mapping, and a color name must be quoted:
ggplot(data, aes(x = distance, y = dep_delay)) +
  geom_point(color = "blue")
Aesthetics
g <- ggplot(data = CIACountries, aes(y = gdp, x = educ))
g + geom_point(size = 3)
g + geom_text(aes(label = country, color = net_users), size = 3)
g + geom_point(aes(color = net_users, size = roadways))
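The difference between mapping a variable inside aes() and setting a constant outside it can be sketched as follows. The mpg dataset is an assumption here, used only so the example runs without CIACountries.

```r
library(ggplot2)

g <- ggplot(mpg, aes(x = displ, y = hwy))

# Mapping: color varies with the data, so ggplot2 adds a legend.
mapped <- g + geom_point(aes(color = class))

# Setting: every point gets the same fixed color and size; no legend is drawn.
fixed <- g + geom_point(color = "blue", size = 3)
```

Writing aes(color = "blue") instead would map a constant string to color and let the color scale choose the actual color, which is a common source of confusion.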
The grammar of ggplot2
Scales: these control how data values translate into visual values, and they produce the legends and axes that show, for example, which symbol shape represents females and which represents males.
ggplot(mydata100, aes(x = factor(""), fill = workshop)) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")
● Functions: scale_x_continuous(), scale_y_continuous(), scale_x_discrete(), and the scale_color_*() family (for example, scale_color_manual())
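A short sketch of scale functions in use, assuming the built-in mpg data. The axis titles, color values, and legend labels below are illustrative choices, not part of the original slides.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # Position scales: name the axes and fix the y range.
  scale_x_continuous(name = "Engine displacement (litres)") +
  scale_y_continuous(name = "Highway miles per gallon", limits = c(0, 50)) +
  # Color scale: choose the colors and relabel the legend entries.
  scale_color_manual(
    name   = "Drive train",
    values = c("4" = "darkgreen", "f" = "steelblue", "r" = "firebrick"),
    labels = c("4" = "Four-wheel", "f" = "Front-wheel", "r" = "Rear-wheel")
  )
```

Each aesthetic in use has exactly one scale; adding a scale_*() call replaces the default that ggplot2 would otherwise pick.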
The grammar of ggplot2
Guides: Context is provided by guides (more commonly called legends).
A guide helps a human reader understand the meaning of the visual cues by providing context.
For position visual cues, the most common sort of guide is the familiar axis with its tick marks and labels.
Legends show how dot color (or size, or shape) corresponds to the values of a variable.
Functions: geom_text() and geom_label() add text annotations to a plot.
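To illustrate how axes, a legend, and text labels supply context together, here is a sketch that assumes the built-in mpg data. The threshold of 40 and the label text are arbitrary choices for the example.

```r
library(ggplot2)

# Label only the most fuel-efficient cars so the plot stays readable.
best <- mpg[mpg$hwy > 40, ]

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # Text annotations name individual points.
  geom_text(data = best, aes(label = model), color = "black", vjust = -0.8) +
  # labs() titles the axis guides and the color legend.
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    color = "Drive train"
  ) +
  theme(legend.position = "bottom")
```

Swapping geom_text() for geom_label() draws the same text inside a filled box, which can be easier to read over dense points.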
The grammar of ggplot2
Using multiple aesthetics such as shape, color, and size to display multiple variables can produce a confusing, hard-to-read graph.
● Facets (multiple side-by-side graphs used to display levels of a categorical variable) provide a simple and effective alternative.
facet_wrap(): creates a facet for each level of a single categorical variable.
facet_grid(): creates a facet for each combination of two categorical variables, arranging them in a grid.
g +
  geom_point(alpha = 0.9, aes(size = roadways)) +
  coord_trans(y = "log10") +
  facet_wrap(~net_users, nrow = 1) +
  theme(legend.position = "top")
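The two faceting functions can be compared side by side in a small sketch. It assumes the built-in mpg data, with class, drv, and cyl as the categorical variables.

```r
library(ggplot2)

g <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6)

# One panel per level of a single categorical variable, wrapped into 2 rows.
by_class <- g + facet_wrap(~ class, nrow = 2)

# One panel per combination of two variables: drv in rows, cyl in columns.
by_drv_cyl <- g + facet_grid(drv ~ cyl)
```

facet_grid() keeps a cell for every row-by-column combination, even one with no data, whereas facet_wrap() only draws panels for levels that occur.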
Canonical data graphics in R
● Univariate displays: show how a single variable is distributed.
● If the variable is numeric, its distribution is commonly summarized graphically using a histogram or density plot.
g <- ggplot(data = SAT_2010, aes(x = math))
g + geom_histogram(binwidth = 10) + labs(x = "Average math SAT score")
Canonical data graphics in R
g + geom_density(adjust = 0.3)
If the variable is categorical, a bar graph displays its distribution.
ggplot(
  data = head(SAT_2010, 10),
  aes(x = reorder(state, math), y = math)
) +
  geom_col() +
  labs(x = "State", y = "Average math SAT score")
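For readers without the SAT_2010 data, the same three univariate displays can be sketched with the built-in mpg data. The variables hwy and class and the binwidth are assumptions made for this example.

```r
library(ggplot2)

g <- ggplot(mpg, aes(x = hwy))

# Numeric variable: histogram with bins 2 mpg wide.
hist_plot <- g +
  geom_histogram(binwidth = 2) +
  labs(x = "Highway miles per gallon")

# Numeric variable: density curve; adjust < 1 gives a less smooth curve.
dens_plot <- g + geom_density(adjust = 0.5)

# Categorical variable: geom_bar() counts the rows in each class.
bar_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar()
```

Note the split between geom_bar(), which counts rows itself, and geom_col(), which expects the bar heights to be supplied as a y variable.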
Canonical data graphics in R
● Multivariate displays: the most effective way to convey the relationship between two or more variables.
● The relationship between two numeric variables is commonly summarized graphically using a scatter plot.
g <- ggplot(
  data = SAT_2010,
  aes(x = expenditure, y = math)
) +
  geom_point()
g + aes(color = SAT_rate)
g + facet_wrap(~ SAT_rate)
Canonical data graphics in R
Maps: displays of geographically distributed data.
Canonical data graphics in R
Networks: a network is a set of connections, called edges, between nodes, called vertices. A vertex represents an entity. The edges indicate pairwise relationships between those entities.
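The vertex and edge vocabulary can be made concrete with a toy edge list. This sketch uses base R only, and the names in it are hypothetical data invented for the example.

```r
# A toy undirected network stored as an edge list (hypothetical data).
edges <- data.frame(
  from = c("Ann", "Ann", "Bob", "Cat"),
  to   = c("Bob", "Cat", "Cat", "Dan"),
  stringsAsFactors = FALSE
)

# The vertices are every entity that appears at either end of an edge.
vertices <- sort(unique(c(edges$from, edges$to)))

# Degree: the number of edges touching each vertex.
degree <- table(c(edges$from, edges$to))
```

Each row of edges is one pairwise relationship, so the degrees always sum to twice the number of edges.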