Session 4: Data Visualization
R for Stata Users
Luiza Andrade, Rob Marty, Rony Rodriguez-Ramirez, Luis Eduardo San Martin, Leonardo Viotti, Marc-Andrea Fiorina
The World Bank – DIME | WB Github
March 2023
Introduction
Initial Setup
 If You Attended Session 2:
 1. Go to the dime-r-training-mar2023 folder that you created yesterday, and open the dime-r-training-mar2023 R project that you created there.
 If You Did Not Attend Session 2:
 1. Create a folder named dime-r-training-mar2023 in your preferred location on your computer.
 2. Go to the OSF page of the course and download the file in: R for Stata Users - 2023 March > Data > dime-r-training-mar2023.zip.
 3. Unzip dime-r-training-mar2023.zip.
 4. Open the dime-r-training-mar2023 R project.
Today's session
Exploratory Analysis vs. Publication/Reporting
Data, aesthetics, & the grammar of graphics
Aesthetics in extra dimensions, themes, and saving plots
 For this session, you’ll use the ggplot2 package from the tidyverse meta-package.
 Similarly to previous sessions, you can find some references at the end of this presentation that include a more comprehensive discussion on data visualization.
Before we start
  Make sure the ggplot2 package is installed and loaded. You can load it directly using library(tidyverse) or library(ggplot2).
  Load the whr_panel data set we created last week (remember to use the here package).
# Packages
library(tidyverse)
library(here)
whr_panel <- read_csv(
  here(
    "DataWork", "DataSets", "Final", "whr_panel.csv"
    )
)
## Rows: 470 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, region
## dbl (6): year, happiness_rank, happiness_score, economy_gdp_per_capita, heal...
In our workflow there are usually two distinct uses for plots:
  1. Exploratory analysis: Quickly visualize your data in an insightful way.
         Base R can be used to quickly create basic figures.
         We will also use ggplot2 to quickly create basic figures.
  2. Publication/Reporting: Make pretty graphs for a presentation, a project report, or papers.
         We’ll do this using ggplot2 with more customization. The idea is to create beautiful graphs.
Exploratory Analysis
Plot with Base R
First, we’re going to use base plot, i.e., the plotting functions that come with the default Base R libraries. It is easy to use and can produce useful graphs with very few lines of code.
    Exercise 1: Exploratory Analysis.
    (1) Create a vector called vars with the variables: economy_gdp_per_capita , happiness_score ,
     health_life_expectancy , and freedom .
    (2) Select all the variables from the vector vars in the whr_panel dataset and assign to the object whr_plot .
    (3) Use the plot() function: plot(whr_plot)
 # Vector of variables
 vars <- c("economy_gdp_per_capita", "happiness_score", "health_life_expectancy", "freedom")
 # Create a subset with only those variables, let's call this subset whr_plot
 whr_plot <- whr_panel %>%
   select(all_of(vars))
Base Plot
plot(whr_plot)
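When plot() receives a data frame with several numeric columns, it draws a scatter plot matrix, with one panel for each pair of variables. A minimal sketch of that behavior, assuming a small made-up data frame (toy_whr and its values are hypothetical, not the course data):

```r
# Hypothetical toy data standing in for whr_plot (values are made up)
toy_whr <- data.frame(
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4, 1.5),
  happiness_score        = c(3.5, 4.9, 5.6, 6.8, 7.4),
  freedom                = c(0.20, 0.40, 0.35, 0.55, 0.60)
)

# A data frame of numeric columns: plot() draws one scatter plot per pair of columns
plot(toy_whr)

# Two numeric vectors: plot() draws a single scatter plot
plot(toy_whr$happiness_score, toy_whr$economy_gdp_per_capita)
```

This is handy for a first look at the data, but customizing these plots quickly gets cumbersome, which is one reason to move on to ggplot2.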
The beauty of ggplot2
1. Consistency with the Grammar of Graphics
      The Grammar of Graphics book is the foundation of several data viz applications: ggplot2, polaris-tableau, vega-lite
2. Flexibility
3. Layering and theme customization
4. Community
It is a powerful and easy to use tool (once you understand its logic) that produces
complex and multifaceted plots.
ggplot2: basic structure (template)
The basic ggplot structure is:
ggplot(data = DATA) +
  GEOM_FUNCTION(mapping = aes(AESTHETIC MAPPINGS))
Mapping data to aesthetics
Think about colors, sizes, x and y references.
We are going to learn how to connect our data to the components of a ggplot.
ggplot2: full structure
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>
1. Data: The data that you want to visualize
2. Layers: geom_ and stat_ → The geometric shapes and statistical summaries representing the data
3. Aesthetics: aes() → Aesthetic mappings of the geometric and statistical objects
4. Scales: scale_ → Maps between the data and the aesthetic dimensions
5. Coordinate system: coord_ → Maps data into the plane of the data rectangle
6. Facets: facet_ → The arrangement of the data into a grid of plots
7. Visual themes: theme() and theme_ → The overall visual defaults of a plot
ggplot2: decomposition
There are multiple ways to structure plots with ggplot.
For this presentation, I will stick to the decomposition by Thomas Lin Pedersen, one of the most prominent developers of the ggplot2 and gganimate packages.
These components can be seen as layers, which is why we use the + sign in our ggplot syntax.
Exploratory Analysis
Let's start making some plots.
 ggplot(data = whr_panel) +
   geom_point(mapping = aes(x = happiness_score, y = economy_gdp_per_capita))
We can also set up our mapping in the ggplot() function.
ggplot(data = whr_panel, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point()
We can also set up the data outside the ggplot() function as follows:
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point()
I prefer the last way of structuring our ggplot:
1. first, set the data;
2. pipe it;
3. then set the aesthetics;
4. and finally add the geometries.
All of these structures will work, but the choice makes a difference if you want to use more than one dataset in the same plot, or combine several geoms in the same ggplot. More on this in the following slides.
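To see why the placement of the data and aesthetics matters, here is a minimal sketch in which two geoms share the aesthetics set in ggplot() while one of them uses its own dataset. The data frames toy and highlight are hypothetical, with made-up values, and the pipe is left out only so the example needs nothing but ggplot2 (toy %>% ggplot(...) would be equivalent).

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.9, 5.6, 6.8, 7.4),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4, 1.5),
  country                = c("A", "B", "C", "D", "E")
)

# A second dataset: only the observation we want to highlight
highlight <- subset(toy, country == "D")

p <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  # This layer inherits both the data and the aesthetics from ggplot()
  geom_point() +
  # This layer inherits the aesthetics but draws a different dataset
  geom_point(data = highlight, color = "red", size = 4)

p
```

Because x and y are declared once in ggplot(), every layer reuses them; aesthetics declared inside a single geom would apply to that geom only.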
 Exercise 2: Create a scatter plot with x = freedom and y = economy_gdp_per_capita .
 Solution:
  whr_panel %>%
    ggplot() +
    geom_point(aes(x = freedom, y = economy_gdp_per_capita))
The most common geoms are:
  geom_bar() , geom_col() : bar charts.
  geom_boxplot() : box and whiskers plots.
  geom_density() : density estimates.
  geom_jitter() : jittered points.
  geom_line() : line plots.
  geom_point() : scatter plots.
    If you want to know more about layers, you can refer to this.
In summary, our basic plots should have the following:
whr_panel %>%                      # The data we want to plot.
  ggplot(
    aes(
      x = happiness_score,         # Columns (variables) to use for x and y.
      y = economy_gdp_per_capita
    )
  ) +
  geom_point()                     # How the plot is going to be drawn.
We can also map colors.
whr_panel %>%
  ggplot(
    aes(
      x = happiness_score,
      y = economy_gdp_per_capita,
      color = region
    )
  ) +
  geom_point()
Let's try something different: instead of region, try adding color = "blue" inside aes().
What do you think is the problem with this code?
 whr_panel %>%
   ggplot(
     aes(
       x = happiness_score,
       y = economy_gdp_per_capita,
       color = "blue"
     )
   )
In ggplot2 , these settings are called aesthetics.
       "Aesthetics of the geometric and statistical objects".
We can set up:
     position: x, y, xmin, xmax, ymin, ymax, etc.
     colors: color and fill.
     transparency: alpha.
     sizes: size and width.
     shapes: shape and linetype.
Notice that it is important to know where we are setting our aesthetics. For example:
     geom_point(aes(color = region)) colors points based on the variable region.
     geom_point(color = "red") colors all points the same color.
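The difference between mapping and setting an aesthetic can be checked directly. The sketch below builds three small plots from a hypothetical data frame (toy, with made-up values) and uses layer_data() to inspect the colors that ggplot2 actually assigns to the points.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.9, 5.6, 6.8),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4),
  region                 = c("North", "North", "South", "South")
)

# Mapping: inside aes(), color follows a column, so ggplot2 picks the colors and adds a legend
p_mapped <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(color = region))

# Setting: outside aes(), color is a fixed value applied to every point, with no legend
p_set <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(color = "blue")

# Pitfall: a constant inside aes() is treated as data (a single category labelled "blue"),
# so the points get a color from the default palette instead of blue
p_pitfall <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(color = "blue"))

# Inspect the colors used for the points in each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_pitfall)$colour)
```

A quick rule of thumb: if the value is a column name, it goes inside aes(); if it is a fixed value such as "blue" or 3, it goes outside.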
Let's modify our last plot. Let's add color = "blue" inside geom_point() .
 whr_panel %>%
   ggplot(
     aes(
       x = happiness_score,
       y = economy_gdp_per_capita
     )
   ) +
   geom_point(color = "blue")
 Exercise 3: Map colors per year for the freedom and gdp plot we did before. Keep in mind the type of the variable year.
 Solution:
  whr_panel %>%
    ggplot(
      aes(
          x = freedom,
          y = economy_gdp_per_capita,
          color = year
      )
    ) +
    geom_point()
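Because year is stored as a number, ggplot2 maps it to a continuous color gradient instead of one color per year. How do you think we could solve it?
   Convert the variable year to a factor: as.factor(year).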
whr_panel %>%
  ggplot(
    aes(
      x = freedom,
      y = economy_gdp_per_capita,
      color = as.factor(year)
    )
  ) +
  geom_point()
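The effect of the variable type can be reproduced on a small example. This is a sketch under the assumption of a hypothetical data frame (toy, made-up values) in which year is numeric, as it is in whr_panel.

```r
library(ggplot2)

# Hypothetical toy data (made-up values); year is numeric on purpose
toy <- data.frame(
  freedom                = c(0.20, 0.40, 0.35, 0.55, 0.60, 0.50),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4, 1.5, 0.9),
  year                   = c(2015, 2015, 2016, 2016, 2017, 2017)
)

# Check the type first: a numeric column is treated as continuous
is.numeric(toy$year)

# Continuous mapping: a color gradient with a colorbar legend
p_gradient <- ggplot(toy, aes(x = freedom, y = economy_gdp_per_capita, color = year)) +
  geom_point()

# Discrete mapping: one distinct color and one legend entry per year
p_discrete <- ggplot(toy, aes(x = freedom, y = economy_gdp_per_capita,
                              color = as.factor(year))) +
  geom_point() +
  labs(color = "Year")  # replaces the default legend title "as.factor(year)"
```

Converting inside aes() leaves the data untouched; if the factor is needed in several plots, creating it once with mutate() before plotting is an alternative.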
ggplot2: settings
Now, let's try to modify our plots. In the following slides, we are going to:
1. Change shapes.
2. Include more geoms.
3. Separate by regions.
4. Pipe and mutate before plotting.
5. Change scales.
6. Modify our theme.
ggplot2: shapes
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(shape = 5)
ggplot2: including more geoms
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
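The message above shows that geom_smooth() chose the smoothing method by itself. As a sketch of how to make that choice explicit, the example below asks for a straight-line fit instead; toy is a hypothetical data frame with made-up values.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 4.9, 5.6, 6.1, 6.8, 7.4),
  economy_gdp_per_capita = c(0.3, 0.5, 0.8, 1.1, 1.0, 1.4, 1.5)
)

p_lm <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  # method = "lm" fits a straight line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_lm
```

Stating method and formula explicitly also silences the message, since ggplot2 no longer has to pick defaults.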
ggplot2: Facets
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  facet_wrap(~ region)
ggplot2: Colors and facets
 Exercise 4: Use the last plot and add a color aesthetic per region.
 Solution:
  whr_panel %>%
    ggplot(
      aes(
          x = happiness_score,
          y = economy_gdp_per_capita,
          color = region
      )
    ) +
    geom_point() +
    facet_wrap(~ region)
ggplot2: Pipe and mutate before plotting
Now let's imagine that we would like to transform a variable before plotting.
 whr_panel %>%
   mutate(
     latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
   ) %>%
   ggplot(
     aes(
       x = happiness_score, y = economy_gdp_per_capita,
       color = latam)
   ) +
   geom_point()
ggplot2: geom's sizes
We can also specify the size of a geom, either by a variable or just a number.
whr_panel %>%
  filter(year == 2017) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(size = economy_gdp_per_capita))
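The code above sizes the points by a variable. For the other case, a single number, the size goes outside aes(). A minimal sketch comparing both, using a hypothetical data frame (toy, made-up values):

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.9, 5.6, 6.8, 7.4),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4, 1.5)
)

# Size set to a number: every point has the same size and no legend is drawn
p_fixed <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(size = 3)

# Size mapped to a variable: point size varies with the data and a legend is added
p_mapped <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(size = economy_gdp_per_capita))
```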
ggplot2: Changing scales
 Linear scale:
ggplot(data = whr_panel,
      aes(x = happiness_score,
          y = economy_gdp_per_capita)) +
 geom_point()
 Log scale:
ggplot(data = whr_panel,
      aes(x = happiness_score,
          y = economy_gdp_per_capita)) +
 geom_point() +
 scale_x_log10()
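A log scale is easiest to appreciate on data that span several orders of magnitude. In the sketch below, toy and its columns income and score are hypothetical, made-up names and values; layer_data() shows that the plotted positions are the log10 of the original values.

```r
library(ggplot2)

# Hypothetical toy data: income spans four orders of magnitude (made-up values)
toy <- data.frame(
  income = c(10, 100, 1000, 10000),
  score  = c(3.5, 4.9, 5.6, 6.8)
)

# Linear scale: the three smallest incomes are squeezed together on the left
p_linear <- ggplot(toy, aes(x = income, y = score)) +
  geom_point()

# Log scale: each tenfold increase in income takes the same distance on the axis
p_log <- p_linear +
  scale_x_log10()

# Positions used for drawing are log10(income); axis labels keep the original units
layer_data(p_log)$x
```

scale_y_log10() does the same for the y axis.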
ggplot2: Themes
Let's go back to our plot with the latam dummy.
We are going to do the following to this plot:
1. Filter only for the year 2015.
2. Change our theme.
3. Add correct labels.
4. Add some annotations.
5. Modify our legends.
ggplot2: Labels
whr_panel %>%
 mutate(
   latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
 ) %>%
 filter(year == 2015) %>%
 ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
            color = latam)) +
 geom_point() +
 labs(
   x = "Happiness Score",
   y = "GDP per Capita",
   title = "Happiness Score vs GDP per Capita, 2015"
 )
ggplot2: Legends
whr_panel %>%
 mutate(
   latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
 ) %>%
 filter(year == 2015) %>%
 ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
            color = latam)) +
 geom_point() +
 scale_color_discrete(labels = c("No", "Yes")) +
 labs(
   x = "Happiness Score",
   y = "GDP per Capita",
   color = "Country in Latin America\nand the Caribbean",
   title = "Happiness Score vs GDP per Capita, 2015"
 )
ggplot2: Themes
whr_panel %>%
 mutate(
   latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
 ) %>%
 filter(year == 2015) %>%
 ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
            color = latam)) +
 geom_point() +
 scale_color_discrete(labels = c("No", "Yes")) +
 labs(
   x = "Happiness Score",
   y = "GDP per Capita",
   color = "Country in Latin America\nand the Caribbean",
   title = "Happiness Score vs GDP per Capita, 2015"
 ) +
 theme_minimal()
The theme() function allows you to modify each aspect of your plot. Some arguments are:
theme(
  # Title and text labels
  plot.title = element_text(color, size, face),
  # Title font color, size, and face
  legend.title = element_text(color, size, face),
  # Title alignment. Number from 0 (left) to 1 (right)
  legend.title.align = NULL,
  # Text label font color, size, and face
  legend.text = element_text(color, size, face),
  # Text label alignment. Number from 0 (left) to 1 (right)
  legend.text.align = NULL
)
More about these modifications can be found here
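The listing above is schematic, so here is a minimal sketch with concrete values filled in. The data frame toy, the title text and the chosen colors and sizes are all made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.9, 5.6, 6.8),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4),
  region                 = c("North", "North", "South", "South")
)

p_themed <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita,
                            color = region)) +
  geom_point() +
  labs(title = "Toy example", color = "Region") +
  # Start from a complete theme ...
  theme_minimal() +
  # ... and then adjust individual elements
  theme(
    plot.title      = element_text(color = "navy", size = 16, face = "bold"),
    legend.title    = element_text(size = 11, face = "italic"),
    legend.text     = element_text(size = 9),
    legend.position = "bottom"
  )

p_themed
```

The order matters: theme() comes after theme_minimal(), because a complete theme added later would replace the individual tweaks.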
ggplot2: Color palettes
We can also add color palettes using other packages such as: RColorBrewer,
viridis or funny ones like the wesanderson package. So, let's add new colors.
   First, install the RColorBrewer package.
# install.packages("RColorBrewer")
library(RColorBrewer)
   Let's add scale_color_brewer(palette = "Dark2") to our ggplot.
ggplot2: Color palettes
whr_panel %>%
 mutate(
   latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
 ) %>%
 filter(year == 2015) %>%
 ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
            color = latam)) +
 geom_point() +
 scale_color_brewer(palette = "Dark2", labels = c("No", "Yes")) +
 labs(
   x = "Happiness Score",
   y = "GDP per Capita",
   color = "Country in Latin America\nand the Caribbean",
   title = "Happiness Score vs GDP per Capita, 2015"
 ) +
 theme_minimal()
ggplot2: Color palettes
My favorite color palette packages:
1. ghibli
2. LaCroixColoR
3. NineteenEightyR
4. nord
5. palettetown
6. quickpalette
7. wesanderson
Saving a plot
Remember that in R we can always assign the result of our code to an object. In this case, we can assign our ggplot2 code to an object called fig as follows.
 fig <- whr_panel %>%
   mutate(
     latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
   ) %>%
   filter(year == 2015) %>%
   ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
              color = latam)) +
   geom_point() +
   scale_color_discrete(labels = c("No", "Yes")) +
   labs(
     x = "Happiness Score",
     y = "GDP per Capita",
     color = "Country in Latin America\nand the Caribbean",
     title = "Happiness Score vs GDP per Capita, 2015"
   ) +
   theme_minimal()
Therefore, if you want to plot it again, you can just type fig in the console.
 Exercise 5: Save a ggplot under the name fig_*YOUR INITIALS*. Use the ggsave() function. You can either include the function after your plot, or save the ggplot first as an object and then save the plot.
 The syntax is ggsave(OBJECT, filename = FILEPATH, height = ..., width = ..., dpi = ...).
 Solution:
  ggsave(
    fig,
    filename = here("DataWork","Output","Raw","fig_MA.png"),
    dpi = 750,
    scale = 0.8,
    height = 8,
    width = 12
  )
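To try ggsave() without touching the project folders, the sketch below saves a small plot to a temporary file and checks that the file was written. The object names (toy, fig_toy, out_file) and the file name are hypothetical. By default width and height are read in inches, so the image size in pixels is inches times dpi: 6 in × 300 dpi = 1800 pixels wide here.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the course data)
toy <- data.frame(
  happiness_score        = c(3.5, 4.9, 5.6, 6.8),
  economy_gdp_per_capita = c(0.3, 0.8, 1.1, 1.4)
)

# Save the plot as an object first
fig_toy <- ggplot(toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point()

# Write to a temporary folder so the example leaves the project untouched
out_file <- file.path(tempdir(), "fig_toy.png")

ggsave(
  filename = out_file,
  plot     = fig_toy,
  width    = 6,    # inches
  height   = 4,    # inches
  dpi      = 300
)

# Confirm that the file exists
file.exists(out_file)
```

The file type is picked from the extension of filename, so changing ".png" to ".pdf" is enough to save a PDF instead.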
And that's it for this session. Join us tomorrow for data analysis. Remember to submit your feedback!
References and recommendations
 ggplot tricks:
     Tricks and Secrets for Beautiful Plots in R by Cédric Scherer: https://github.com/z3tt/outlierconf2021
 Websites:
     Interactive stuff: http://www.htmlwidgets.org/
     The R Graph Gallery: https://www.r-graph-gallery.com/
     ggplot2 official site: http://ggplot2.tidyverse.org/
 Online courses:
     Johns Hopkins Exploratory Data Analysis at Coursera: https://www.coursera.org/learn/exploratory-data-analysis
 Books:
     The Grammar of Graphics by Leland Wilkinson.
     Beautiful Evidence by Edward Tufte.
     R Graphics Cookbook by Winston Chang.
     R for Data Science by Hadley Wickham and Garrett Grolemund.
Appendix: interactive graphs
Interactive graphs
There are several packages to create interactive or dynamic data visualizations with R. Here are a few:
     leaflet - R integration with one of the most popular open-source libraries for interactive maps.
     highcharter - cool interactive graphs.
     plotly - interactive graphs with integration to ggplot.
     gganimate - ggplot GIFs.
     DT - interactive tables.
These are generally HTML widgets that can be incorporated into HTML documents and websites.
Now we’ll use the ggplotly() function from the plotly package to create an interactive graph!
Extra exercise: Interactive graphs.
    Load the plotly package.
    Pass the object containing the last plot you created to the ggplotly() function.
# Load package
library(plotly)
# Use ggplotly to create an interactive plot
ggplotly(fig) %>%
  layout(legend = list(orientation = "h", x = 0.4, y = -0.2))
[Plot: interactive scatter plot titled "Happiness Score vs GDP per Capita, 2015"; x-axis: Happiness Score; y-axis: GDP per Capita; legend: Country in Latin America and the Caribbean (FALSE, TRUE)]