Week 2
Reproducibility in Practice
Git and GitHub
Git and GitHub tips
• Git is a version control system, similar to the “Track Changes” feature in Microsoft Word.
• GitHub is the home for your Git-based projects on the internet (like Dropbox, but much better).
• There are a lot of Git commands and very few people know them all; most of the time you will use
git add
git commit
git push
git pull
All of these commands can be executed through RStudio's Git tab.
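For those comfortable with a terminal, the four commands above fit together as one cycle. The following is a minimal sketch, assuming Git is installed. A local bare repository stands in for GitHub, and the file name, commit message, and identity are placeholders, not course materials.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
git clone --quiet "$sandbox/remote.git" "$sandbox/project"
cd "$sandbox/project"

# Identify yourself for this repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# Change a file, stage it, and record a snapshot.
echo "# Lab 1" > README.md
git add README.md
git commit --quiet -m "Add README"

# Publish the commit, then fetch and merge anything new from the remote.
branch=$(git rev-parse --abbrev-ref HEAD)
git push --quiet -u origin "$branch"
git pull --quiet
```

In RStudio's Git tab, the same steps correspond to ticking the Staged box, then clicking Commit, Push, and Pull.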
Git and GitHub tips
We will be using Git and interfacing with GitHub through RStudio.
If you Google for help, you might come across methods for doing these things in the command line; skip those and move on to the next resource unless you feel comfortable trying them out.
There is a great resource for working with Git and R: happygitwithr.com.
Some of the content in there is beyond the scope of this course, but it's a good place to look for help.
Create a GitHub account
Go to https://github.com/, and create an account (unless you already
have one).
Tips for creating a username, from Happy Git with R:
Incorporate your actual name and use all lowercase (not required).
Pick a username you will be comfortable revealing to your future boss.
Shorter is better than longer.
Be as unique as possible in as few characters as possible.
Make it timeless. Don’t highlight your current university, employer, or place of residence.
Avoid words laden with special meaning in programming, like NA.
Git, GitHub, and R Markdown live demo
Complete the following chapters from the happygitwithr.com website:
• Register a free GitHub account (Chapter 4)
• Install/update R and RStudio (Chapter 5)
• Install Git (Chapter 6)
• Introduce yourself to Git (Chapter 7)
• (Optional) Install a Git client (Chapter 8)
• Confirm that you can push to / pull from GitHub from the command line (Chapter 9)
• Connect RStudio to Git and GitHub (Chapter 12)
• Test drive R Markdown (Chapter 18)
• Render an R script (Chapter 19)
• Check Assignment 0 (https://classroom.github.com/a/FribqZQb)
In the first lab, you will go through the full version control cycle. As the semester progresses, you will become more familiar with GitHub and will use Git/GitHub in a team-based environment.
R Markdown
R Markdown
Generate fully reproducible reports: the analysis is run from the beginning each time you knit
Simple Markdown syntax for text
Code goes in chunks, defined by three backticks; narrative goes outside of chunks
Toward Reproducibility
Throughout the course, we will use R Markdown (also referred to as R "notebooks") to write code and to document our analyses.
R Markdown embodies all the principles we discussed above, allowing us to implement best practices in our research and empirical work.
For the "pitch" and valuable tutorials, see:
R Markdown
R Notebooks
Sample R Markdown syntax
Header syntax          Example
# Level one            Level one
## Level two           Level two
### Level three        Level three
#### Level four        Level four
##### Level five       Level five
###### Level six       Level six
Syntax                 Example
**bold text**          bold text
*italicized text*      italicized text
`in-line code`         in-line code
- one                  • one
- two                  • two
- three                • three
Compiling documents
To turn code into a report (.html, .pdf, .docx, etc.) we need to knit (or "compile") the document.
The YAML header gives R the instructions for how to do this. Specifically, we tell R what type of output we want.
.html
output: html_document
.docx
output: word_document
.pdf (requires a LaTeX distribution)
output: pdf_document
.nb.html
output: html_notebook
An R Markdown Notebook lets you run code chunks and see their output rendered directly beneath the code as you work.
Compiling documents
We can knit an R Markdown document in one of three ways:
i. Click the Knit button in RStudio.
ii. Use the keyboard shortcut:
Mac: command + shift + k
Windows: control + shift + k
iii. Knit the document using the knit() function in the knitr package (or rmarkdown::render(), which calls knit() and then converts the result to the requested output format).
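The function-based route can also be run from a terminal. The following is a minimal sketch, assuming R and the rmarkdown package are installed. The file name analysis.Rmd is a placeholder, and the script skips the render step when R or the file is missing.

```shell
#!/bin/sh
# Hypothetical file name; replace with your own .Rmd file.
rmd_file="analysis.Rmd"

# R expression that renders the document using the output type in its YAML header.
render_expr="rmarkdown::render('$rmd_file')"

if command -v Rscript >/dev/null 2>&1 && [ -f "$rmd_file" ]; then
  # Same effect as clicking Knit: runs every chunk from a clean session.
  Rscript -e "$render_expr"
else
  echo "Skipping render: need Rscript on the PATH and $rmd_file in this folder"
fi
```

render() also accepts an output_format argument (for example "pdf_document") if you want to override the YAML header for a single run.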
Code Chunks
The real point of R Markdown is to embed your R code in your working
script so that the document is reproducible and transparent.
To write code, we need to create a code chunk. We can do this in one of two ways:
Click Insert and select an R code chunk.
Press cmd + option + i (Mac) or ctrl + alt + i (Windows).
This will yield a shaded chunk. Everything written in this chunk will be evaluated as R code. Everything written outside of it will be treated as prose.
Code Chunks
Chunk output can be customized with options, which are arguments supplied to the chunk header. knitr provides almost 60 options that you can use to customize your code chunks.
Code Chunks
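(✖ marks what each option suppresses; - means unaffected.)
Option              Run code  Show code  Output  Plots  Messages  Warnings  Errors
eval = FALSE        ✖         -          ✖       ✖      ✖         ✖         ✖
include = FALSE     -         ✖          ✖       ✖      ✖         ✖         ✖
echo = FALSE        -         ✖          -       -      -         -         -
results = "hide"    -         -          ✖       -      -         -         -
fig.show = "hide"   -         -          -       ✖      -         -         -
message = FALSE     -         -          -       -      ✖         -         -
warning = FALSE     -         -          -       -      -         ✖         -
error = FALSE       -         -          -       -      -         -         ✖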
YAML Header
Finally, there are many different ways that the YAML header can be set up. Different configurations yield different layouts.
As we already saw, we can change how the document is compiled.
---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output: html_notebook
---
---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output: pdf_document
---
YAML Header
We can customize the YAML to include different output themes, table of
contents, parameters, and more!
---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output:
  html_document:
    theme: spacelab
    highlight: espresso
    toc: true
    toc_depth: 2
---
Best practices
1. Code chunks should be broken up
2. No excessive output
i.e., don't print pages and pages of a data frame.
3. Figures should be appropriately sized for the rendered document
4. All data and code should be self-contained
Given the data and the .Rmd file, the R Markdown document should knit.
Data science in practice
Take a look at what others have done in data science. Some of these use
R, others do not.
Analyzing trends in the Billboard Hot 100 over the past half century
Creating interactive redistricting maps
Tracking their life via Fitbit
Artificially composing Bach chorales
Detecting metastatic breast cancer from still images