Week 2 Reproducibility in Practice

Uploaded by

ameera.attiah
Week 2

Reproducibility in Practice
Git and GitHub
Git and GitHub tips
• Git is a version control system, similar to the “Track Changes”
feature in Microsoft Word.
• GitHub is the home for your Git-based projects on the internet (like
Dropbox but much better).
• There are a lot of Git commands and very few people know them
all; most of the time you will use
git add
git commit
git push
git pull

All of these commands can be executed through RStudio's Git tab.
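For readers who are comfortable in a terminal, the four commands above can be tried end to end without touching a real GitHub account. The sketch below is only an illustration: a local bare repository stands in for GitHub, and the folder names `laptop` and `desktop`, the file contents, and the user identity are all made up for the example.

```shell
#!/bin/sh
set -eu

# Scratch directory; a local bare repository stands in for GitHub (assumption).
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"

# "laptop" is a hypothetical first working copy.
# (Git warns that the cloned repository is empty; that is expected here.)
git clone --quiet "$work/remote.git" "$work/laptop"
cd "$work/laptop"
git config user.name "Example Student"
git config user.email "student@example.com"

echo "# Lab 1" > README.md
git add README.md                      # stage the change
git commit --quiet -m "Add README"     # record it in the local history
git push --quiet -u origin HEAD        # publish the current branch to the remote

# "desktop" is a hypothetical second copy, e.g. a teammate's machine.
git clone --quiet "$work/remote.git" "$work/desktop"

# Another change goes out from the laptop...
echo "More notes" >> README.md
git add README.md
git commit --quiet -m "Extend README"
git push --quiet

# ...and the desktop copy catches up with a pull.
cd "$work/desktop"
git pull --quiet --ff-only
cat README.md
```

The same add, commit, push, pull sequence is what the buttons in RStudio's Git tab run for you.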
Git and GitHub tips
We will be using Git and interfacing with GitHub through RStudio.

If you Google for help, you might come across methods for doing these
things in the command line. Skip those and move on to the next
resource unless you feel comfortable trying them out.

There is a great resource for working with Git and R: happygitwithr.com.

Some of the content there is beyond the scope of this course,
but it's a good place to look for help.
Create a GitHub account
Go to https://github.com/, and create an account (unless you already
have one).

Tips for creating a username, from Happy Git with R:

• Incorporate your actual name and use all lowercase (not required).
• Pick a username you will be comfortable revealing to your future boss.
• Shorter is better than longer.
• Be as unique as possible in as few characters as possible.
• Make it timeless. Don’t highlight your current university, employer,
or place of residence.
• Avoid words laden with special meaning in programming, like NA.


Git, GitHub, and R Markdown live demo

Complete the following chapters from the happygitwithr.com website:

• Register a free GitHub account (chapter 4)
• Install/update R and RStudio (chapter 5)
• Install Git (chapter 6)
• Introduce yourself to Git (chapter 7)
• (Optional) Install a Git client (chapter 8)
• Confirm that you can push to / pull from GitHub from the command line
(chapter 9)
• Connect RStudio to Git and GitHub (chapter 12)
• Test drive R Markdown (chapter 18)
• Render an R script (chapter 19)
• Check Assignment 0 (https://classroom.github.com/a/FribqZQb)

In the first lab, you will go through the full version control cycle. As the
semester progresses, you will become more familiar with GitHub and will use
Git/GitHub in a team-based environment.
R Markdown
R Markdown
• Generate fully reproducible reports: the analysis is run from the
beginning each time you knit.
• Simple Markdown syntax for text.
• Code goes in chunks, defined by three backticks; narrative goes
outside of chunks.
Toward Reproducibility

Throughout the course, we will use R Markdown (also referred to
as R "notebooks") to write code and to document our analyses.

R Markdown embodies all the principles we discussed above, allowing
us to implement best practices in our research and empirical work.

For the "pitch" and valuable tutorials, see here

R Markdown
R Notebooks
Sample R Markdown syntax
Header syntax (each renders as a heading of the matching level):
# Level one
## Level two
### Level three
#### Level four
##### Level five
###### Level six

Text syntax and what it renders as:
**bold text** renders as bold text
*italicized text* renders as italicized text
`in-line code` renders as in-line code
- one
- two
- three
renders as a bulleted list (one, two, three)
Compiling documents

To turn code into a report (.html, .pdf, .docx, etc.), we need to knit (or
"compile") the document.

The YAML header gives R the instructions for how to do this. Specifically,
we tell R what type of output we want.

.html
output: html_document

.docx
output: word_document

.pdf (requires a LaTeX distribution)
output: pdf_document
.nb.html
output: html_notebook

An R Markdown Notebook allows you to write code and then see the rendered
output in real time.
Compiling documents
We can knit an R Markdown document in one of three ways:

i. Click the Knit button in RStudio.

ii. Use the keyboard shortcut.
Mac: command + shift + k
Windows: control + shift + k

iii. Knit the document using the knit() function in the knitr
package.
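Knitting can also be scripted outside RStudio. The sketch below writes a tiny .Rmd file and then tries to compile it. Note that it uses `rmarkdown::render()`, which is what RStudio's Knit button calls (it runs knitr and then pandoc), rather than calling `knit()` directly. The file name `report.Rmd` and its contents are made up for the example, and the render step is skipped when `Rscript` is not installed.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Three backticks, spelled in octal to keep the quoting simple.
fence=$(printf '\140\140\140')

# A minimal R Markdown file: YAML header, narrative, one code chunk.
cat > report.Rmd <<EOF
---
title: "Knit demo"
output: html_document
---

Narrative text goes outside the chunk.

${fence}{r cars-summary}
summary(cars)
${fence}
EOF

if command -v Rscript >/dev/null 2>&1; then
  # Uses the output format named in the YAML header (html_document here).
  Rscript -e 'rmarkdown::render("report.Rmd")' ||
    echo "Render failed; check that rmarkdown and pandoc are installed."
else
  echo "Rscript not found; skipping the render step."
fi
```

Passing `output_format = "word_document"` to `rmarkdown::render()` overrides the format named in the YAML header for a single run.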
Code Chunks
The real point of R Markdown is to embed your R code in your working
script so that the document is reproducible and transparent.

To write code, we need to create a code chunk. We can do this in one of
two ways:

• Click Insert and select an R code chunk.
• Press cmd + option + i (Mac) or ctrl + alt + i (Windows).

This will yield a shaded chunk. Everything written in this chunk will be
evaluated as R code. Everything written outside of it will be treated
as prose.
Code Chunks

Chunk output can be customized with options, which are arguments supplied
to the chunk header. knitr provides almost 60 options that you can use to
customize your code chunks.
Code Chunks
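✖ marks what each option suppresses.

| Option | Run code | Show code | Output | Plots | Messages | Warnings | Errors |
|---|---|---|---|---|---|---|---|
| eval = FALSE | ✖ | | ✖ | ✖ | ✖ | ✖ | ✖ |
| include = FALSE | | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| echo = FALSE | | ✖ | | | | | |
| results = "hide" | | | ✖ | | | | |
| fig.show = "hide" | | | | ✖ | | | |
| message = FALSE | | | | | ✖ | | |
| warning = FALSE | | | | | | ✖ | |
| error = FALSE | | | | | | | ✖ |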
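To see how a few of these options look in a chunk header, the sketch below writes a small .Rmd file whose chunks each use a different option, then lists the chunk headers. The file name `options.Rmd`, the chunk labels, and the chunk contents are made up for illustration; open the file in RStudio and knit it to compare the report against the table.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Three backticks, spelled in octal to keep the quoting simple.
fence=$(printf '\140\140\140')

cat > options.Rmd <<EOF
---
title: "Chunk options demo"
output: html_document
---

${fence}{r setup, include = FALSE}
# Runs, but neither the code nor any output appears in the report.
library(stats)
${fence}

${fence}{r speed-plot, echo = FALSE}
# The plot appears; the code that drew it does not.
plot(cars)
${fence}

${fence}{r not-run, eval = FALSE}
# Shown in the report as code, but never executed.
install.packages("tidyverse")
${fence}

${fence}{r quiet, message = FALSE, warning = FALSE}
# Results are shown; messages and warnings are dropped.
as.numeric(c("1", "two"))
${fence}
EOF

# List the chunk headers that were written.
grep -n "{r" options.Rmd
```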
YAML Header
Finally, there are many ways to set up the YAML header.
Different configurations yield different layouts.

As we already saw, we can change how the document is compiled.


---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output: html_notebook
---

---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output: pdf_document
---
YAML Header
We can customize the YAML to include different output themes, table of
contents, parameters, and more!

---
title: "Markdown Basics"
author: "Prof. Dunford"
date: "Fall 2019"
output:
  html_document:
    theme: spacelab
    highlight: espresso
    toc: true
    toc_depth: 2
---
Best practices
1. Code chunks should be broken up.
2. No excessive output, i.e., don't print pages and pages of a data frame.
3. Figures should be appropriately sized for the rendered document.
4. All data and code should be self-contained. Given the data and the
.Rmd file, the R Markdown document should knit.
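One rough way to check the last point is to copy the project into an empty folder and knit it there, so that nothing outside the folder can be relied on by accident. The sketch below assumes a hypothetical project containing `analysis.Rmd` and `data/scores.csv`; both are created by the script so that it runs on its own, and the render step is skipped when `Rscript` is not installed.

```shell
#!/bin/sh
set -eu

# A hypothetical project: one data file and one .Rmd that reads it.
project=$(mktemp -d)
mkdir "$project/data"
printf 'student,score\nA,90\nB,85\n' > "$project/data/scores.csv"

# Three backticks, spelled in octal to keep the quoting simple.
fence=$(printf '\140\140\140')
cat > "$project/analysis.Rmd" <<EOF
---
title: "Self-contained check"
output: html_document
---

${fence}{r load-data}
# A relative path keeps the report portable; an absolute path to a
# file on one person's machine would break for everyone else.
scores <- read.csv("data/scores.csv")
mean(scores[["score"]])
${fence}
EOF

# Copy everything into a clean folder and try to knit from there.
clean=$(mktemp -d)
cp -R "$project/." "$clean/"
cd "$clean"

if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'rmarkdown::render("analysis.Rmd")' ||
    echo "Render failed; the project may not be self-contained."
else
  echo "Rscript not found; skipping the render step."
fi
```

Cloning the GitHub repository into a fresh folder and knitting there gives a similar check for anything that was never committed.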
Data science in practice
Take a look at what others have done in data science. Some of these use
R, others do not.

• Analyzing trends in the Billboard Hot 100 over the past half century
• Creating interactive redistricting maps
• Tracking their life via Fitbit
• Artificially composing Bach chorales
• Detecting metastatic breast cancer from still images
