Lecture 2: R-Blocks
Fundamentals
Ivan Belik
Assembled and built based on:
https://openlibra.com/en/book/download/an-introduction-to-r-2
R: Assignment and Reference
• Instead of re-typing code over and over again to run it, we can give names to R objects and statements and recall them later
• Below we will cover how to:
• create variables
• assign variable names
• refer to variables
R: Basics
• Basic arithmetic example:
> 1 + 2
[1] 3
• Let’s assign and recall a variable name instead
• We can do that by using the assignment operator “<-”:
> Answer <- 1 + 2
>
• Limitations:
• Name cannot be a number
> 3 <- 1 + 2
Error in 3 <- 1 + 2 : invalid (do_set) left-hand side to assignment
>
• A variable name also cannot start with a number (so 3Answer <- 1 + 2 will not work either):
> 3Answer <- 1 + 2
Error: unexpected symbol in "3Answer"
>
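The naming rules above can be checked directly. This is a minimal sketch: the names Answer3 and answer_3 are illustrative assumptions, not names from the lecture, and the base R function make.names() is used only to show how R would repair an invalid name.

```r
# Valid names start with a letter; digits may follow (names are hypothetical)
Answer3 <- 1 + 2
answer_3 <- 1 + 2

# make.names() turns an arbitrary string into a syntactically valid name
fixed <- make.names("3Answer")
print(fixed)   # "X3Answer"
```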
R: Basics
• Please, use descriptive names:
> Answer <- 3
>
• We can print assigned variables (or any results) using the print() function:
• print() is frequently needed in an R script, when you have to print results or messages as part of your code
> Answer <- 3
> print(Answer)
[1] 3
>
• But, of course, in the console you can just type the name of the object and R will print it:
> Answer <- 3
> Answer
[1] 3
>
R: Basics
• In RStudio it is easy to track which objects have been created (they are listed in the Environment pane)
R: Basics
• If you want to remove an object, use the rm() function
• Or you can use the “Clear objects” button in RStudio to remove all objects
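The same bookkeeping can be done from the console. A minimal sketch, assuming two throwaway objects (Answer and Note are illustrative names); ls() lists the objects in the workspace and rm() removes them.

```r
Answer <- 3
Note <- "temporary"

ls()                               # names of the objects currently defined
rm(Note)                           # remove a single object
note_exists <- "Note" %in% ls()    # FALSE: Note is gone, Answer is still there

# rm(list = ls())                  # would remove ALL objects at once
```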
R: Basics
We have learned how to work with single values and statements.
But what if we need to work with more complicated data objects,
for example, matrices or tables?
Now, let’s consider more interesting data objects that are very useful in practice.
R: Objects
• R is an object-oriented language
• R programming is based on creating and manipulating objects.
• R’s objects come in different types and flavors.
R: Objects
• The most basic OBJECTS are:
1. Vectors:
- One-dimensional sequences of elements of the same mode (i.e., type)
- For example, this could be a vector of length 26 (i.e. one containing 26 elements)
where each element is a character
Examples:
Numeric vector: 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
Character vector: "apple" "red" "5" "TRUE"
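To see what "same mode" means in practice, here is a small sketch using the base R function mode() and the built-in constant letters (a character vector of length 26). The variable names are illustrative assumptions.

```r
numbers <- c(5.0, 5.4, 5.8)
words <- c("apple", "red", "5", "TRUE")

mode(numbers)     # "numeric"
mode(words)       # "character"
length(letters)   # 26

# Mixing types in one vector forces everything into a single mode
mixed <- c(1, "a", TRUE)
mixed             # "1" "a" "TRUE"
mode(mixed)       # "character"
```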
R: Objects
2. Matrices & Arrays:
- Matrices: two-dimensional rectangular objects (a special type of vector with two dimensions: rows and columns)
- Arrays: higher-dimensional rectangular objects (similar to a matrix, but can store data in more than two dimensions)
NOTE: All elements of matrices or arrays have to be of the same mode (i.e., type)
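A short sketch of both object types, assuming the base R constructors matrix() and array(); the object names M and Arr are illustrative.

```r
# A 2 x 3 matrix; R fills it column by column
M <- matrix(1:6, nrow = 2, ncol = 3)
dim(M)      # 2 3
M[2, 3]     # 6 (row 2, column 3)

# A 2 x 3 x 4 array: like a matrix, but with a third dimension
Arr <- array(1:24, dim = c(2, 3, 4))
dim(Arr)    # 2 3 4

mode(M)     # "numeric": all elements share one mode
```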
R: Objects
3. Lists:
- Lists are like vectors but they do not have to contain elements of the same mode
Example 1 (descriptive):
- The first element of a list could be a vector of the 26 letters of the alphabet.
- The second element could contain a vector of all the prime numbers below 1000.
- The third element could be a 2 by 7 matrix
Example 2 (R-list):
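A minimal sketch of a list like the one described in Example 1. The prime numbers are computed with a small sieve, which is an assumption made for illustration (any other way of obtaining them would do); the names primes and L are hypothetical.

```r
# Sieve of Eratosthenes for the primes below 1000 (illustrative helper code)
n <- 999
is_prime <- rep(TRUE, n)
is_prime[1] <- FALSE
for (i in 2:floor(sqrt(n))) {
  if (is_prime[i]) {
    is_prime[seq(i * i, n, by = i)] <- FALSE   # cross out multiples of i
  }
}
primes <- which(is_prime)

# Three elements of different kinds stored in one list
L <- list(letters, primes, matrix(0, nrow = 2, ncol = 7))
length(L)        # 3
length(L[[2]])   # 168 primes below 1000
dim(L[[3]])      # 2 7
```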
R: Objects
4. Data Frames:
- Technically, a data frame is a list of vectors of equal length.
- Data frames are two-dimensional containers with rows corresponding to ‘observations’ and columns corresponding to ‘variables.’
- We will go into more detail later
Example:
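A small data frame might look as follows. The data and the column names are made up for illustration only.

```r
# Each column is a vector; all columns have the same length (3 observations)
students <- data.frame(
  name = c("Ann", "Bob", "Eve"),
  age = c(21, 25, 23),
  enrolled = c(TRUE, FALSE, TRUE)
)

nrow(students)      # 3 observations
ncol(students)      # 3 variables
is.list(students)   # TRUE: a data frame is a list of columns
students$age        # 21 25 23
```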
5. Functions:
- Functions are objects that take other objects as inputs and return some new object.
Example:
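As a sketch, a user-defined function that takes a numeric object and returns a new one. The name square is an illustrative assumption.

```r
# Takes an object x as input and returns a new object
square <- function(x) {
  x^2
}

square(4)            # 16
square(c(1, 2, 3))   # 1 4 9
mode(square)         # "function": functions are objects too
```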
R: Modes
• All objects have a certain mode (type of data they can contain)
• Some objects can only deal with one mode at a time (e.g., matrices)
• Others can store elements of multiple modes (e.g., lists)
• R distinguishes the following modes:
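The mode of any object can be queried with the base R function mode(). The sketch below covers some common cases; it is not necessarily the complete set of modes discussed in the lecture.

```r
modes <- c(
  mode(3.5),            # "numeric"
  mode(2L),             # "numeric" (integers are numeric too)
  mode("text"),         # "character"
  mode(TRUE),           # "logical"
  mode(1 + 2i),         # "complex"
  mode(list(1, "a")),   # "list"
  mode(sum)             # "function"
)
print(modes)
```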
R: Vectors
Let’s start with vectors
(as they are the most basic data objects)
R: Vectors
• So far we have created only trivial vectors of length equal to 1:
> Answer <- 3
>
• Let’s assign some longer ones
• To do this we will use the c() function
• The “c” stands for concatenate
• You can string a bunch of elements together, separated by commas:
> Vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> Vector1
[1] 1 2 3 4 5 6 7 8 9 10
>
R: Vectors
• The c() function also works for character vectors:
> Vector2 <- c("a", "b", "c", "d")
> Vector2
[1] "a" "b" "c" "d"
>
• OR:
> Vector3 <- c("1", "2", "3", "4")
> Vector3
[1] "1" "2" "3" "4"
>
• You can combine several vectors into one longer vector using the c() function (it concatenates them):
> Vector4 <- c(Vector2, Vector3, Vector2, Vector2, Vector2)
> Vector4
 [1] "a" "b" "c" "d" "1" "2" "3" "4" "a" "b" "c" "d" "a" "b" "c"
[16] "d" "a" "b" "c" "d"
>
R: Vector operations
• Most standard mathematical functions work with vectors:
> Vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> Vector1 + Vector1
[1] 2 4 6 8 10 12 14 16 18 20
>
> Vector1 / Vector1
[1] 1 1 1 1 1 1 1 1 1 1
>
R: Vector operations
> log(Vector1)
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
[8] 2.0794415 2.1972246 2.3025851
>
• Here we nest the log() function inside the round() function:
> round(log(Vector1))
[1] 0 1 1 1 2 2 2 2 2 2
>
• round() can take an optional argument, digits
• The digits argument specifies the number of decimal places to round to (i.e. the number of digits after the decimal point)
> round(log(Vector1), digits = 3)
[1] 0.000 0.693 1.099 1.386 1.609 1.792 1.946 2.079 2.197 2.303
>
• If digits is not specified, it defaults to 0.
R: Vector operations
• A variety of built-in functions exist for working with vectors:
R: Vector operations
• Examples:
> Vector1
[1] 1 2 3 4 5 6 7 8 9 10
> sum(Vector1)
[1] 55
> prod(Vector1)
[1] 3628800
> median(Vector1)
[1] 5.5
> sd(Vector1)
[1] 3.02765
> sort(Vector1, decreasing = TRUE)
[1] 10 9 8 7 6 5 4 3 2 1
> length(Vector1)
[1] 10
> Vector5 <- c(1, 3, 5, 7, 10)
> which(Vector5 >= 5) # note this returns the indices, not the elements
[1] 3 4 5
Note: Indexing in R starts with 1
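A few more base R functions that operate on whole vectors, as a sketch. This selection is an assumption and is not taken from the lecture's own list.

```r
Vector1 <- 1:10

min(Vector1)      # 1
max(Vector1)      # 10
range(Vector1)    # 1 10 (minimum and maximum together)
mean(Vector1)     # 5.5
var(Vector1)      # variance; equals sd(Vector1)^2
rev(Vector1)      # elements in reverse order
cumsum(Vector1)   # running total; the last value equals sum(Vector1)
```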
R: Simplifying Vector Creation
• Sometimes the c() function is impractical
• For example, you do not want to type all the elements of a long vector by hand
• You can use the colon to create an integer vector:
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
• Assigning to an object:
> Vector5 <- 1:10
> Vector5
[1] 1 2 3 4 5 6 7 8 9 10
R: Simplifying Vector Creation
• Also, you can use the seq() function
• It is more general and has some neat features
> seq(from = 0, to = 10)
[1] 0 1 2 3 4 5 6 7 8 9 10
> seq(0, 10) # you can drop the argument names
[1] 0 1 2 3 4 5 6 7 8 9 10
> seq(0, 10, by = 2) # the 'by' argument lets you set the increment
[1] 0 2 4 6 8 10
R: Simplifying Vector Creation
> seq(0, 10, length.out = 25)
[1] 0.0000000 0.4166667 0.8333333 1.2500000 1.6666667 2.0833333 2.5000000
[8] 2.9166667 3.3333333 3.7500000 4.1666667 4.5833333 5.0000000 5.4166667
[15] 5.8333333 6.2500000 6.6666667 7.0833333 7.5000000 7.9166667 8.3333333
[22] 8.7500000 9.1666667 9.5833333 10.0000000
>
• The length.out argument specifies the length of the vector
• Here we have 25 elements
• R figures out the increments itself
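The increment that R works out can be verified by hand: with 25 equally spaced points from 0 to 10 there are 24 gaps, so each step is 10 / 24. A minimal sketch (the variable names are illustrative):

```r
x <- seq(0, 10, length.out = 25)

step <- (10 - 0) / (25 - 1)   # 0.4166667, the spacing R chooses
steps_match <- isTRUE(all.equal(diff(x), rep(step, 24)))

length(x)      # 25
steps_match    # TRUE: every gap equals 10 / 24
```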
R: Simplifying Vector Creation
• The rep() function allows you to repeat things:
> rep(0, times = 10)
[1] 0 0 0 0 0 0 0 0 0 0
• You can drop the argument name (i.e., times):
> rep("Hello", 3)
[1] "Hello" "Hello" "Hello"
R: Simplifying Vector Creation
• Repeating Vector1 twice:
> Vector1
[1] 1 2 3 4 5 6 7 8 9 10
> rep(Vector1, 2)
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
• We can repeat each element as well:
> Vector2
[1] "a" "b" "c" "d"
> rep(Vector2, each = 2)
[1] "a" "a" "b" "b" "c" "c" "d" "d"
>
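The difference between the times and each arguments of rep() can be seen side by side in the following sketch, which uses a short illustrative vector rather than the lecture's own objects.

```r
ab <- c("a", "b")

whole_twice <- rep(ab, times = 2)            # "a" "b" "a" "b"
each_twice  <- rep(ab, each = 2)             # "a" "a" "b" "b"
both        <- rep(ab, each = 2, times = 2)  # "a" "a" "b" "b" "a" "a" "b" "b"
```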
R: Length
> Vector6 <- c("The", "Course", "Fellow", "is", "smart.")
• The length of Vector6:
> length(Vector6)
[1] 5
• Sometimes you do not want to print or manipulate an entire vector
• This is where indexing comes in
• To access a vector’s elements by index we use [ ]
R: Indexing
> Vector6
[1] "The" "Course" "Fellow" "is" "smart."
• We reference the third element:
> Vector6[3]
[1] "Fellow"
• We can reference a sequence of elements:
> Vector6[2:4]
[1] "Course" "Fellow" "is"
• We can reference any elements we like:
> Vector6[c(1, 3, 4)]
[1] "The" "Fellow" "is"
• We reference all except the 2nd element:
> Vector6[-2]
[1] "The" "Fellow" "is" "smart."
R: Indexing
• To change elements:
> Vector6
[1] "The" "Course" "Fellow" "is" "smart."
> Vector6[5] <- "great."
> Vector6
[1] "The" "Course" "Fellow" "is" "great."
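The indexing forms above can be combined. A minimal sketch on a fresh vector (Words is an illustrative name), showing how to take the last element, drop several elements at once, and replace several elements in one assignment:

```r
Words <- c("The", "Course", "Fellow", "is", "smart.")

last_word <- Words[length(Words)]      # "smart."
without_first_two <- Words[-c(1, 2)]   # "Fellow" "is" "smart."

# Replace the 2nd and 5th elements in a single assignment
Words[c(2, 5)] <- c("Lecture", "short.")
Words                                  # "The" "Lecture" "Fellow" "is" "short."
```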
R: Logical Operators
• Logical operators come in handy when indexing:
> Vector7 <- c(1, 1, 2, 3, 4, 4.5, 6, 6, 10)
> Vector7
[1] 1.0 1.0 2.0 3.0 4.0 4.5 6.0 6.0 10.0
> Vector7[Vector7 == 1]
[1] 1 1
> Vector7[Vector7 >= 4]
[1] 4.0 4.5 6.0 6.0 10.0
> Vector7[Vector7 != sqrt(16) & Vector7 > 2]
[1] 3.0 4.5 6.0 6.0 10.0
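It may help to look at the logical vector that a comparison produces before it is used as an index. The sketch below also uses the operators | (or) and ! (not), which are base R but are an addition to the examples above; is_large is an illustrative name.

```r
Vector7 <- c(1, 1, 2, 3, 4, 4.5, 6, 6, 10)

# A comparison returns one TRUE/FALSE per element
is_large <- Vector7 >= 4
is_large         # FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
sum(is_large)    # 5: TRUE counts as 1, so this counts the matches

low_or_high <- Vector7[Vector7 < 2 | Vector7 > 6]   # 1 1 10
not_six     <- Vector7[!(Vector7 == 6)]             # 1 1 2 3 4 4.5 10
```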
R: More Functions
• Let’s consider the following three functions:
• na.omit()
• subset()
• sample()
• They are very useful for data processing
R: More Functions
• Let’s create a new vector called V:
> V <- c(2, 3, 4, 3, NA, NA, 6, 6, 10, 11, 2, NA, 4, 3)
> V
[1] 2 3 4 3 NA NA 6 6 10 11 2 NA 4 3
• In R, missing values are represented by the symbol NA (not available)
• Try to run max(V):
> max(V)
[1] NA
• The result is NA because many functions (such as max) return NA by default when the input contains NAs
R: More Functions
• This is where the na.omit() function comes in.
• This function returns the vector with the NAs removed
• It also adds an attribute called na.action, which records the indexes of the omitted elements
> V
[1] 2 3 4 3 NA NA 6 6 10 11 2 NA 4 3
> na.omit(V)
[1] 2 3 4 3 6 6 10 11 2 4 3
attr(,"na.action")
[1] 5 6 12
attr(,"class")
[1] "omit"
>
R: More Functions
• Now we can apply all those functions that fail to give a useful result when they encounter NAs:
> V
[1] 2 3 4 3 NA NA 6 6 10 11 2 NA 4 3
> max(na.omit(V))
[1] 11
• The summary() function is a useful way to check whether NAs are present in your object:
> summary(V)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  2.000   3.000   4.000   4.909   6.000  11.000       3
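As an alternative to wrapping the vector in na.omit(), many base R summary functions accept an na.rm argument. This is a different technique from the one shown above, sketched here for comparison.

```r
V <- c(2, 3, 4, 3, NA, NA, 6, 6, 10, 11, 2, NA, 4, 3)

max(V, na.rm = TRUE)    # 11, same as max(na.omit(V))
mean(V, na.rm = TRUE)   # 4.909091, matches the Mean reported by summary(V)
```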
R: More Functions
• The is.na() function gives finer control
• It checks each value and reports whether it is NA or not
• Combined with the subset() function, it lets us remove the NAs manually
• This requires you to write a logical statement
• The first argument you need to supply is the object you want to subset (i.e., V in our case)
• The second should be a logical statement that R should evaluate.
> V
[1] 2 3 4 3 NA NA 6 6 10 11 2 NA 4 3
> is.na(V)
[1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
> V.noNA <- subset(V, is.na(V) == FALSE)
> V.noNA
[1] 2 3 4 3 6 6 10 11 2 4 3
>
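The same result can be obtained with logical indexing, using ! (not) instead of comparing with FALSE. A minimal sketch; V.kept is an illustrative name.

```r
V <- c(2, 3, 4, 3, NA, NA, 6, 6, 10, 11, 2, NA, 4, 3)

V.kept <- V[!is.na(V)]        # keep only the elements that are not NA
na_count <- sum(is.na(V))     # 3 missing values
na_positions <- which(is.na(V))   # 5 6 12
```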
R: More Functions
• The sample() function draws a random sample from a vector
• It takes the following arguments:
• x for the vector to sample from
• size for the sample size
• replace = TRUE or replace = FALSE
> Vector <- 1:500
> sample(Vector, size = 10, replace = FALSE)
[1] 287 350 497 196 105 273 63 42 224 245
• You will get a different result, since the sample is drawn randomly
• Explanation:
• Imagine you have a bowl with 500 unique numbers (as we have in the object Vector)
• size = 10 means that we pick 10 numbers at random to form our sample
• replace = TRUE: after we pick a number, we put it back into the bowl
It means that it CAN be selected again
• replace = FALSE: after we pick a number, we remove it from the bowl
It means that it CANNOT be selected again
Consider a trivial example:
R: More Functions
> A <- c(1, 2, 3)
> A
[1] 1 2 3
> sample(A, size = 3, replace = TRUE)
[1] 2 1 2
> sample(A, size = 3, replace = FALSE)
[1] 1 2 3
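Because the draws are random, results differ between runs. If a repeatable result is needed, the base R function set.seed() can be called first; this is an addition to the lecture's example, and the seed value 42 is arbitrary.

```r
# Same seed, same sample
set.seed(42)
first <- sample(1:500, size = 10, replace = FALSE)
set.seed(42)
second <- sample(1:500, size = 10, replace = FALSE)
identical(first, second)   # TRUE

# Without replacement, no number can appear twice
anyDuplicated(first)       # 0 means "no duplicates"

# A sample larger than the vector is only possible with replacement
many <- sample(c(1, 2, 3), size = 10, replace = TRUE)
length(many)               # 10
```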
R: More Functions
• The print() function
• We know already that the print() function prints an object to the screen
• It does not create a new object in memory; it only displays the value
> print(0.2)
[1] 0.2
> X <- 0.2
> print(X)
[1] 0.2
>
• The paste() function
• It is a bit more useful
• You can paste multiple objects together into one character string, which the console then prints
> paste(X, "is equal to", X)
[1] "0.2 is equal to 0.2"
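A short sketch of how the separator behaves. paste() joins its arguments with a space by default, the sep argument changes that, and paste0() uses no separator at all. The variable names are illustrative.

```r
X <- 0.2

default_sep <- paste(X, "is equal to", X)              # "0.2 is equal to 0.2"
custom_sep  <- paste(X, "is equal to", X, sep = "-")   # "0.2-is equal to-0.2"
labels      <- paste0("Vector", 1:3)                   # "Vector1" "Vector2" "Vector3"

# Inside a script, combine paste() with print() to show a message
msg <- paste("The answer is", 3)
print(msg)
```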
R: Lists
Let’s continue with the next data object:
lists
R: LISTS
• A list is a data structure that can contain elements of different modes (i.e., types).
• In fact, a list is a special type of vector:
• A vector whose elements all have the same type is called an atomic vector (usually just "vector")
but:
• a vector that can contain elements of different types is called a list
• To create a list, use the list() function:
A <- list("X" = 3.5, "Y" = c(1, 2, 3), "Z" = FALSE)
• We created a list with three elements of the following types:
• Floating point number (3.5)
• Vector (c(1, 2, 3), which we know already)
• Logical (it takes only one of two values: TRUE or FALSE)
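The structure of such a list can be inspected with a few base R functions. A minimal sketch using the list A defined above; element_modes is an illustrative name.

```r
A <- list("X" = 3.5, "Y" = c(1, 2, 3), "Z" = FALSE)

is.list(A)    # TRUE
length(A)     # 3 elements

# The mode of each element: two numeric elements and one logical
element_modes <- sapply(A, mode)
element_modes

str(A)        # compact overview of the whole list
```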
R: LISTS
• In our list A we specified tags for each element: “X”, “Y” and “Z”
• It is useful to have tags to refer to the components of the lists:
R: LISTS
• We can also create lists without any tags:
• Then, to retrieve an element (a component of the list), we use indexes in single square brackets [ ]:
• To retrieve the content of the element, we use indexes in double square brackets [[ ]]:
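The difference between the two bracket forms shows up in the class of the result. A minimal sketch with an untagged list; B, element and content are illustrative names.

```r
B <- list(3.5, c(1, 2, 3), FALSE)

element <- B[2]     # single brackets: a list of length 1
content <- B[[2]]   # double brackets: the numeric vector itself

class(element)      # "list"
class(content)      # "numeric"
B[[2]][3]           # 3: the third value inside the second element
```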
R: LISTS
• We can use tags (if they were defined) instead of indexes to access the content of the element:
• By putting the tag in double square brackets [[ ]]:
• Or, instead of double square brackets [[ ]], by using the $ sign:
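Both forms of access by tag can be sketched as follows, using the tagged list A from the previous slides.

```r
A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)

by_brackets <- A[["Y"]]   # content of the element tagged Y
by_dollar   <- A$Y        # the same content, shorter to type

identical(by_brackets, by_dollar)   # TRUE
A["Y"]                    # single brackets still return a list
```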
R: LISTS
• A few words about tags:
> A <- list("X" = 3.5, "Y" = c(1, 2, 3), "Z" = FALSE)
> A$"X"        # works
[1] 3.5
> A["X"]       # works
$X
[1] 3.5
> A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)
> A$X          # works
[1] 3.5
> A[X]         # does NOT work: X is treated as a variable name
Error: object 'X' not found
R: LISTS: Modifying
• To modify a list element, we assign a new value to it:
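A minimal sketch of reassignment, using the tagged list A; the new values are arbitrary and chosen only for illustration.

```r
A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)

A$X <- 10                # replace a whole element by tag
A[["Y"]][2] <- 99        # replace one value inside an element
A[[3]] <- "now text"     # replace by index; the mode may change

A$X   # 10
A$Y   # 1 99 3
A$Z   # "now text"
```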
R: LISTS: adding
• To add a new component to the list we just need to declare a new component:
• By declaring a new tag:
• Or just by specifying the index:
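Both ways of adding a component can be sketched as follows. The tag W and the values are illustrative assumptions.

```r
A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)

A$W <- "new"     # by declaring a new tag: A now has 4 components
A[[5]] <- 42     # by specifying the next index: A now has 5 components

length(A)        # 5
A$W              # "new"
A[[5]]           # 42
```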
R: LISTS: adding
• If we specify an index that is greater than the next expected index:
- Empty component(s) will be created before the new element
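A sketch of this behavior with a short untagged list (B is an illustrative name): assigning to index 5 of a two-element list fills positions 3 and 4 with NULL.

```r
B <- list("a", "b")
B[[5]] <- "e"        # the next expected index would be 3

length(B)            # 5
is.null(B[[3]])      # TRUE: empty component created automatically
is.null(B[[4]])      # TRUE
B[[5]]               # "e"
```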
R: LISTS: adding
• We can also use the function append(x, values, after = ):
• x: the list that we want to modify
• values: the values to be added to x
• after: the index of the element after which the values will be added
• Example: add a new list to the existing list A
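One possible reading of this example, as a sketch: the components of a second list are inserted into A after its first element. The list extra and its tags are illustrative assumptions. Wrapping the new list in list() adds it as one nested component instead.

```r
A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)
extra <- list(W = "new", V = 1:2)

# Insert the components of 'extra' after the first element of A
A2 <- append(A, extra, after = 1)
names(A2)     # "X" "W" "V" "Y" "Z"

# Add 'extra' as a single nested component at the end
A3 <- append(A, list(extra))
length(A3)    # 4
```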
R: LISTS: deleting
• To delete an element from the list, we can just assign the NULL value to it:
• We can also use indexing: a negative index means "don't include this element".
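Both ways of deleting can be sketched as follows, again using the tagged list A.

```r
A <- list(X = 3.5, Y = c(1, 2, 3), Z = FALSE)

A$Y <- NULL      # assigning NULL removes the element tagged Y
names(A)         # "X" "Z"

A <- A[-1]       # negative index: keep everything except element 1
names(A)         # "Z"
```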
Interactive Input
R: Interactive Input
• Use the readline() function to take input from the user (in an interactive session):
• It returns a character vector
• You should apply an appropriate conversion if you need a numeric mode:
> A <- readline(prompt = "Enter your age: ")
Enter your age: 20
> A
[1] "20"
> Age <- as.integer(A) # convert to integer
> Age
[1] 20
> Age <- as.numeric(A) # convert to double (floating point)
> Age
[1] 20
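User input is not always a valid number, so the conversion may fail. The sketch below wraps the conversion in a small helper; to_age is a hypothetical name, not part of base R. as.integer() returns NA (with a warning) for text that is not a number, and the warning is suppressed here for brevity.

```r
# Hypothetical helper: convert typed text to an integer, NA if not a number
to_age <- function(text) {
  suppressWarnings(as.integer(text))
}

to_age("20")         # 20
to_age("twenty")     # NA: the text cannot be converted
as.numeric("20.5")   # 20.5
as.integer("20.5")   # 20: the fractional part is dropped

# readline() only waits for input in an interactive session
if (interactive()) {
  Age <- to_age(readline(prompt = "Enter your age: "))
  if (is.na(Age)) print("That is not a whole number.")
}
```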
R: Interactive Input
• We can convert Age back to a character string:
> A <- readline(prompt = "Enter your age: ")
Enter your age: 20
> A
[1] "20"
> Age <- as.numeric(A) # convert to double (floating point)
> Age
[1] 20
> Age <- as.character(Age) # convert the number back to a string
> Age
[1] "20"
>