R Lang-Unit-01
• R is a leading tool for machine learning, statistics, and data analysis. Objects, functions, and packages can easily be created in R.
• It is a platform-independent language, which means it runs on all major operating systems.
• It is a free, open-source language, so anyone can install it in any organization without purchasing a license.
a. Statistical Features of R:
• Basic Statistics: The most common basic statistics are the mean, mode, and median, known collectively as “Measures of Central Tendency.” R makes it easy to compute these measures.
• Static graphics: R is rich with facilities for creating and developing interesting static
graphics. R contains functionality for many plot types including graphic maps,
mosaic plots, biplots, and the list goes on.
• Data analysis: It provides a large, coherent and integrated collection of tools for
data analysis.
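The measures of central tendency mentioned above can be sketched as follows. The `scores` vector and the `stat_mode` name are illustrative assumptions, not part of the original notes. Base R provides mean() and median(), but its mode() function reports the storage type rather than the most frequent value, so the sketch counts values with table() instead.

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(2, 4, 4, 5, 7, 9, 4)

mean(scores)    # arithmetic mean: 5
median(scores)  # middle value of the sorted data: 4

# Base R has no built-in statistical mode (mode() reports the storage type),
# so count each value with table() and pick the most frequent one.
counts <- table(scores)
stat_mode <- as.numeric(names(counts)[which.max(counts)])
stat_mode       # most frequent value: 4
```

If several values are tied for most frequent, which.max() returns only the first of them.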
b. Programming Features of R:
1.3 Programming in R:
1.4 Advantages of R:
• R is open source, so you can run it anywhere and at any time.
1.5 Disadvantages of R:
• In R, the quality of some packages is less than perfect.
1.6 Applications of R:
• We use R for Data Science. It gives us a broad variety of libraries related to statistics.
It also provides the environment for statistical computing and design.
• R is widely used by data analysts and research programmers, and it serves as a fundamental tool in finance.
• Companies such as Google, Facebook, Bing, Twitter, Accenture, and Wipro use R.
2. Comments
Comments can be used to explain R code and to make it more readable. They can
also be used to prevent execution when testing alternative code. A comment starts with a #.
When executing code, R ignores everything from the # to the end of the line. This example uses a
comment before a line of code:
Example
# This is a comment
"Hello World!"
Multiline Comments
Unlike other programming languages, such as Java, there is no syntax in R for
multiline comments. However, we can insert a # at the start of each line to create multiline
comments:
Example
# This is a comment
# written in
# more than just one line
"Hello World!"
3. Creating Variables in R
Variables are containers for storing data values.
R does not have a command for declaring a variable. A variable is created the
moment you first assign a value to it. To assign a value to a variable, use the <- sign. To
output (or print) the variable value, just type the variable name:
Example
name <- "John"
age <- 40
name # output "John"
age # output 40
You can also concatenate, or join, two or more elements, by using the
paste() function.
Example 1
text <- "awesome"
paste("R is", text)
Example 2
text1 <- "R is"
text2 <- "awesome"
paste(text1, text2)
Example 3
For numbers, the + character works as a mathematical operator:
num1 <- 5
num2 <- 10
num1 + num2
Example 4
If you try to combine a string (text) and a number, R will give you an error:
num <- 5
text <- "Some text"
num + text
R allows you to assign the same value to multiple variables in one line:
Example
# Assign the same value to multiple variables in one line
# Print
var1
var2
var3
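A minimal sketch of assigning one value to several variables in a single line, by chaining the <- operator. The value "Orange" is a placeholder chosen for illustration and is not taken from the original notes.

```r
# Assign the same value to multiple variables in one line.
# "Orange" is a placeholder value chosen for illustration.
var1 <- var2 <- var3 <- "Orange"

# Print each variable
var1
var2
var3
```

The assignment is evaluated from right to left, so var3 receives the value first and var1 last.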
• A variable name must start with a letter and can be a combination of letters,
digits, period(.) and underscore(_). If it starts with period(.), it cannot be followed
by a digit.
• A variable name cannot start with a number or underscore (_)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
Example
# Legal variable names:
Myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"
# Illegal variable names:
my_v@ar <- "John"
TRUE <- "John"
4. Data Types
Variables can store data of different types, and different types can do different
things.
In R, variables do not need to be declared with any particular type, and can
even change type after they have been set:
Example
my_var <- 30 # my_var is type of numeric
my_var <- "Sally" # my_var is now of type character (aka string)
Basic data types in R can be divided into the following types:
Example
# numeric
x <- 10.5
class(x)
# integer
x <- 1000L
class(x)
# complex
x <- 9i + 3
class(x)
# character/string
x <- "R is exciting"
class(x)
# logical/boolean
x <- TRUE
class(x)
5. Numbers
There are three number types in R:
a) Numeric
b) Integer
c) Complex
Variables of number types are created when you assign a value to them:
Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
a. Numeric
A numeric data type is the most common type in R, and contains any number
with or without a decimal, like: 10.5, 55, 787:
Example
x <- 10.5
y <- 55
b. Integer
Integers are numeric data without decimals. This is used when you are certain
that you will never create a variable that should contain decimals. To create an integer
variable, you must use the letter L after the integer value:
Example
x <- 1000L
y <- 55L
# Print the class name of x and y
class(x)
class(y)
c. Complex
Example
x <- 3+5i
y <- 5i
** Type Conversion
You can convert from one type to another with the following functions:
• as.numeric()
• as.integer()
• as.complex()
Example
x <- 1L # integer
y <- 2 # numeric
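The conversion functions listed above can be applied to these two variables as in the following sketch. The names a, b, and z are illustrative.

```r
x <- 1L  # integer
y <- 2   # numeric

# Convert from integer to numeric
a <- as.numeric(x)
# Convert from numeric to integer
b <- as.integer(y)
# Convert from integer to complex
z <- as.complex(x)

class(a)  # "numeric"
class(b)  # "integer"
class(z)  # "complex"

# as.integer() drops the fractional part instead of rounding
as.integer(3.9)  # 3
```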
6. Math Functions
In R, you can use operators to perform common mathematical operations on
numbers.
10 + 5
10 - 5
6.1 Built-in Math Functions
R also has many built-in math functions that allow you to perform
mathematical tasks on numbers.
1. For example, the min() and max() functions can be used to find the lowest or
highest number in a set:
sqrt(16)
abs(-4.7)
4. The ceiling() function rounds a number upwards to its nearest integer, and the
floor() function rounds a number downwards to its nearest integer, and
returns the result:
ceiling(1.4)
floor(1.4)
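The built-in functions named in this section can be tried together as in the sketch below. The `values` vector is a hypothetical set of numbers used only to illustrate min() and max().

```r
# Hypothetical set of numbers, chosen only for illustration
values <- c(5, 10, 15)

min(values)    # lowest value: 5
max(values)    # highest value: 15

sqrt(16)       # square root: 4
abs(-4.7)      # absolute (positive) value: 4.7

# For negative input, ceiling() still rounds up and floor() still rounds down
ceiling(-1.4)  # -1
floor(-1.4)    # -2
```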
7. STRING Functions
• Assigning a string to a variable is done with the variable followed by the <-
operator and the string:
• However, note that when a string spans several lines, R shows a "\n" at each line
break when it prints the value. The backslash marks an escape character, and the n indicates a new line.
If you want the line breaks to be inserted at the same position as in the
code, use the cat() function:
cat(str)
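A short sketch of the difference between auto-printing a multiline string and writing it with cat(). The text assigned to `str` is a placeholder chosen for illustration.

```r
# A string that spans three lines (placeholder text)
str <- "Line one,
line two,
line three."

str       # auto-printing shows each line break as \n inside one string
cat(str)  # cat() writes the text with the line breaks where they appear in the code
```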
8. Escape Characters
To insert characters that are illegal in a string, you must use an escape
character.
Example
str <- "We are the so-called "Vikings", from the north."
str
Error: unexpected symbol in "str <- "We are the so-called
"Vikings"
To fix this problem, use the escape character \":
The escape character allows you to use double quotes when you normally would not
be allowed:
Example
str <- "We are the so-called \"Vikings\", from the north."
str
cat(str)
Note that auto-printing the str variable will print the backslash in the output. You
can use the cat() function to print it without backslash.
Code Result
\\ Backslash
\n New Line
\r Carriage Return
\t Tab
\b Backspace
9. Booleans/Logical Values
When you compare two values in R, the expression is evaluated and R returns one of
two logical answers, TRUE or FALSE:
Example
a <- 10
b <- 9
a > b
You can also use a comparison as the condition of an if statement:
Example
a <- 200
b <- 33
if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}
10. Operators
In the example below, we use the + operator to add together two values:
10 + 5
1. Arithmetic operators
2. Assignment operators
3. Comparison operators
4. Logical operators
5. Miscellaneous operators
a. Arithmetic operators
b. Assignment operators
Example
my_var <- 3
my_var <<- 3
3 -> my_var
3 ->> my_var
my_var # print my_var
Note: <<- is a global assigner. You will learn more about this in the Global Variable
chapter.
It is also possible to turn the direction of the assignment operator: x <- 3 is
equal to 3 -> x.
c. Comparison operators
d. Logical operators
e. Miscellaneous operators
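The operator groups listed above can be illustrated with a few base R expressions. This is a brief sketch rather than a complete operator table; the operands and the matrix `m` are arbitrary values chosen for illustration.

```r
# Arithmetic operators
10 + 5    # addition: 15
10 - 5    # subtraction: 5
10 * 5    # multiplication: 50
10 / 4    # division: 2.5
2 ^ 3     # exponent: 8
10 %% 3   # modulus (remainder): 1
10 %/% 3  # integer division: 3

# Comparison operators
5 == 5    # equal: TRUE
5 != 3    # not equal: TRUE
5 > 3     # greater than: TRUE
5 <= 3    # less than or equal to: FALSE

# Logical operators
TRUE & FALSE   # element-wise AND: FALSE
TRUE | FALSE   # element-wise OR: TRUE
!TRUE          # NOT: FALSE

# Miscellaneous operators
1:5            # creates a series of numbers: 1 2 3 4 5
3 %in% 1:5     # checks whether an element belongs to a vector: TRUE
m <- matrix(1:4, nrow = 2)
m %*% m        # matrix multiplication
```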
11. Vectors
A vector is simply a list of items that are of the same type.
To combine the list of items into a vector, use the c() function and separate the items by a
comma.
In the example below, we create a vector variable called fruits that combines strings:
Example
# Vector of strings
fruits <- c("banana", "apple", "orange")
Example
# Vector of numerical values
numbers <- c(1, 2, 3)
# Print numbers
numbers
[1] 1 2 3
Example
# Vector with numerical values in a sequence
numbers <- 1:10
numbers
[1] 1 2 3 4 5 6 7 8 9 10
You can also create numerical values with decimals in a sequence, but note that if
the last element does not belong to the sequence, it is not used:
Example
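A minimal sketch of sequences with decimals using the : operator. The names numbers1 and numbers2 and the endpoints are illustrative.

```r
# Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1  # 1.5 2.5 3.5 4.5 5.5 6.5

# The last element, 6.3, does not belong to the sequence, so it is not used
numbers2 <- 1.5:6.3
numbers2  # 1.5 2.5 3.5 4.5 5.5
```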
To find out how many items a vector has, use the length() function:
Example
fruits <- c("banana", "apple", "orange")
length(fruits)
[1] 3
Example
You can access the vector items by referring to its index number inside brackets [].
The first item has index 1, the second item has index 2, and so on:
Example
fruits[1]
[1] "banana"
You can also access multiple elements by referring to different index positions with
the c() function:
Example
fruits[c(1, 3)]
You can also use negative index numbers to access all items except the ones
specified:
Example
fruits[c(-1)]
To change the value of a specific item, refer to its index number:
Example
# Change "banana" to "pear"
fruits[1] <- "pear"
# Print fruits
fruits
Example
repeat_times <- rep(c(1, 2, 3), times = 3)
repeat_times
[1] 1 2 3 1 2 3 1 2 3
Example
repeat_indepent <- rep(c(1, 2, 3), times = c(5, 2, 1))
repeat_indepent
[1] 1 1 1 1 1 2 2 3
One of the examples above showed how to create a vector with
numerical values in a sequence with the : operator. To make bigger or smaller
steps in a sequence, use the seq() function:
Example
numbers <- seq(from = 0, to = 100, by = 20)
numbers
[1] 0 20 40 60 80 100
The seq() function has three parameters: from is where the sequence starts, to is
where the sequence stops, and by is the interval of the sequence.
12. Lists
A list in R can contain many different data types inside it.
A list is a collection of data which is ordered and changeable.
To create a list, use the list() function.
Example
1. List containing data of the same data type.
# List of strings
thislist <- list("apple", "banana", "cherry")
# Print the list
thislist
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
2. List containing data of different data types.
thislist1 <- list(10.1, "a", 100, "#")
thislist1
[[1]]
[1] 10.1
[[2]]
[1] "a"
[[3]]
[1] 100
[[4]]
[1] "#"
You can access the list items by referring to its index number, inside
brackets. The first item has index 1, the second item has index 2, and so
on:
Example
thislist <- list ("apple", "banana", "cherry")
thislist [1]
[[1]]
[1] "apple"
Example
thislist <- list("apple", "banana", "cherry")
# Change the first item
thislist[1] <- "blackcurrant"
# Print the updated list
thislist
[[1]]
[1] "blackcurrant"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
Example
length(thislist)
[1] 3
Example
[1] TRUE
To add an item to the end of the list, use the append() function:
12.5.1 Example
Add "orange" to the list:
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
[[4]]
[1] "orange"
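12.5.2 Example
Add "orange" to the list after "banana" (index 2):
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange", after = 2)
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "orange"
[[4]]
[1] "cherry"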
12.6 Remove List Items
You can also remove list items. The following example creates a new,
updated list without an "apple" item:
Example
Remove "apple" from the list:
thislist <- list("apple", "banana", "cherry")
newlist <- thislist[-1]
# Print the new list
newlist
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
Example
(thislist)[2:5]
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
[[3]]
[1] "orange"
[[4]]
[1] "kiwi"
Note: The search will start at index 2 (included) and end at index 5 (included).
Example
thislist <- list("apple", "banana", "cherry")
for (x in thislist) {
  print(x)
}
[1] "apple"
[1] "banana"
[1] "cherry"
To join two or more lists, the most common way is to use the c() function, which
combines them into one list:
Example
# Join Two Lists
list1 <- list("a", "b", "c")
list2 <- list(1, 2, 3)
list3 <- c(list1, list2)
list3
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
[[4]]
[1] 1
[[5]]
[1] 2
[[6]]
[1] 3
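One detail worth a short sketch is the difference between single and double brackets when accessing list items, together with a membership check using %in%. The names `sub` and `item` are illustrative.

```r
thislist <- list("apple", "banana", "cherry")

sub <- thislist[1]     # single brackets return a list of length 1
item <- thislist[[1]]  # double brackets return the element itself

class(sub)   # "list"
class(item)  # "character"

# Check whether an item is present in the list
"apple" %in% thislist  # TRUE
```

Single brackets are the form used in the examples above, which is why their output begins with [[1]].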
13. R Matrices
A matrix is a two dimensional data set with columns and rows.
A matrix can be created with the matrix() function. Specify the nrow and
ncol parameters to set the number of rows and columns:
Example
# Create a matrix
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
# Print the matrix
thismatrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix
[,1] [,2]
[1,] "apple" "cherry"
[2,] "banana" "orange"
Example
thismatrix[1,2]
[1] "cherry"
The whole row can be accessed if you specify a comma after the number in the bracket:
Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[2,]
[1] "banana" "orange"
The whole column can be accessed if you specify a comma before the number in the bracket:
Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[,2]
thismatrix[c(1,2),]
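Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
thismatrix[, c(1,2)]
     [,1]     [,2]
[1,] "apple"  "orange"
[2,] "banana" "grape"
[3,] "cherry" "pineapple"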
Example
newmatrix
Note: The cells in the new column must be of the same length as the existing
matrix.
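A minimal sketch of adding a column with cbind(), assuming the 3 x 3 fruit matrix used elsewhere in this section. The values of the new column are hypothetical.

```r
thismatrix <- matrix(c("apple", "banana", "cherry",
                       "orange", "grape", "pineapple",
                       "pear", "melon", "fig"), nrow = 3, ncol = 3)

# Add a fourth column; the vector needs one value per existing row.
# These three values are hypothetical, chosen for illustration.
newmatrix <- cbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix
newmatrix
dim(newmatrix)  # 3 4
```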
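Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
newmatrix <- rbind(thismatrix, c("strawberry", "blueberry", "raspberry"))
# Print the new matrix
newmatrix
     [,1]         [,2]        [,3]
[1,] "apple"      "orange"    "pear"
[2,] "banana"     "grape"     "melon"
[3,] "cherry"     "pineapple" "fig"
[4,] "strawberry" "blueberry" "raspberry"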
Note: The cells in the new row must be of the same length as the existing
matrix.
Example
3, ncol =2)
thismatrix
Example
[1] TRUE
Example
dim(thismatrix)
[1] 2 2
Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
length(thismatrix)
[1] 4
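The total number of cells in a matrix is the number of rows multiplied by the number of columns.
Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
for (rows in 1:nrow(thismatrix)) {
  for (columns in 1:ncol(thismatrix)) {
    print(thismatrix[rows, columns])
  }
}
[1] "apple"
[1] "cherry"
[1] "banana"
[1] "orange"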
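Example
# Combine matrices
Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)
Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)
# Adding it as rows
Matrix_Combined <- rbind(Matrix1, Matrix2)
Matrix_Combined
# Adding it as columns
Matrix_Combined <- cbind(Matrix1, Matrix2)
Matrix_Combined
     [,1]     [,2]
[1,] "apple"  "cherry"
[2,] "banana" "grape"
[3,] "orange" "pineapple"
[4,] "mango"  "watermelon"
     [,1]     [,2]     [,3]     [,4]
[1,] "apple"  "cherry" "orange" "pineapple"
[2,] "banana" "grape"  "mango"  "watermelon"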
14. Arrays
Compared to matrices, arrays can have more than two
dimensions.
We can use the array() function to create an array, and the dim
parameter to specify the dimensions:
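Example
# An array with one dimension with values ranging from 1 to 24
thisarray <- c(1:24)
thisarray
# An array with more than one dimension
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
, , 1
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
, , 2
     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24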
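Example Explained
In the example above we create an array with the values 1 to 24.
In dim = c(4, 3, 2), the first two numbers give the number of rows and columns of
each matrix, and the last number gives how many such matrices the array holds.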
Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray[2,3,2]
[1] 22
Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
# Access all the items from the first row from matrix one
multiarray[c(1),,1]
# Access all the items from the first column from matrix one
multiarray[,c(1),1]
[1] 1 5 9
[1] 1 2 3 4
A comma (,) before c() means that we want to access the column.
A comma (,) after c() means that we want to access the row.
Example
2 %in% multiarray
[1] TRUE
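Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
dim(multiarray)
[1] 4 3 2
Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
length(multiarray)
[1] 24
Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
for (x in multiarray) {
  print(x)
}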
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
15. R Data Frames
Data frames are data displayed in a table format.
A data frame can hold different types of data. While the
first column can be character, the second and third can be
numeric or logical. However, all values within a single column
must be of the same type.
Example
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame
Output:-
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame
summary(Data_Frame)
Output:-
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
Training Pulse Duration
Other :1 Min. :100.0 Min. :30.0
Stamina :1 1st Qu.:110.0 1st Qu.:37.5
Strength:1 Median :120.0 Median :45.0
Mean :123.3 Mean :45.0
3rd Qu.:135.0 3rd Qu.:52.5
Max. :150.0 Max. :60.0
You will learn more about the summary() function in the statistical part of the R
tutorial.
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame[1]
Data_Frame[["Training"]]
Data_Frame$Training
Output:-
Training
1 Strength
2 Stamina
3 Other
[1] Strength Stamina Other
Levels: Other Stamina Strength
[1] Strength Stamina Other
Levels: Other Stamina Strength
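The Levels: lines in the output above indicate that the Training column was stored as a factor. Since R 4.0.0, data.frame() keeps text columns as character by default, so reproducing that output on a current installation requires the stringsAsFactors argument. The sketch below uses hypothetical names df_chr and df_fct.

```r
# Default behaviour in R 4.0.0 and later: text columns stay character
df_chr <- data.frame(Training = c("Strength", "Stamina", "Other"))
class(df_chr$Training)

# Asking for factors explicitly reproduces the "Levels:" output
df_fct <- data.frame(Training = c("Strength", "Stamina", "Other"),
                     stringsAsFactors = TRUE)
class(df_fct$Training)   # "factor"
levels(df_fct$Training)  # "Other" "Stamina" "Strength"
```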
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
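)
# Add a new row with rbind()
New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))
# Print the new data frame
New_row_DF
Output:-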
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
4 Strength 110 110
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
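# Add a new column with cbind()
New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))
# Print the new data frame
New_col_DF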
Output:-
Training Pulse Duration Steps
1 Strength 100 60 1000
2 Stamina 150 30 6000
3 Other 120 45 2000
15.5 Remove Rows and Columns
Use the c() function to remove rows and columns in a Data
Frame:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Remove the first row and the first column
Data_Frame[-c(1), -c(1)]
Output:-
Pulse Duration
2 150 30
3 120 45
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
dim(Data_Frame)
Output:-
[1] 3 3
You can also use the ncol() function to find the number of
columns and nrow() to find the number of rows:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
ncol(Data_Frame)
nrow(Data_Frame)
Output:-
[1] 3
[1] 3
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
length(Data_Frame)
Output:-
[1] 3
Example
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
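)
Data_Frame2 <- data.frame (
Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
# Combine the two data frames vertically with rbind()
rbind(Data_Frame1, Data_Frame2)
Output:-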
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
4 Stamina 140 30
5 Stamina 150 30
6 Strength 160 20
Example
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
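)
Data_Frame4 <- data.frame (
Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)
# Combine the two data frames horizontally with cbind()
cbind(Data_Frame3, Data_Frame4)
Output:-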
Training Pulse Duration Steps Calories
1 Strength 100 60 3000 300
2 Stamina 150 30 6000 400
3 Other 120 45 2000 300
16. R Factors
Factors are used to categorize data. Examples of factors are:
• Demography: Male/Female
Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"))
# Print the factor
music_genre
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
You can see from the example above that the factor has four levels
(categories): Classic, Jazz, Pop and Rock.
Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"))
levels(music_genre)
[1] "Classic" "Jazz" "Pop" "Rock"
You can also set the levels, by adding the levels argument inside the factor()
function:
Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
levels(music_genre)
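A minimal sketch of setting levels explicitly through the levels argument. The extra level "Other" and the helper vector `genres` are assumptions made for illustration; the original example's own level list is not recoverable from the text.

```r
genres <- c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz")

# "Other" is a hypothetical extra level that no item currently uses
music_genre <- factor(genres,
                      levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

levels(music_genre)  # "Classic" "Jazz" "Pop" "Rock" "Other"
table(music_genre)   # the unused level "Other" is listed with a count of 0
```

Declaring a level in advance is what later allows an item to be changed to that value without a warning.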
Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"))
length(music_genre)
[1] 8
Example
Access the third item:
music_genre[3]
[1] Classic
Levels: Classic Jazz Pop Rock
Example
Change the value of the third item:
music_genre[3] <- "Pop"
music_genre[3]
[1] Pop
Levels: Classic Jazz Pop Rock
Note that you cannot change the value of a specific item to a value that is not
already one of the factor's levels. The following example produces a warning, and the item becomes NA:
Example
Trying to change the value of the third item ("Classic") to an item that does not
exist/is not predefined ("Opera"):
music_genre[3] <- "Opera"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
  invalid factor level, NA generated
However, if you have already specified it inside the levels argument, it will
work:
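Example
Change the value of the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))
music_genre[3] <- "Opera"
music_genre[3]
[1] Opera
Levels: Classic Jazz Pop Rock Opera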