Module-2: Introduction to R Programming
• Introduction to R,
• R installation,
• Data types and function,
• Variables in R,
• Scalars,
• Vectors,
• Matrices,
• List,
• Data frames,
• functions in R,
• Factors
Introduction to R
• R is a popular programming language used for statistical computing and graphical presentation.
• Its most common use is to analyze and visualize data.
• Developed in the early 1990s by Ross Ihaka and Robert Gentleman, it has since become a
primary tool for data analysis and visualization in various fields, including academia, industry,
and government.
Why Use R?
• It is a great language for data analysis, data visualization, data science and machine learning
• It provides many statistical techniques (such as statistical tests, classification, clustering and
data reduction)
• It is easy to draw graphs in R, like pie charts, histograms, box plot, scatter plot, etc.
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has many packages (libraries of functions) that can be used to solve different problems
Installing R on Windows OS
To install R on Windows OS:
• Go to the Comprehensive R Archive Network (CRAN) website
(https://cran.r-project.org/).
• Click on "Download R for Windows".
• Click on "install R for the first time" link to download the R
executable (.exe) file.
• Run the R executable file to start installation, and allow the app to
make changes to your device.
• Select the installation language.
Additional R interfaces
• Other than the R GUI, the other ways to interface with R include
RStudio Integrated Development Environment (RStudio IDE)
and Jupyter Notebook.
• To run R on RStudio, you first need to install R on your computer.
• RStudio provides an interactive and friendly graphical interface to R
that greatly improves users’ experience.
Installing RStudio Desktop
To install RStudio Desktop on your computer, do the following:
• Go to the RStudio website.
• Click on "DOWNLOAD" in the top-right corner.
• Click on "DOWNLOAD" under the "RStudio Open Source
License".
• Download RStudio Desktop recommended for your computer.
• Run the RStudio executable file (.exe) on Windows, or the Apple
Disk Image file (.dmg) on macOS.
Print function
Variables in R
• Variables are containers for storing data values.
• R does not have a command for declaring a variable.
• A variable is created the moment you first assign a value to it. To
assign a value to a variable, use the <- or = sign.
• R allows you to assign the same value to multiple
variables in one line:
var1 <- var2 <- var3 <- "Orange"
Variable Names
• A variable name must start with a letter or a period (.) and can be a
combination of letters, digits, periods (.) and underscores (_).
• If it starts with a period (.), the period cannot be followed by a digit.
• A variable name cannot start with a number or an underscore (_).
• Variable names are case-sensitive (age, Age and AGE are three
different variables)
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
# Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"
# Illegal variable names:
2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"
Data types
• Data types are used to specify the kind of data that can be stored in a variable.
• Data types in R are:
1. numeric – (3, 6.7, 121)
2. integer – (2L, 42L; where 'L' declares this as an integer)
3. complex – (7 + 5i; where 'i' is the imaginary unit)
4. character – ("a", "B", "c is third", "69")
5. logical – (TRUE or FALSE)
Example:
x <- 10.5
class(x)
x <- 1000L
class(x)
x <- 9i + 3
class(x)
x <- "R is exciting"
class(x)
x <- TRUE
class(x)
Output:
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] "logical"
Type conversion
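The slide under this heading carries no example text. As a minimal sketch, assuming the heading refers to the base R as.*() conversion functions, values can be converted between the data types listed earlier as follows (the variable names are illustrative only):

```r
x <- 10.7                  # numeric

y <- as.integer(x)         # numeric -> integer (the fractional part is dropped)
z <- as.character(x)       # numeric -> character
n <- as.numeric("42")      # character -> numeric
b <- as.logical("TRUE")    # character -> logical

class(y)                   # "integer"
class(z)                   # "character"
class(n)                   # "numeric"
class(b)                   # "logical"
```

Note that as.integer() truncates rather than rounds, so 10.7 becomes 10L.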
R Operators
• Operators are used to perform operations on variables and values.
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
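Arithmetic operators are listed above but not illustrated. A minimal sketch, with illustrative values for a and b:

```r
a <- 17
b <- 5

add     <- a + b     # addition: 22
sub     <- a - b     # subtraction: 12
mul     <- a * b     # multiplication: 85
div     <- a / b     # division: 3.4
pow     <- a ^ 2     # exponent: 289
mod     <- a %% b    # modulus (remainder): 2
int_div <- a %/% b   # integer division: 3

print(c(add, sub, mul, div, pow, mod, int_div))
```

The %% and %/% operators are the same ones used later in the spy number program to peel off digits.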
Comparison operators
• Comparison (relational) operators are used to compare two values.
Example:        Output:
5 == 5          [1] TRUE
5 <= 3          [1] FALSE
5 >= 2          [1] TRUE
5 < 3           [1] FALSE
Assignment Operators
• Assignment operators are used to assign values to variables:
Example:        Output:
a <- 3
b = 5
20 -> c
a               [1] 3
b               [1] 5
c               [1] 20
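Logical operators
• Logical operators are used to combine or negate logical (TRUE/FALSE) values.
• & and | work element by element on vectors; && and || compare single values.
Example:
x <- c(TRUE, FALSE)
y <- c(FALSE, TRUE)
a = 50
b = 6
c = 10
Expression:     Output:
!x              [1] FALSE  TRUE
x & y           [1] FALSE FALSE
a>b && b>c      [1] FALSE
a>b || b>c      [1] TRUE
x | y           [1] TRUE TRUE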
Control statements
• Control statements are expressions used to control the execution and flow of the program
based on the conditions provided in the statements.
Conditional statements
• if condition
• if-else condition
Looping Statements
• for loop
• while loop
Jump Statements
• break statement
• next statement
Conditional statements
• This control structure checks whether the expression given in parentheses is TRUE.
If it is, the statements inside the braces {} are executed.
if statement:
a <- 33
b <- 200
if (b > a) {
  print("b is greater")
}
if-else statement:
a <- 200
b <- 33
if (a > b) {
  print("a is greater")
} else {
  print("b is greater")
}
else-if statement:
a <- 20
b <- 33
c <- 5
if (a > b && a > c) {
  print("a is greater")
} else if (b > c) {
  print("b is greater")
} else {
  print("c is greater")
}
Looping statements
While Loop
• With the while loop we can execute a set of statements as long as a
condition is TRUE
Syntax:
while ( condition )
{
statement
}
Looping statements
R program to check whether a given number is a spy number or not
• A spy number is a number where the sum of its digits is equal to
the product of its digits.
• For example, 1124 is a spy number because:
• Sum of digits: 1 + 1 + 2 + 4 = 8
• Product of digits: 1 × 1 × 2 × 4 = 8
no = 1124
sum_digits = 0
prod_digits = 1
while (no > 0) {
  digit = no %% 10                    # Get the last digit
  sum_digits = sum_digits + digit
  prod_digits = prod_digits * digit
  no = no %/% 10                      # Remove the last digit
}
if (sum_digits == prod_digits) {
  print("Given number is a spy number.")
} else {
  print("Given number is not a spy number.")
}
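The same while loop can be wrapped in a function so that several numbers can be tested. This is only a sketch; is_spy() is a hypothetical helper name, not a base R function:

```r
# is_spy() is a hypothetical helper, not part of base R
is_spy <- function(no) {
  sum_digits <- 0
  prod_digits <- 1
  while (no > 0) {
    digit <- no %% 10                  # last digit
    sum_digits <- sum_digits + digit
    prod_digits <- prod_digits * digit
    no <- no %/% 10                    # drop the last digit
  }
  sum_digits == prod_digits            # TRUE when sum equals product
}

print(is_spy(1124))   # sum 8, product 8
print(is_spy(123))    # sum 6, product 6
print(is_spy(1234))   # sum 10, product 24
```

The last expression of the function body is its return value, so no explicit return() is needed here.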
Looping statements
For Loop:
• A for loop is used for iterating over a sequence
• With the for loop we can execute a set of statements once for each item
in a vector, array, list, etc.
Syntax:
for (value in sequence)
{
statement
}
Looping statements
Program-1:
fruits <- list("apple", "banana", "cherry")
for (x in fruits) {
  print(x)
}
Output:
[1] "apple"
[1] "banana"
[1] "cherry"
Program-2:
dice <- c(1, 2, 3, 4, 5, 6)
for (x in dice) {
  print(x)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
Looping statements
Break
With the break statement, we can stop the loop even if the while condition is
TRUE:
i <- 0
while (i < 6) {
  i <- i + 1
  if (i == 3) {
    break
  }
  print(i)
}
Output:
[1] 1
[1] 2
Next
With the next statement, we can skip an iteration without terminating the loop:
i <- 0
while (i < 6) {
  i <- i + 1
  if (i == 3) {
    next
  }
  print(i)
}
Output:
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6
Data structures
• Vectors
• Lists
• Matrices
• Dataframes
• Factors
Vector
• A vector is a sequence of data elements of the same type.
• To combine items into a vector, use the c() function and separate the items
with commas.
Types:
• Numeric vectors: c(1, 2, 3)
• Character vectors: c("apple", "banana", "cherry")
• Logical vectors: c(TRUE, FALSE, TRUE)
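Because all elements of a vector share one type, mixing types inside c() makes R convert them to a common type. A minimal sketch with illustrative variable names:

```r
nums  <- c(1, 2, 3)
chars <- c("apple", "banana", "cherry")
flags <- c(TRUE, FALSE, TRUE)

class(nums)    # "numeric"
class(chars)   # "character"
class(flags)   # "logical"

# Mixed input is coerced to a single type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
print(mixed)
```

When different kinds of values must be kept as they are, a list is the appropriate structure instead.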
Vector Length
# Find the length of the fruits vector
fruits <- c("banana", "apple", "orange")
length(fruits)
Output:
[1] 3
Vector
Sort a Vector
• To sort items in a vector alphabetically or numerically, use the sort()
function
fruits <- c("banana", "apple", "orange", "mango", "lemon")
numbers <- c(13, 3, 5, 7, 20, 2)
sort(fruits) # Sort strings alphabetically
sort(numbers) # Sort numbers in increasing order
Output:
[1] "apple" "banana" "lemon" "mango" "orange"
[1] 2 3 5 7 13 20
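sort() returns a new, sorted vector and sorts in increasing order by default. As a short sketch, the decreasing argument of sort() reverses the order (the variable name descending is illustrative):

```r
numbers <- c(13, 3, 5, 7, 20, 2)

descending <- sort(numbers, decreasing = TRUE)
print(descending)   # 20 13 7 5 3 2

# The original vector is left unchanged
print(numbers)
```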
Vector
Access Vectors:
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Access the second item (apple)
fruits[2]
# Access first and third item (banana and orange)
fruits[c(1, 3)]
# Access all items except for the third item
fruits[-3]
Output:
[1] "apple"
[1] "banana" "orange"
[1] "banana" "apple" "mango" "lemon"
List
• In R, a list is a data structure that can hold elements of different types (e.g., numbers,
strings, vectors, even other lists).
• Lists are particularly useful when you want to group related but different kinds of data
together.
• To create a list, use the list() function
• You can access the list items by referring to its index number, inside brackets. The first
item has index 1, the second item has index 2, and so on:
• To find out how many items a list has, use the length() function:
• To find out if a specified item is present in a list, use the %in% operator:
List
thislist <- list("apple", "banana", "cherry")
# Print the list
thislist
# print item 2
thislist[2]
# print length
length(thislist)
#Check if Item Exists
"apple" %in% thislist
Add List Items
• To add an item to the end of the list, use the append() function
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")
• To add an item to the right of a specified index, pass
after = index number to the append() function
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange", after = 2)
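Note that append() returns a new list and leaves the original untouched, so the result has to be assigned to keep it. A minimal sketch (the name longer is illustrative):

```r
thislist <- list("apple", "banana", "cherry")

# Store the result of append() in a new variable
longer <- append(thislist, "orange", after = 2)

length(thislist)   # 3: the original list is unchanged
length(longer)     # 4
longer[[3]]        # "orange" sits right after index 2
```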
Join Two Lists
• The most common way is to use the c() function, which combines
two elements together:
Range of Indexes
• You can specify a range of indexes by specifying where to start and
where to end the range, by using the : operator:
list1 <- list("a", "b", "c")
list2 <- list(1, 2)
list3 <- c(list1, list2)
list3       # "a", "b", "c", 1, 2
list3[2:4]  # "b", "c", 1
Loop Through a List
thislist <- list("apple", "banana", "cherry")
for (x in thislist) {
print(x)
}
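On accessing list items by index: single brackets return a smaller list, while double brackets return the item itself. A minimal sketch using the same thislist as in the list examples (the names sub and item are illustrative):

```r
thislist <- list("apple", "banana", "cherry")

sub  <- thislist[2]     # a list of length 1 containing "banana"
item <- thislist[[2]]   # the character value "banana" itself

class(sub)    # "list"
class(item)   # "character"
```

Use [[ ]] when the value is needed for further computation, for example toupper(thislist[[2]]).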
Matrices
• A matrix is a two dimensional data set with columns and rows.
• A matrix can be created with the matrix() function.
• Specify the nrow and ncol parameters to set the number of rows and columns:
Creating Matrix:
# Create a matrix
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
# Print the matrix
thismatrix
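By default, matrix() fills the values column by column. As a sketch, the byrow argument of matrix() switches to row-by-row filling (the names by_col and by_row are illustrative):

```r
values <- c(1, 2, 3, 4, 5, 6)

by_col <- matrix(values, nrow = 3, ncol = 2)                # default: fill columns first
by_row <- matrix(values, nrow = 3, ncol = 2, byrow = TRUE)  # fill rows first

by_col[1, ]   # first row is 1 4
by_row[1, ]   # first row is 1 2
```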
Access Matrix Items
• You can access the items by using [ ] brackets. The first number in the bracket specifies the
row-position, while the second number specifies the column-position.
• The whole row can be accessed if you specify a comma after the number in the bracket
• The whole column can be accessed if you specify a comma before the number in the
bracket:
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
thismatrix
thismatrix[1, 2]       # item in row 1, column 2
thismatrix[2,]         # whole second row
thismatrix[,2]         # whole second column
thismatrix[c(1,2),]    # first and second rows
Add Rows and Columns
• Use the cbind() function to add additional columns in a Matrix:
• Use the rbind() function to add additional rows in a Matrix:
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
newmatrix <- cbind(thismatrix, c(7,8,9))
print("After Adding column")
newmatrix
newmatrix <- rbind(newmatrix, c(10,11,12))
print("After Adding row")
newmatrix
Remove Rows and Columns & Check if an Item Exists
• Use negative indexes, built with the c() function, to remove rows and columns from a Matrix
• To find out if a specified item is present in a matrix, use the %in% operator
• Use the dim() function to find the number of rows and columns in a Matrix:
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
thismatrix
dim(thismatrix)
thismatrix <- thismatrix[-c(1), -c(1)]
thismatrix
1 %in% thismatrix
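When removing rows and columns leaves a single column (or row), R simplifies the result to a plain vector. A minimal sketch, assuming the matrix structure should be kept, uses the drop = FALSE argument of the [ ] operator (the names as_vector and as_matrix are illustrative):

```r
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

as_vector <- thismatrix[-c(1), -c(1)]                 # simplified to the vector 5 6
as_matrix <- thismatrix[-c(1), -c(1), drop = FALSE]   # stays a 2 x 1 matrix

is.matrix(as_vector)   # FALSE
dim(as_matrix)         # 2 1
```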
Combine two Matrices
• Again, you can use the rbind() or cbind() function to combine two or more
matrices together:
Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)
Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)
# Add Matrix2 as new rows
Matrix_Combined <- rbind(Matrix1, Matrix2)
Matrix_Combined
# Add Matrix2 as new columns
Matrix_Combined <- cbind(Matrix1, Matrix2)
Matrix_Combined
Data Frame
• A data frame is a two-dimensional data structure which can store data in tabular
format.
• Data frames have rows and columns and each column can be a different vector.
Create a Data Frame in R
• In R, we use the data.frame() function to create a data frame.
• The syntax of the data.frame() function is:
dataframe1 <- data.frame(
first_col = c(val1, val2, ...),
second_col = c(val1, val2, ...),
...
)
Data Frame
# Create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
print(dataframe1)
Access Data Frame Columns
• There are different ways to extract columns from a data frame. We can use [ ], [[ ]],
or $ to access specific column of a data frame in R.
# Create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
# pass index number inside [ ]
print(dataframe1[1])
# pass column name inside [[ ]]
print(dataframe1[["Name"]])
# use $ operator and column name
print(dataframe1$Name)
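The three access forms differ in what they return: [ ] gives a data frame with the selected column, while [[ ]] and $ give the column as a plain vector. A minimal sketch using the same dataframe1 (the result names are illustrative):

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

one_col_df <- dataframe1[1]          # still a data frame, with one column
name_vec   <- dataframe1[["Name"]]   # the column as a vector
same_vec   <- dataframe1$Name        # same vector, shorter syntax

is.data.frame(one_col_df)       # TRUE
is.data.frame(name_vec)         # FALSE
identical(name_vec, same_vec)   # TRUE
```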
Summarize the Data
• Use the summary() function to summarize the data from a Data Frame.
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame
summary(Data_Frame)
Add Rows
• Use the rbind() function to add new rows in a Data Frame:
Data_Frame <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
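# Add a new row (a one-row data frame keeps each column's type;
# c("Dev", 8, FALSE) would turn every value into a character string)
New_DF <- rbind(Data_Frame, data.frame(Name = "Dev", Age = 8, Vote = FALSE))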
# Print the new DF
print(New_DF)
Add columns
• Use the cbind() function to add new columns in a Data Frame:
Data_Frame <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
# Add a new column
New_DF <- cbind(Data_Frame, location = c("Tenali", "Amaravati", "Guntur"))
# Print the new DF
print(New_DF)
Remove Rows and Columns
• Use negative indexes, built with the c() function, to remove rows and columns from a Data Frame:
Data_Frame <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
Data_Frame_New <- Data_Frame[-c(1), -c(1)]
# Print the new data frame
Data_Frame_New
Number of Rows and Columns
• Use the dim() function to find the number of rows and columns in a Data Frame:
• You can also use the ncol() function to find the number of columns and nrow() to find
the number of rows:
Data_Frame <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
dim(Data_Frame)
ncol(Data_Frame)
nrow(Data_Frame)
Combine Data Frames
• In R, we use the rbind() and the cbind() function to combine two data frames
together.
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame2 <- data.frame (
Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame
Combine Data Frames
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame4 <- data.frame (
Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)
New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
New_Data_Frame1
Problem
Customer ID   Age   Type of food   Price
A1            20    Pizza          165
A2            22    Pizza          220
A3            36    Meals          155
A4            47    Soup           130
A5            13    Soup           90
A6            39    Chips          59
A7            48    Pizza          99
A8            16    Sweets         65
A9            34    Sweets         80
A10           45    Sweets         35
• Create the above data frame using R built-in functions, then:
1. Write an R command to extract the even-numbered rows from the data frame.
2. Write an R program to extract the type of food column from the data frame.
3. Write an R command to add a new column, quantity, to the data frame.
df <- data.frame(
  CustomerID = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10"),
  Age = c(20, 22, 36, 47, 13, 39, 48, 16, 34, 45),
  TypeOfFood = c("Pizza", "Pizza", "Meals", "Soup", "Soup",
                 "Chips", "Pizza", "Sweets", "Sweets", "Sweets"),
  Price = c(165, 220, 155, 130, 90, 59, 99, 65, 80, 35)
)
# View the data frame
print(df)
# Extract even-numbered rows
even_rows <- df[seq(2, nrow(df), by = 2), ]
print(even_rows)
# Extract the type of food column
type_of_food <- df["TypeOfFood"]
# Add a new column, Quantity
df <- cbind(df, Quantity = c(2, 3, 1, 4, 2, 5, 3, 1, 4, 2))
# Extract odd-numbered rows
odd_rows <- df[seq(1, nrow(df), by = 2), ]
# Extract even-numbered columns
even_columns <- df[, seq(2, ncol(df), by = 2)]
# Extract odd-numbered columns
odd_columns <- df[, seq(1, ncol(df), by = 2)]
Factors
• Factors are used to categorize data.
Examples of factors are:
•Demography: Male/Female
•Music: Rock, Pop, Classic, Jazz
•Training: Strength, Stamina
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
"Pop", "Jazz", "Rock", "Jazz"))
music_genre
length(music_genre)
music_genre[3]
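A factor stores the distinct categories as levels. As a short sketch, the base R functions levels(), nlevels() and table() can be used to inspect the same music_genre factor (the name counts is illustrative):

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))

levels(music_genre)    # distinct categories, sorted alphabetically
nlevels(music_genre)   # number of categories: 4

counts <- table(music_genre)   # how often each category occurs
print(counts)
```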
Functions
• A function is a block of code which only runs when it is called.
• You can pass data, known as parameters, into a function.
• A function can return data as a result
• To create a function, use the function() keyword:
my_function <- function() {
print("Hello World!")
}
my_function()
Arguments
• Data can be passed into functions as arguments.
• Arguments are specified after the function name, inside the parentheses.
• You can add as many arguments as you want, just separate them with a comma.
my_function <- function(fname, lname) {
print(paste(fname, lname))
}
my_function("Devansh", "Chirra")
Default Parameter Value
• If we call the function without an argument, it uses the default value:
my_function <- function(country = "Norway") {
paste("I am from", country)
}
my_function("India")
my_function() # will get the default value, which is Norway
Return Values
• To let a function return a result, use the return() function
my_function <- function(x) {
return (5 * x)
}
print(my_function(3))
Finding factorial using functions
factorial_loop <- function(n) {
if(n == 0) return(1)
result <- 1
for(i in 1:n) {
result <- result * i
}
return(result)
}
# Example: Calculate the factorial of 5
result <- factorial_loop(5)
print(result)
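For comparison, the same calculation can be written recursively. This is a sketch of an alternative, not a replacement for the loop version; factorial_recursive() is a hypothetical name, and the loop version is repeated so the block runs on its own:

```r
# Loop version, repeated from above so this block is self-contained
factorial_loop <- function(n) {
  if (n == 0) return(1)
  result <- 1
  for (i in 1:n) {
    result <- result * i
  }
  return(result)
}

# Recursive version (hypothetical name): n! = n * (n - 1)!, with 0! = 1
factorial_recursive <- function(n) {
  if (n == 0) return(1)
  return(n * factorial_recursive(n - 1))
}

print(factorial_recursive(5))                        # 120
print(factorial_loop(5) == factorial_recursive(5))   # both versions agree
```

Base R also provides a built-in factorial() function, which is the usual choice outside of exercises.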