Guest lecture
on
STATISTICS with R PROGRAMMING for DATA SCIENCE
Dr.A.MANIMARAN B.E,M.E,Ph.D
PROFESSOR,
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SAVEETHA SCHOOL OF ENGINEERING, SIMATS, CHENNAI
In this Lecture
R and RStudio
How to:
Set the working directory
Create an R file and save it
Execute an R file
Variables
Basic Data Types
Advanced Data Structures
Functions
Classes
R
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
Open-Source Programming Language
R is one of the world's most widely used languages for statistical computing and graphics
Statistical Software and Data Analysis Tool
Command-Line Interface
Platforms:
Windows
Linux
macOS
What is RStudio?
Integrated Development Environment (IDE) for R
Available as open-source and commercial software
Editions: Desktop version and Server version
A first look at RStudio
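The three "How to" tasks from the outline can also be done from the R console. The sketch below makes one assumption: it uses a temporary folder, tempdir(), as the working directory so the demo does not touch a real project folder. In practice you would pass your own path to setwd(), or use the RStudio menus (Session > Set Working Directory, and File > New File > R Script). The file name first_script.R is hypothetical.

```r
# Show the current working directory
print(getwd())

# Set the working directory (a temporary folder is assumed here for the demo);
# setwd() returns the previous directory so it can be restored later
old_wd <- setwd(tempdir())

# Create an R file and save it in the working directory
writeLines(c('Depart <- "Welcome AIDS"', 'print(Depart)'), "first_script.R")

# Execute the R file: source() runs every line of the script
source("first_script.R")

# Restore the original working directory
setwd(old_wd)
```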
Basic Program (syntax)
Depart <- "Welcome AIDS"
print(Depart)
cat("depart", Depart)
Output
[1] "Welcome AIDS"
depart Welcome AIDS
The value of a variable can be printed using the print() or cat() function.
The cat() function combines multiple items into one continuous output, separated by spaces, without the [1] index or quotation marks.
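The short sketch below contrasts three related functions. print() shows a value with its index and quotes. cat() writes the items as plain text and accepts a sep argument for the separator. paste() builds the combined string as a value instead of printing it. The label "Department:" is only an example.

```r
Depart <- "Welcome AIDS"

print(Depart)                      # [1] "Welcome AIDS"
cat("Department:", Depart, "\n")   # Department: Welcome AIDS
cat("a", "b", "c", sep = "-")      # a-b-c (the separator is set with sep)
cat("\n")

# paste() returns the combined string so it can be stored in a variable
msg <- paste("Department:", Depart)
print(msg)
```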
Variable
A variable is a named memory location allocated for the storage of specific data.
R Variables Syntax
• Using the equal-to operator: variable_name = value
• Using the leftward operator: variable_name <- value
• Using the rightward operator: value -> variable_name
EXAMPLE
# using equal-to operator
var1 = "hello"
print(var1)
# using leftward operator
var2 <- "hello"
print(var2)
# using rightward operator
"hello" -> var3
print(var3)
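Keep in Mind
Allowed characters are letters, digits, underscore "_" and period ".".
A name must start with a letter (or with a period that is not followed by a digit).
No special characters such as @, $, etc.
No reserved keywords (for example if, else, for, function, TRUE).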
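The naming rules above can be checked in code. This is a minimal sketch built on one assumption: make.names() converts any string into a syntactically valid name, so a string that comes back unchanged already follows the rules. The helper is_valid_name() is hypothetical and not part of base R.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
# (make.names() prepends "X", replaces invalid characters with ".",
#  and appends "." to reserved words)
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_1", "my.var", "2var", "_var", "my@var", "if", "function")
print(data.frame(name = candidates, valid = is_valid_name(candidates)))
```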
R DATA TYPES
Basic Data Type | Values | Example
Numeric | Set of all real numbers | numeric_value <- 3.14
Integer | Set of all integers, Z | integer_value <- 42L
Logical | TRUE and FALSE | logical_value <- TRUE
Complex | Set of complex numbers | complex_value <- 1 + 2i
Character | "a", "b", "c", …, "@", "#", "$", …, "1", "2", …etc. | character_value <- "Hello Geeks"
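The type of each example value in the table can be inspected with class() and typeof(). The sketch below shows that a plain number such as 3.14 has class "numeric" and is stored internally as "double". The L suffix is what makes 42L an integer.

```r
numeric_value   <- 3.14
integer_value   <- 42L
logical_value   <- TRUE
complex_value   <- 1 + 2i
character_value <- "Hello Geeks"

values <- list(numeric_value, integer_value, logical_value,
               complex_value, character_value)

# class() gives the high-level type, typeof() the internal storage type
for (v in values) cat(class(v), "/", typeof(v), "\n")
```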
TYPE VERIFICATION
Syntax:
is.data_type(object), for example is.logical() or is.numeric(); it returns TRUE or FALSE.
# Logical
print(is.logical(TRUE))
# Integer
print(is.integer(3L))
# Numeric
print(is.numeric(10.5))
# Complex
print(is.complex(1+2i))
# Character
print(is.character("12-04-2020"))
# A character string is not an integer: FALSE
print(is.integer("a"))
# A complex number is not numeric: FALSE
print(is.numeric(2+3i))
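Converting an Object to Another Data Type
Syntax:
as.data_type(object)
# Complex to character: "1+2i"
print(as.character(1+2i))
# Not possible: returns NA with the warning "NAs introduced by coercion"
print(as.numeric("12-04-2020"))
# Numeric to logical: any non-zero number becomes TRUE
print(as.logical(10.5))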
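A few more conversions are worth knowing, as sketched below. as.integer() drops the decimal part instead of rounding. Only 0 converts to FALSE. A string such as "yes" is not a recognised logical value. When a conversion fails, the result is NA, which can be detected with is.na(). suppressWarnings() is used here only to keep the demo output quiet.

```r
print(as.integer(3.9))      # 3: the decimal part is dropped (truncation)
print(as.numeric("3.14"))   # 3.14: a numeric string converts cleanly
print(as.logical(0))        # FALSE: only 0 becomes FALSE
print(as.logical("TRUE"))   # TRUE
print(as.logical("yes"))    # NA: not a recognised logical string

# A failed conversion gives NA, which can be tested with is.na()
x <- suppressWarnings(as.numeric("abc"))
print(is.na(x))             # TRUE
```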
Advanced Data Structures
A data structure is a particular way of organizing data in a computer so
that it can be used effectively
• Vectors
• Lists
• Dataframes
• Matrices
• Arrays
• Factors
Vectors
A vector contains a sequence of elements of the same (homogeneous) data type.
Atomic Vector
Integer
Double
Logical
Character
Complex
Raw
Recursive Vector
list
Vectors are created with the function c():
x <- c(1, -1, 3.5, 2)
print(x)
print(typeof(x))
Output:
[1]  1.0 -1.0  3.5  2.0
[1] "double"
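Because vectors are homogeneous, c() converts mixed inputs to a single common type. The sketch below shows this, along with 1-based indexing, element-wise arithmetic, and selecting elements with a logical condition.

```r
# Mixed inputs are coerced to one common type (here character)
mixed <- c(1, "a", TRUE)
print(mixed)            # "1" "a" "TRUE"
print(typeof(mixed))    # "character"

x <- c(1, -1, 3.5, 2)
print(x[3])             # 3.5: indexing starts at 1
print(x * 2)            # 2 -2 7 4: arithmetic is applied element by element
print(x[x > 0])         # 1 3.5 2: keep only the positive elements
print(length(x))        # 4
```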
Lists
A list is a generic object consisting of an ordered collection of objects.
Lists are heterogeneous data structures: their components can be of different types.
Lists are created with the function list():
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
empList = list(empId, empName)
print(empList)
Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] "Debi"    "Sandeep" "Subham"  "Shiba"
Accessing Components
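By name (all components of a list can be named)
empList = list("ID" = empId, "names" = empName)
print(empList$names)
By indices
To access top-level components, use the double-bracket operator "[[ ]]" or the single-bracket operator "[ ]"; for lower/inner-level components, use "[ ]" along with "[[ ]]".
print(empList[1])       # a sub-list holding the first component
print(empList[[1]])     # the first component itself: 1 2 3 4
print(empList[[1]][2])  # second element of the first component: 2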
Manipulating Lists
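A list can be modified by accessing its components and replacing them.
empList[[2]][5] <- "manimaran"   # adds a fifth name to the second component
print(empList)
Concatenation of lists:
li <- c(list1, list2)   # list1 and list2 are any two existing lists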
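To make the concatenation step concrete, the sketch below defines two small example lists (the component names and values are hypothetical). It joins them with c(), then removes one component by assigning NULL to it.

```r
# Two hypothetical example lists
list1 <- list(id = 1, name = "Debi")
list2 <- list(dept = "AIDS", active = TRUE)

# c() joins the components of both lists into one list
li <- c(list1, list2)
print(length(li))   # 4
print(names(li))    # "id" "name" "dept" "active"

# Assigning NULL to a component removes it from the list
li$name <- NULL
str(li)
```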
Matrices
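A matrix is a rectangular, two-dimensional arrangement of elements of the same type (usually numbers) in rows and columns.
Function matrix():
matrix(data, nrow, ncol, byrow, dimnames)
• data: a vector of elements
• nrow, ncol: number of rows and columns
• byrow: if TRUE the matrix is filled row by row; the default FALSE fills it column by column
• dimnames: a list of row names and column names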
Matrices
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
# Define the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(P)
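Once a matrix exists, its elements are read with [row, column]. The sketch below rebuilds the named matrix P and accesses it by position, by name, and as whole rows and columns. It also shows dim() and the transpose t().

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

print(P[2, 3])            # 8: row 2, column 3
print(P["row1", "col2"])  # 4: access by row and column names
print(P[2, ])             # the whole second row
print(P[, "col1"])        # the whole first column
print(dim(P))             # 4 3: number of rows and columns
print(t(P))               # transpose: 3 rows and 4 columns
```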
DATA FRAME
CREATE
Access rows and columns
Edit
Add new rows and columns
Dataframes
Data frames are generic data objects of R used to store tabular data: each column is a vector, and different columns can hold different types.
Function data.frame():
Age = c(22, 25, 45)
Language = c("R", "Python", "Java")
Name = c("Amiya", "Raj", "Asish")
df = data.frame(Name, Language, Age)
print(df)
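The "Access rows and columns", "Edit" and "Add new rows and columns" steps listed for data frames can be sketched as follows. The extra Experience column and the new row for "Priya" are hypothetical values used only for illustration. rbind() requires the new row to have the same column names as df.

```r
Age = c(22, 25, 45)
Language = c("R", "Python", "Java")
Name = c("Amiya", "Raj", "Asish")
df = data.frame(Name, Language, Age)

# Access: a column by name, a row by position
print(df$Name)
print(df[2, ])

# Edit: change a single value (row 1, column "Age")
df[1, "Age"] <- 23

# Add a new column (hypothetical values)
df$Experience <- c(1, 3, 20)

# Add a new row with rbind(); its columns must match those of df
df <- rbind(df, data.frame(Name = "Priya", Language = "C",
                           Age = 30, Experience = 5))
print(df)
```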
Arrays
Arrays are R data objects that can store data in more than two dimensions; they are n-dimensional data structures.
Function array():
A = array( c(1, 2, 3, 4, 5, 6, 7, 8), dim = c(2, 2, 2) )
print(A)
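Elements of an array are indexed with one position per dimension. For the 2 x 2 x 2 array above, the third index selects one of the two 2 x 2 matrices. As with matrix(), the values are filled column by column.

```r
A <- array(1:8, dim = c(2, 2, 2))

print(A[, , 1])     # first 2 x 2 matrix, filled column by column with 1 to 4
print(A[1, 2, 2])   # row 1, column 2 of the second matrix: 7
print(dim(A))       # 2 2 2
```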
Factors
Factors in the R programming language are data structures used to represent categorical data: the values are stored together with a fixed set of categories called levels.
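The sketch below illustrates factors with hypothetical categorical data. By default the levels are sorted alphabetically, and each value is stored as an integer code that points to its level. The second part shows an ordered factor, where the level order is set explicitly so that comparisons such as < are meaningful.

```r
# Hypothetical categorical data
gender <- factor(c("male", "female", "female", "male", "female"))
print(gender)              # values plus "Levels: female male"
print(levels(gender))      # "female" "male": sorted alphabetically
print(as.integer(gender))  # 2 1 1 2 1: the stored integer codes
print(table(gender))       # number of values in each level

# An ordered factor with an explicit level order
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
print(size < "large")      # TRUE FALSE TRUE
```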
FUNCTION
A function is a block of code that runs only when it is called.
It has some inputs called arguments, and an output called the return value.
Creating a Function in R
A function is created with the keyword function():
function_name <- function(arg1, arg2) {
  statements
  return(value)
}
TYPES OF FUNCTION IN R LANGUAGE
Built-in Function: built-in functions in R are pre-defined functions.
User-defined Function: the R language allows us to write our own functions.
Built-in function examples:
# Find sum of numbers 4 to 6.
print(sum(4:6))   # 15
# Find max of numbers 4 to 6.
print(max(4:6))   # 6
# Find min of numbers 4 to 6.
print(min(4:6))   # 4
User-defined function example:
evenOdd = function(x)
{
  if (x %% 2 == 0)
    return("even")
  else
    return("odd")
}
print(evenOdd(4))   # "even"
print(evenOdd(3))   # "odd"
Example 1
mean2 <- function(x)
{
  n <- length(x)
  sum(x) / n
}
mean2(1:10)   # 5.5
Example 2
square = function(x)
{
  x^2
}
square(4)   # 16
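Arguments can also be given default values, and callers can pass them by name in any order. The function power() below is a hypothetical example, not a built-in. Like square() above, it returns the value of its last expression without an explicit return().

```r
# Hypothetical example function: p defaults to 2 when it is not supplied
power <- function(x, p = 2) {
  x^p
}

print(power(3))             # 9: p uses its default value
print(power(2, p = 3))      # 8: p supplied by name
print(power(p = 3, x = 2))  # 8: named arguments can appear in any order
```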
Recursive Function in R
Recursion is when a function calls itself.
Each call works on a smaller version of the problem, and a base case stops the chain of calls; without a base case the function would keep calling itself forever.
rec_fac <- function(x) {
  # base case: 0! = 1! = 1
  if (x == 0 || x == 1) {
    return(1)
  } else {
    # recursive case: x! = x * (x - 1)!
    return(x * rec_fac(x - 1))
  }
}
rec_fac(5)   # 120
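Find the sum of squares of a given series of numbers.
Sum = 1^2 + 2^2 + … + N^2
sum_series <- function(vec) {
  # base case: a single element (or none) contributes just its own square
  if (length(vec) <= 1) {
    return(vec^2)
  } else {
    # square the first element and recurse on the rest (vec[-1] drops element 1)
    return(vec[1]^2 + sum_series(vec[-1]))
  }
}
series <- c(1:10)
sum_series(series)   # 385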
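The recursive version is useful for teaching. The same sum can also be computed without recursion, which avoids R's limits on deeply nested calls when the vector is very long. The sketch below compares a vectorised sum() with the closed-form formula N(N + 1)(2N + 1)/6 for N = 10. Both should agree with the 385 returned by sum_series(1:10).

```r
N <- 10

# Vectorised version: square every element, then add them up
vectorised <- sum((1:N)^2)

# Closed-form formula for 1^2 + 2^2 + ... + N^2
closed_form <- N * (N + 1) * (2 * N + 1) / 6

print(c(vectorised, closed_form))   # 385 385
```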
R – OBJECT ORIENTED PROGRAMMING
Class and Object
Class is the blueprint or a prototype
from which objects are made by
encapsulating data members and
functions.
An object is a data structure that
contains some methods that act upon
its attributes.
S3 class
S4 class
Reference class
S3 CLASS
Create a list that contains all the class members.
Then assign the class name to this list with the class() function.
Syntax:
variable_name <- list(attribute1, attribute2, attribute3, …, attributeN)
class(variable_name) <- "class_name"
# List creation with its attributes name and roll no.
a <- list(name = "Adam", Roll_No = 15)
# Defining a class "Student"
class(a) <- "Student"
# Printing the object
a
S3 CLASS
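a <- list(name = "manimaran", Roll_No = 101)
class(a) <- "Student"
# S3 method: print() dispatches to print.Student() for objects of class "Student"
print.Student <- function(obj, ...)
{
  cat("name: ", obj$name, "\n")
  cat("Roll No: ", obj$Roll_No, "\n")
  invisible(obj)
}
print(a)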
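print.Student() works because print() is an S3 generic that selects a method based on class(obj). The sketch below writes a new generic with the same mechanism. The names describe, describe.Student and describe.default are hypothetical. UseMethod() picks describe.Student() for a "Student" object and falls back to describe.default() for any other object.

```r
# Hypothetical generic: UseMethod() dispatches on the class of obj
describe <- function(obj, ...) UseMethod("describe")

describe.Student <- function(obj, ...) {
  paste("Student", obj$name, "with roll no", obj$Roll_No)
}
describe.default <- function(obj, ...) "unknown object"

a <- list(name = "manimaran", Roll_No = 101)
class(a) <- "Student"

print(describe(a))              # "Student manimaran with roll no 101"
print(describe(42))             # "unknown object"
print(inherits(a, "Student"))   # TRUE
```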
S4 CLASS
An S4 class has a formal, predefined definition: its slots (fields) and their types are declared in advance. It comes with functions for defining methods and generics, such as setGeneric() and setMethod().
setClass()
Syntax:
setClass("myclass", slots = list(name = "character", Roll_No = "numeric"))
The new() function is used to create an object of an S4 class:
pass the class name as well as the values for the slots.
S4 CLASS
setClass("Student",slots=list(name="character",
Roll_No="numeric"))
a <- new("Student", name="Adam", Roll_No=20)
a
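Slots of an S4 object are read and written with the @ operator (or slot()). The sketch below reuses the Student class. It defines a method for show(), the generic S4 calls when an object is auto-printed. It also shows that slot types are checked when an object is created. The message "invalid slot type" is an illustrative string, not R's own error text.

```r
library(methods)  # attached by default in most R sessions; loaded explicitly here

setClass("Student", slots = list(name = "character", Roll_No = "numeric"))
a <- new("Student", name = "Adam", Roll_No = 20)

# Read and write slots
print(a@name)               # "Adam"
print(slot(a, "Roll_No"))   # 20
a@Roll_No <- 21

# show() is the S4 generic used when an object is printed
setMethod("show", "Student", function(object) {
  cat("name:", object@name, "\n")
  cat("Roll No:", object@Roll_No, "\n")
})
a

# Slot types are checked: a number is not a valid "character" name
result <- tryCatch(new("Student", name = 42, Roll_No = 1),
                   error = function(e) "invalid slot type")
print(result)
```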
R Programming Structure
Loop statements (flow chart)
for loop
Syntax:
for (value in vector)
{
  statements
}
for (i in 1:4)
{
  print(i ^ 2)
}
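A common use of the for loop is to walk through a vector by position and accumulate a result. The sketch below uses hypothetical marks. seq_along() produces the indices 1, 2, 3, … of the vector, and it yields no iterations at all when the vector is empty.

```r
marks <- c(72, 85, 90)   # hypothetical data
total <- 0

for (i in seq_along(marks)) {
  cat("Subject", i, "mark:", marks[i], "\n")
  total <- total + marks[i]   # running sum
}

print(total)   # 247
```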
Repeat loop
A repeat loop iterates over a block of code multiple times.
It executes the same code again and again until a break statement is found.
Syntax:
repeat {
  commands
  if (condition) {
    break
  }
}
Example
result <- c("Hello World")
i <- 1
repeat {
  print(result)
  i <- i + 1
  if (i > 5) {
    break
  }
}
Output:
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
R while loop
Syntax:
while (test_expression)
{
  statement
  update_expression
}
# R program to illustrate while loop
result <- c("Hello World")
i <- 1
while (i < 6) {
  print(result)
  i <- i + 1
}
Output:
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
Next Statement
It discontinues a particular iteration and jumps to the next iteration.
for (i in c(3, 6, 23, 19, 0, 21))
{
  if (i == 0)
  {
    next
  }
  print(i)
}
print('Outside Loop')
Output:
[1] 3
[1] 6
[1] 23
[1] 19
[1] 21
[1] "Outside Loop"
Break statement
The break keyword is a jump statement that is used to terminate the loop at a particular iteration.
Syntax:
if (test_expression)
{
  break
}
no <- 1:10
for (val in no)
{
  if (val == 5)
  {
    print(paste("Coming out from for loop Where i = ", val))
    break
  }
  print(paste("Values are: ", val))
}
Decision Making in R Programming
if statement
if(condition is true)
{
execute this statement
}
a <- 76
b <- 67
if(a > b)
{ c <- a - b
print("condition a > b is TRUE")
print(paste("Difference between a, b is : ", c))
}
if-else statement
Syntax:
if (condition is true) {
  execute this statement
} else {
  execute this statement
}
a <- 67
b <- 76
if (a > b) {
  c <- a - b
  print("condition a > b is TRUE")
  print(paste("Difference between a, b is : ", c))
} else {
  c <- a - b
  print("condition a > b is FALSE")
  print(paste("Difference between a, b is : ", c))
}
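When there are more than two cases, if statements can be chained with else if. Note also that if() tests a single value, while ifelse() applies a condition to every element of a vector. The grade() function and its cut-off marks are hypothetical.

```r
# Hypothetical grading function using an if / else if / else chain
grade <- function(marks) {
  if (marks >= 90) {
    "A"
  } else if (marks >= 75) {
    "B"
  } else {
    "C"
  }
}
print(grade(95))   # "A"
print(grade(80))   # "B"
print(grade(50))   # "C"

# ifelse() is the vectorised form: one result per element
x <- c(4, 7, 10, 3)
print(ifelse(x %% 2 == 0, "even", "odd"))   # "even" "odd" "even" "odd"
```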
THANKS