
R Programming ChatGPT

Uploaded by

zehrasaba22

Course Name: R-Programming

Course Code: PGICT20E204


Credits: DSE (4)

Unit 1:
Introduction to R Programming, R-Installation, R-IDE, Advantages
and Disadvantages. R-Packages, Basic syntax, Data Types, Variables,
Operators and Keywords, Decision making, Looping and Functions

Unit 2:
String: String manipulation, R-data structures, Vectors, Lists, Arrays,
Matrix, Data frame, Factors

Unit 3:
Data Reshaping, Object Oriented Programming, R-Debugging, Data
Interfaces: CSV files, Excel files, Binary, XML and JSON Files. Data
Visualization: Pie chart, Bar Chart, Boxplot, Histogram, Line graphs
and Scatter plot

Unit 4:
Statistics with R: Mean, Median and Mode, R-Regression, Linear
Regression, Logistic Regression, Normal distribution, Binomial
distribution, Classification, Time Series Analysis, Basic Data Analysis
with R
Unit 1: Introduction to R Programming
• Introduction to R Programming and its features
• Installing R and RStudio
• R IDE and its components
• Advantages and Disadvantages of R Programming
• Introduction to R Packages
• Basic Syntax in R Programming
• Data Types and Variables in R Programming
• Operators and Keywords in R Programming
• Decision Making in R Programming
• Looping in R Programming
• Functions in R Programming

Unit 2: Strings and Data Structures in R Programming
• String Manipulation in R Programming
• R Data Structures
• Vectors in R Programming
• Lists in R Programming
• Arrays in R Programming
• Matrix in R Programming
• Data Frames in R Programming
• Factors in R Programming
Unit 3: Data Reshaping, Object-
Oriented Programming, and Data
Visualization in R Programming
• Data Reshaping in R Programming
• Object-Oriented Programming in R Programming
• Debugging in R Programming
• Data Interfaces in R Programming (CSV, Excel, Binary, XML, and
JSON files)
• Data Visualization in R Programming (Pie Chart, Bar Chart,
Boxplot, Histogram, Line graphs, and Scatter plots)

Unit 4: Statistics with R Programming
• Mean, Median, and Mode in R Programming
• R Regression in R Programming
• Linear Regression in R Programming
• Logistic Regression in R Programming
• Normal Distribution in R Programming
• Binomial Distribution in R Programming
• Classification in R Programming
• Time Series Analysis in R Programming
• Basic Data Analysis with R Programming
Unit 1: Introduction to R
Programming
R is a popular programming language for statistical computing and
graphics. It provides a wide variety of statistical and graphical
techniques, which makes it well suited to data analysis and
visualization. This unit introduces the fundamentals of R
programming, including installing and configuring R, using R IDEs,
and an overview of R packages.
Topics covered in this unit include:
• Overview of the R programming language
• Installing R and the RStudio IDE
• Advantages and disadvantages of R programming
• R packages and libraries
• Basic syntax, data types, variables, operators, and keywords in R
• Decision-making structures (if-else, switch)
• Loops (for, while, repeat) and functions

Unit 2: Strings, Data Structures, and Factors in R
R provides various data structures to store and manipulate data. In
this unit, we will discuss strings, vectors, lists, arrays, matrices, data
frames, and factors.
Strings: In R, a string is a sequence of characters enclosed in
single quotes or double quotes. For example, 'hello' and "world" are
strings in R. We can concatenate strings using the paste() function.
Vectors: A vector is an ordered collection of elements of the same
data type. We can create a vector in R using the c() function. For
example, the following code creates a numeric vector (whole numbers
written without the L suffix are stored as doubles, not integers):
r
vector1 <- c(1, 2, 3, 4, 5)
Lists: A list is an ordered collection of elements that can have
different data types. We can create a list in R using the list()
function. For example, the following code creates a list with two
elements, a string and a numeric vector:
r
list1 <- list("hello", c(1, 2, 3))
Arrays: An array is a multi-dimensional collection of elements of the
same data type. We can create an array in R using the array()
function. For example, the following code creates a 2x3 array of
numbers:
r
array1 <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))
Matrices: A matrix is a special case of an array with two dimensions.
We can create a matrix in R using the matrix() function. For example,
the following code creates a 2x3 matrix of numbers:
r
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
Data frames: A data frame is a two-dimensional table-like data
structure in R where each column can be of a different data type. We
can create a data frame in R using the data.frame() function. For
example, the following code creates a data frame with three columns,
one numeric and two character columns:
r
df1 <- data.frame(num = c(1, 2, 3),
                  char1 = c("hello", "world", "!"),
                  char2 = c("foo", "bar", "baz"))
Factors: A factor is a categorical variable in R. We can create a factor using the factor() function. For example, the
following code creates a factor with three levels, "low", "medium", and "high". The level order is passed through the
levels argument; assigning to levels() afterwards would rename the alphabetically sorted levels and mislabel the data:

r
factor1 <- factor(c("low", "high", "medium", "low", "high", "high"),
                  levels = c("low", "medium", "high"))
In summary, R provides various data structures to store and
manipulate data. Understanding these data structures is essential for
working with data in R.
Unit 3: Data Reshaping, Object
Oriented Programming, R-Debugging,
Data Interfaces, and Data
Visualization
1. Data Reshaping:
o Reshaping data with melt() and cast() functions (from the reshape package; reshape2 provides melt() and dcast())

o Transforming data with the reshape() function

o Combining data with merge() and rbind() functions

2. Object-Oriented Programming:
o Creating classes and objects in R

o Inheritance and polymorphism in R

o S3 and S4 object systems in R

3. R-Debugging:
o Debugging with browser() function

o Debugging with traceback() function

o Debugging with debug() function

4. Data Interfaces:
o Reading and writing data from and to CSV files

o Reading and writing data from and to Excel files

o Reading and writing data from and to binary files

o Reading and writing data from and to XML and JSON files

5. Data Visualization:
o Creating pie charts with pie() function

o Creating bar charts with barplot() function

o Creating boxplots with boxplot() function


o Creating histograms with hist() function
o Creating line graphs with plot() function
o Creating scatter plots with plot() function

Unit 4: Statistics with R


R is a powerful tool for statistical analysis and modeling. In this unit,
we will explore various statistical techniques that can be performed
with R. We will cover the following topics:
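1. Mean, Median, and Mode: In this topic, we will learn how to
calculate the mean, median, and mode of a dataset using R. Base R
provides mean() and median(); the statistical mode has to be
computed by hand, because the built-in mode() function reports an
object's storage mode instead.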
2. Regression Analysis: Regression analysis is a statistical method
for examining the relationship between two or more variables. In
this topic, we will learn about linear regression, logistic
regression, and how to perform regression analysis in R.
3. Probability Distributions: Probability distributions are
mathematical functions that describe how likely each possible
outcome of a random variable is. In this topic, we will learn
about the normal distribution, binomial distribution, and other
probability distributions, and how to work with them in R.
4. Classification: Classification is a supervised learning technique
used to assign a label to a new observation based on training
data. In this topic, we will learn about different classification
techniques, such as decision trees, k-nearest neighbors, and
support vector machines, and how to implement them in R.
5. Time Series Analysis: Time series analysis is a statistical
technique used to analyze time-dependent data. In this topic, we
will learn about the basic concepts of time series analysis and
how to work with time series data in R.
6. Basic Data Analysis: In this topic, we will learn how to perform
basic data analysis in R, such as data cleaning, transformation,
and visualization.
Overall, this unit will provide a solid foundation for statistical analysis
and modeling with R.
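As a small preview of the first topic, the sketch below computes the mean, median, and mode of a short vector. The data values and the helper name stat_mode are assumptions made for illustration; tabulating the values is one common way to find the mode, not the only one.

```r
# Example data (made up for illustration)
scores <- c(2, 4, 4, 5, 7, 9, 4)

mean_score <- mean(scores)      # arithmetic mean
median_score <- median(scores)  # middle value of the sorted data

# Hypothetical helper: returns the most frequent value of a vector.
# If several values tie, the one that appears first is returned.
stat_mode <- function(x) {
  unique_values <- unique(x)
  counts <- tabulate(match(x, unique_values))
  unique_values[which.max(counts)]
}

mode_score <- stat_mode(scores)
```

For this vector the mean is 5, while the median and the mode are both 4.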
UNIT 1.
Introduction to R Programming
R is an open-source programming language and software environment
for statistical computing and graphics, widely used by data analysts,
statisticians, researchers, and scientists. It was developed in the
early 1990s by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand. R provides a wide variety of statistical and
graphical techniques such as linear and nonlinear modeling,
time-series analysis, classification, and clustering.
One of the main advantages of R is that it has a large and active
community of users who contribute to the development of new
packages and functions. R also has a large number of built-in
functions and libraries that make it easy to perform complex
statistical analyses and create high-quality visualizations.
R can be installed on different operating systems such as Windows,
macOS, and Linux. It can be used either as a command-line tool or
with a variety of integrated development environments (IDEs) such as
RStudio, which provide a more user-friendly interface for working
with R.
Overall, R is a powerful tool for data analysis and visualization that is
widely used in academia, industry, and research.
R-Installation
R is an open-source programming language that can be installed on
Windows, macOS, and Linux. Here are the steps to install R on
Windows:
1. Go to the R homepage (https://www.r-project.org/) and click on
the "Download R" link.
2. Choose your preferred CRAN mirror (a server that provides R
packages and tools) and click on the corresponding link.
3. Click on the "Download R for Windows" link.
4. In the "base" row, click on the "install R for the first time"
link.
5. Click on the "Download R" link at the top of the page to download
the latest version of R.
6. Run the downloaded file and follow the installation wizard.
7. Once the installation is complete, open R by clicking on the R icon
on your desktop.
Note: The installation process may vary slightly depending on the
version of R and the operating system you are using.
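Once R is running, one quick way to confirm that the installation works is to ask R for its own version from the console. The snippet below is a minimal check and only uses objects that ship with base R; the minimum version "3.0.0" is an arbitrary value chosen for the example.

```r
# Print the version string of the running R installation
print(R.version.string)

# getRversion() returns the version in a form that can be compared
version_ok <- getRversion() >= "3.0.0"
print(version_ok)
```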

R-IDE, Advantages and Disadvantages
An R IDE (Integrated Development Environment) is an application that
provides an environment to write, edit, and execute R code. RStudio
is one of the most popular R IDEs among data scientists and
developers.
Advantages of an R IDE:
1. User-friendly interface: An R IDE provides an interface that helps
data scientists write, test, and execute code in an efficient and
organized way.
2. Code completion and error highlighting: An R IDE offers code
completion that speeds up writing code, and it highlights syntax
errors as they are typed.
3. Integrated debugging tools: An R IDE provides debugging tools that
help developers identify and fix errors in the code.
4. Collaboration: Some R IDEs support collaboration, for example
through version-control integration or shared projects.
Disadvantages of an R IDE:
1. Limited features: An R IDE is focused on R and may offer fewer
features than general-purpose IDEs such as Visual Studio, which
are used for developing complex applications.
2. Resource-intensive: An R IDE can require considerable CPU,
memory, and storage, which can slow down the system.
3. Steep learning curve: An R IDE can be hard to get started with for
beginners who are not familiar with the R programming language.
4. Limited support: Depending on the IDE, support for debugging,
error handling, and version control may be less extensive than in
general-purpose IDEs.

R Advantages and Disadvantages


R is a popular programming language that is widely used for statistical
analysis, data visualization, and machine learning. Here are some
advantages and disadvantages of using R:
Advantages:
1. Free and open-source: R is free to use and distribute, and its
source code is available for modification and improvement.
2. Powerful statistical analysis: R is designed specifically for
statistical analysis, and it includes a wide range of statistical and
graphical techniques.
3. Large and active community: R has a large and active community
of users, developers, and contributors who provide support,
documentation, and extensions.
4. Cross-platform compatibility: R runs on all major operating
systems, including Windows, macOS, and Linux.
5. Easy integration: R can easily integrate with other programming
languages, databases, and software tools.
Disadvantages:
1. Steep learning curve: R can be difficult to learn for users with
little or no programming experience, as it requires knowledge of
programming concepts and statistical methods.
2. Limited scalability: R may not be the best choice for large-scale
data analysis or complex computations, as it may be slower than
other programming languages.
3. Memory management: R keeps the objects it works on in memory, so
large datasets can exhaust the available RAM and need careful
handling.
4. Limited built-in user interface: R is primarily driven from a
command-line console, which may be a disadvantage for users who
prefer a graphical interface; IDEs such as RStudio fill this gap.
5. Data security: Base R offers few built-in security features, which
may be a concern for organizations that handle sensitive data.

R-Packages
R Packages are collections of functions, data sets, and documentation
that can be easily installed and loaded into R. These packages are
used to extend the functionality of R and to perform specialized tasks.
Some advantages of R packages include:
1. Easy to use: R packages make it easy to perform complex data
analysis tasks with just a few lines of code.
2. Community-driven: The R package system is community-driven,
meaning that anyone can create and share packages.
3. Reusability: R packages can be reused across different projects,
making it easier to standardize data analysis processes.
4. Variety: There are thousands of R packages available, covering a
wide range of topics such as data visualization, statistical
modeling, machine learning, and more.
5. Open-source: R packages are typically open-source, which means
that they are free to use and modify.
Some disadvantages of R packages include:
1. Versioning: Different packages may require different versions of
R, which can cause versioning issues and compatibility problems.
2. Quality: The quality of packages can vary, and not all packages
may be suitable for a particular task.
3. Learning curve: Some packages may have a steep learning curve,
which can make it difficult for new users to get started.
4. Maintenance: The maintenance of packages can be an issue,
especially if the package author stops updating it or if the
package becomes outdated.
5. Documentation: Not all packages have thorough documentation,
which can make it difficult for users to understand how to use
them.
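The usual workflow for packages can be sketched as follows. The package name "jsonlite" is only an example chosen for illustration; install.packages() needs network access, so the sketch reports a missing package instead of installing it.

```r
# Packages that ship with R can be loaded directly with library()
library(tools)

# For a CRAN package, a common pattern is to check whether it is installed first
pkg <- "jsonlite"   # example package name (assumption for illustration)
installed <- requireNamespace(pkg, quietly = TRUE)

if (!installed) {
  # install.packages(pkg) would download and install it from CRAN
  message("Package '", pkg, "' is not installed yet.")
}
```

After installation, library(jsonlite) would attach the package so that its functions can be called by name.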

Basic syntax
In R programming, basic syntax refers to the rules and conventions
used in writing code to communicate with the R interpreter. Some
examples of basic syntax in R include:
1. Assignment operator: In R, the assignment operator is <- or =. It
is used to assign values to variables.
2. Comments: Comments are used to add notes or explanations to
your code. In R, a comment starts with the # symbol.
3. Functions: Functions are a set of instructions that perform a
specific task. In R, functions are called by their name followed by
parentheses.
4. Data structures: Data structures are used to store and organize
data in R. Some common data structures in R include vectors,
matrices, and data frames.
5. Control flow statements: Control flow statements are used to
control the flow of code execution based on certain conditions.
Examples of control flow statements in R include if-else
statements and loops.
6. Operators: Operators are symbols or words used to perform
mathematical or logical operations in R. Examples of operators in
R include +, -, *, / for arithmetic operations and <, >, == for
comparisons.
These are just a few examples of basic syntax in R, but mastering
them is essential to writing effective R code.
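The short script below puts the elements listed above together in one place. It is a sketch with made-up data; the variable names are chosen for illustration only.

```r
# Assignment: store a vector of values in a variable
heights <- c(150, 162, 171)   # heights in centimetres (made-up data)

# Function call: the function name followed by parentheses
average_height <- mean(heights)

# Control flow: choose a label based on a condition
if (average_height > 160) {
  label <- "above 160"
} else {
  label <- "160 or below"
}

# Operators: arithmetic and comparison
total <- 2 + 3 * 4          # multiplication binds tighter than addition
is_tall <- heights > 160    # element-wise comparison
```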

Data Types
In R, there are several data types that can be used to store and
manipulate data. Some of the common data types in R are:
1. Numeric: This data type is used to store real numbers, which R
holds in double precision by default. Whole numbers written
without a suffix, such as 3, are also numeric. Numeric values can
be either positive or negative.
2. Character: This data type is used to store text or strings. A string
is a sequence of characters enclosed within quotes.
3. Logical: This data type is used to store logical values which can be
either TRUE or FALSE.
4. Factor: This data type is used to store categorical data such as
gender, occupation, and so on. Factors are stored as integers
with corresponding levels.
5. Integer: This data type is used to store whole numbers, written with an L suffix (for example 5L).
6. Complex: This data type is used to store complex numbers.
7. Raw: This data type is used to store raw bytes of data.
8. Date: This data type is used to store dates.
9. POSIXct: This data type is used to store date and time values.
10. POSIXlt: This data type is used to store date and time values
as a list.
In addition to these basic data types, there are also various data
structures in R that can be used to store and manipulate data. Some
of the common data structures in R are vectors, matrices, arrays, lists,
and data frames.
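The class() function reports how R classifies a value, which makes it a convenient way to explore the types listed above. The literal values below are arbitrary examples.

```r
class(3.14)                    # "numeric"
class(5L)                      # "integer"
class("text")                  # "character"
class(TRUE)                    # "logical"
class(2 + 3i)                  # "complex"
class(as.raw(255))             # "raw"
class(factor(c("a", "b")))     # "factor"
class(as.Date("2024-01-15"))   # "Date"
class(as.POSIXct("2024-01-15 10:30:00", tz = "UTC"))   # "POSIXct" "POSIXt"

# typeof() shows the underlying storage type instead of the class
typeof(3.14)                   # "double"
typeof(factor(c("a", "b")))    # "integer"
```

The last line illustrates the remark in item 4: a factor is stored as integer codes with a set of levels attached.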

Variables
In R, a variable is a name that represents a value or an object. It is
used to store and manipulate data. Variables are assigned using the
assignment operator <- or =.
For example, if we want to assign the value 10 to a variable called x,
we can write:
r
x <- 10
In this case, we have assigned the value 10 to the variable x.
Variables in R can have different data types, such as numeric,
character, logical, complex, etc. The data type of a variable is
automatically determined based on the value assigned to it.
For example, if we assign a character string to a variable, the variable
will have a character data type:
r
name <- "John"
Similarly, if we assign a logical value to a variable, the variable will
have a logical data type:
r
is_true <- TRUE
We can also check the data type of a variable using the class() function:
R
class(x)
This will return the class of the variable x, which is "numeric" in this
case.

Operators and Keywords


In R programming, operators and keywords are used to perform
different operations on the data. Here are some commonly used
operators and keywords in R:
1. Arithmetic operators: These operators are used to perform
arithmetic operations like addition, subtraction, multiplication,
division, exponentiation, and modulus on numerical values.

Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation
%%         Modulus (remainder)

2. Relational operators: These operators compare values, and the
result is either TRUE or FALSE. The logical operators !, &, and |
(with the scalar forms && and ||) combine such results.

Operator   Description
==         Equal to
!=         Not equal to
>          Greater than
>=         Greater than or equal to
<          Less than
<=         Less than or equal to

3. Assignment operators: These operators are used to assign values
to variables.

Operator   Description
<-         Leftward assignment
->         Rightward assignment
=          Assignment (leftward)

4. Other operators: There are other operators in R that are used to
perform different operations, such as:

Operator   Description
%in%       Membership test
%*%        Matrix multiplication
%/%        Integer division
%o%        Outer product
%x%        Kronecker product
%<>%       Assignment pipe (from the magrittr package, not base R)

The usual pipe operators are %>% (magrittr) and the native |> (R 4.1
and later).
R also has a set of reserved words that cannot be used as variable
names. These include if, else, while, repeat, for, function, next,
break, TRUE, FALSE, NULL, NA, Inf, and NaN. Names such as return and
switch are ordinary functions and not reserved words, although
redefining them is best avoided.
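A few of these operators in action, using only base R. The numbers are arbitrary and the variable names are chosen for illustration.

```r
# Arithmetic operators
7 + 2      # 9
7 %% 2     # 1 (remainder)
7 %/% 2    # 3 (integer division)
2 ^ 3      # 8

# Comparisons return TRUE or FALSE
7 > 2      # TRUE
7 == 2     # FALSE

# Assignment in both directions
a <- 10
20 -> b

# Membership test, matrix product, and outer product
is_member <- 3 %in% c(1, 2, 3)    # TRUE
m <- matrix(1:4, nrow = 2)        # columns are 1 2 and 3 4
m_squared <- m %*% m              # 2x2 matrix product
outer_product <- 1:2 %o% 1:3      # 2x3 matrix of pairwise products
```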

Decision making
Decision making is an important aspect of programming, and R
provides various constructs to implement decision making in code.
The following are the different constructs used for decision making in
R:

1. If-else statement: The if-else statement is used to execute one
block of code if a certain condition is met, and another block of
code if the condition is not met. The general syntax of the
if-else statement is as follows:
r
if (condition) {
  # code to execute if the condition is true
} else {
  # code to execute if the condition is false
}
2. Switch statement: The switch() function selects one of several
alternatives based on the value of an expression. When the
expression is a character string, the alternatives are named, and
a final unnamed argument acts as the default. The general syntax
is as follows:
R
switch(expression,
  value1 = {
    # code to execute if expression equals "value1"
  },
  value2 = {
    # code to execute if expression equals "value2"
  },
  {
    # default: code to execute if expression matches none of the values
  }
)
3. ifelse function: The ifelse function is a vectorized version of
the if-else statement. It tests a condition for each element of a
vector and returns the corresponding element of true_value or
false_value. The general syntax of the ifelse function is as
follows:
R
ifelse(condition, true_value, false_value)
4. Inline if-else: R has no ternary operator of the form
condition ? a : b. Because if-else is itself an expression that
returns a value, it can be used on the right-hand side of an
assignment instead:
r
variable <- if (condition) true_value else false_value
These constructs can be used to implement decision making in R and
make programs more flexible and adaptable.
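The following runnable sketch exercises if-else, switch(), and ifelse() on made-up values. The function name day_type and the variable names are hypothetical, chosen only for the example.

```r
temperature <- 28   # made-up value

# if-else chooses between two blocks
if (temperature > 25) {
  advice <- "stay in the shade"
} else {
  advice <- "enjoy the sun"
}

# switch() on a character value; the unnamed last argument is the default.
# An empty alternative (saturday = ,) falls through to the next one.
day_type <- function(day) {
  switch(day,
    saturday = ,
    sunday = "weekend",
    "weekday"
  )
}

# ifelse() works element by element on a vector
readings <- c(18, 31, 25)
reading_labels <- ifelse(readings > 25, "hot", "mild")
```

Here day_type("sunday") and day_type("saturday") both give "weekend", and any other string gives "weekday".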

Looping and Functions


Looping:
Looping is a control structure that allows a programmer to execute a
statement or a set of statements multiple times. Loops are useful
when you need to execute a block of code repeatedly, based on a
certain condition. R programming language supports different types
of loops:

1. for loop: A for loop is used to iterate over a sequence of values.
The general syntax of a for loop in R is:
R
for (variable in sequence) {
  statements
}
2. while loop: A while loop is used to execute a block of code
repeatedly as long as a certain condition is true. The general
syntax of a while loop in R is:
R
while (condition) {
  statements
}
3. repeat loop: A repeat loop is used to execute a block of code
repeatedly until a certain condition is met. The general syntax of
a repeat loop in R is:
R
repeat {
  statements
  if (condition) {
    break
  }
}
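The three loop types can be compared on small concrete tasks. The sketch below uses made-up numbers; each loop stores its result in a variable so it can be inspected afterwards.

```r
# for loop: add up the numbers 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while loop: count how many halvings bring 100 below 1
value <- 100
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}

# repeat loop: find the first power of 2 greater than 50
power <- 1
repeat {
  power <- power * 2
  if (power > 50) {
    break
  }
}
```

After running, total is 15, halvings is 7, and power is 64.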
Functions:
Functions are a set of instructions that can be called multiple times in
a program. R programming language allows users to create their own
functions. A function in R is created with the function keyword,
followed by the argument list and the body, and is usually assigned
to a name with <-. The general syntax of a function in R is:
R
function_name <- function(arg1, arg2, ...) {
  statements
  return(value)
}
where function_name is the name of the function, arg1, arg2, ... are
the function arguments, statements are the set of instructions that
the function performs, and value is the value returned by the
function.
For example, the following code defines a function in R that takes two
arguments and returns their sum:
R
addition <- function(x, y) {
result <- x + y
return(result)
}
The function can then be called multiple times in the program using
its name and the appropriate arguments:
R
sum1 <- addition(10, 5)
sum2 <- addition(20, 30)
In this example, sum1 will be equal to 15 and sum2 will be equal to
50.

UNIT 2.
String: String manipulation
In R programming, strings are a sequence of characters enclosed
within double-quotes or single-quotes. R provides many built-in
functions for string manipulation. Here are some of the commonly
used functions for string manipulation in R:
1. nchar() - returns the number of characters in a string.
2. tolower() - converts a string to lowercase.
3. toupper() - converts a string to uppercase.
4. paste() - combines two or more strings into a single string.
5. substr() - extracts a substring from a string.
6. gsub() - replaces a pattern in a string with another string.
7. strsplit() - splits a string into substrings based on a delimiter.
8. grep() - searches each element of a character vector for a
pattern and returns the indices of the elements that match. To
find where a pattern occurs inside a single string, use regexpr()
or gregexpr().
Let's see some examples:
R
# Define a string
string <- "Hello World"

# Number of characters in the string
nchar(string)
# Output: 11

# Convert to lowercase
tolower(string)
# Output: "hello world"

# Convert to uppercase
toupper(string)
# Output: "HELLO WORLD"

# Concatenate two strings
paste("Hello", "World", sep = " ")
# Output: "Hello World"

# Extract a substring
substr(string, start = 2, stop = 6)
# Output: "ello "

# Replace a pattern in a string
gsub("l", "L", string)
# Output: "HeLLo WorLd"

# Split a string into substrings
strsplit(string, " ")
# Output: a list holding one character vector: "Hello" "World"

# Search for a pattern in a character vector
grep("l", string)
# Output: 1 (the only element of the vector contains the pattern)

# Find the positions of a pattern within the string
gregexpr("l", string)[[1]]
# Output: 3 4 10 (followed by match attributes)
These are just a few examples of the many functions available for
string manipulation in R.

R-data structures
R provides several built-in data structures that are used for storing
and manipulating data. These data structures are important to know
for effective data analysis in R. Some of the commonly used R data
structures are:
1. Vectors: Vectors are used to store a sequence of elements of the
same data type. They can be created using the c() function.
Vectors can be of different types such as logical, numeric,
character, and complex.
2. Lists: Lists are used to store a collection of objects, which can be
of different types. They can be created using the list() function.
3. Matrices: Matrices are used to store a collection of elements of
the same data type arranged in a 2-dimensional rectangular
layout. They can be created using the matrix() function.
4. Arrays: Arrays are used to store a collection of elements of the
same data type arranged in a multi-dimensional rectangular
layout. They can be created using the array() function.
5. Data frames: Data frames are used to store a collection of
variables of different types. They are similar to matrices but can
have different types of data in each column. They can be created
using the data.frame() function.
6. Factors: Factors are used to represent categorical data in R. They
can be created using the factor() function.
These data structures are very useful for performing various data
analysis tasks in R.

Vectors
In R programming, a vector is a one-dimensional array-like object that
can store homogeneous data elements of any type, such as numeric,
character, or logical. A vector can be created using the c() function,
which stands for "combine".
For example, to create a vector of numeric values, we can use:
r
numbers <- c(1, 2, 3, 4, 5)
To create a vector of character values, we can use:
r
names <- c("John", "Mary", "Tom", "Sarah")
To access elements of a vector, we use the square brackets notation [
]. Indexing in R starts at 1. For example, to access the third element
of the numbers vector created above, we can use:
r
numbers[3] # returns 3
We can also perform operations on vectors, such as adding or
subtracting them element-wise. For example:
r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v3 <- v1 + v2 # v3 is the new vector 5 7 9
In addition to numeric and character vectors, R has other atomic
vector types, such as:
• Logical vectors, which can only contain TRUE or FALSE values
• Integer vectors, which can only contain whole numbers
• Complex vectors, which can store complex numbers with real
and imaginary parts
• Raw vectors, which can store raw bytes
To create a vector of a specific type and length, we can use
functions such as logical(), integer(), complex(), and raw(). Each
element starts out with that type's default value. For example:
r
log_vec <- logical(3) # creates a logical vector with 3 elements, all FALSE
int_vec <- integer(5) # creates an integer vector with 5 elements, all 0
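Beyond single positions, square brackets also accept several positions, negative positions, logical conditions, and names. The sketch below uses a made-up named vector to show each form.

```r
ages <- c(anna = 31, ben = 17, cara = 45, dev = 12)   # made-up named vector

first_and_third <- ages[c(1, 3)]   # elements 1 and 3
without_first <- ages[-1]          # everything except the first element
adults <- ages[ages >= 18]         # logical indexing keeps matching elements
ben_age <- ages["ben"]             # access by name

# Operations are vectorized: the single value 1 is recycled across all elements
next_year <- ages + 1
```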
Lists
In R, a list is a collection of objects of different types such as vectors,
matrices, and other lists. It is a versatile data structure that can hold
different types of objects in a single entity. Elements of a list can be
accessed using an index or the name of the element.
Here is an example of how to create a list in R:
R
# create a list with different types of elements
my_list <- list(name = "John Doe", age = 35,
                married = TRUE, children = c("Mary", "Tom"))

# access elements of the list using index or name
my_list[[1]]  # returns "John Doe"
my_list$age   # returns 35
my_list[4]    # returns a sublist with the children element
In this example, we create a list my_list with four elements of
different types: a character string "John Doe", a numeric value 35, a
logical value TRUE, and a character vector c("Mary", "Tom"). Double
brackets [[ ]] and the dollar sign $ return the element itself, while
single square brackets [ ] return a list containing the selected
elements.
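The difference between the bracket forms is easiest to see by checking what each one returns. The list below is a made-up example, and the variable names are chosen for illustration.

```r
person <- list(name = "John Doe", age = 35, children = c("Mary", "Tom"))

single_class <- class(person["children"])     # "list": single brackets keep the list wrapper
double_class <- class(person[["children"]])   # "character": double brackets extract the element

# Elements can be added or changed by name
person$city <- "Auckland"      # made-up value
person$age <- person$age + 1

# lapply() applies a function to every element and returns a list
element_lengths <- lapply(person, length)
```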
Arrays
In R, an array is a data structure that can store values of the same
data type. It can have multiple dimensions, which can be specified
while creating the array. Arrays are useful when we need to work
with data that is arranged in a tabular format and has multiple
dimensions.
To create an array in R, we can use the array() function. Here's the
basic syntax:
r
array(data, dim = c(dim1, dim2, dim3, ...))
• data: a vector of data to be arranged in the array.
• dim: a vector specifying the dimensions of the array.
Here's an example of creating a 2-dimensional array in R:
R
# Create a 2-dimensional array
my_array <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))

# Print the array
my_array
Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
In this example, we created a 2-dimensional array with dimensions
2x3 and stored it in the variable my_array. We specified the data to
be arranged in the array using the c() function, and the dimensions of
the array using the dim argument.
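The same function handles more than two dimensions. As a sketch, the example below builds a 2x3x4 array from the numbers 1 to 24 (arbitrary data) and reads values back with one index per dimension.

```r
# 2 rows x 3 columns x 4 layers, filled column by column with 1..24
cube <- array(1:24, dim = c(2, 3, 4))

cube_dims <- dim(cube)       # 2 3 4
cube[1, 2, 1]                # 3: row 1, column 2 of the first layer
cube[2, 3, 4]                # 24: the last element
layer_two <- cube[, , 2]     # a 2x3 matrix holding 7..12
```

Leaving an index empty, as in cube[, , 2], selects everything along that dimension.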

Matrix
In R, a matrix is a two-dimensional array with the same type of data
elements. It is created using the matrix() function. The syntax for
creating a matrix is as follows:
R
matrix(data, nrow, ncol, byrow)
where:
• data is the input data to be converted to a matrix
• nrow is the number of rows in the matrix
• ncol is the number of columns in the matrix
• byrow is a logical value indicating whether the matrix should be
filled by row (TRUE) or by column (FALSE)
Here is an example of creating a matrix:
r
# create a matrix with 2 rows and 3 columns
matrix(c(1,2,3,4,5,6), nrow=2, ncol=3)
Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
We can access the elements of a matrix using the row and column
indices. For example:
r
# create a matrix with 2 rows and 3 columns
m <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3)

# access the element in row 1, column 2
m[1,2]
Output:
[1] 3
We can also perform operations on matrices, such as addition,
subtraction, and multiplication, as long as the matrices have
compatible dimensions.
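A short sketch of these operations follows, with small made-up matrices. Note that * multiplies element by element, while %*% performs matrix multiplication and needs the column count of the first matrix to equal the row count of the second.

```r
a <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows: 1 2 3 and 4 5 6
b <- matrix(1, nrow = 2, ncol = 3)         # 2x3 matrix of ones

sum_ab <- a + b            # element-wise addition (same dimensions required)
elementwise <- a * a       # element-wise product, not matrix multiplication
product <- a %*% t(a)      # matrix product: (2x3) times (3x2) gives 2x2
```

Here t(a) is the transpose of a, which makes the dimensions compatible for %*%.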

Data frame
A data frame is a two-dimensional data structure in R that stores data
in a tabular form. It is similar to a matrix, but the columns of a data
frame can contain different data types, such as numeric, character, or
factor. A data frame can also have row names and column names,
and each column in a data frame is a vector. Data frames are
commonly used for storing and analyzing data sets.
In R, you can create a data frame using the data.frame() function. For
example, to create a data frame with three columns named ID, Name,
and Age, you can use the following code:
less
df <- data.frame(ID = c(1, 2, 3), Name =
c("John", "Jane", "Bob"), Age = c(25, 30,
35))
This will create a data frame df with three columns and three rows,
where the first column contains the IDs, the second column contains
the names, and the third column contains the ages of the individuals.
You can access the data in a data frame using the $ operator, for
example:
r
df$Name
This will return a vector with the names in the data frame. You can
also use indexing to access specific rows and columns in a data frame,
for example:
r
df[1, "Name"]
This will return the name in the first row of the data frame.
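A few other everyday data frame operations can be sketched in the same way. The Dept values and the extra row below are invented purely for illustration; the functions used (str(), rbind(), and logical indexing) are all part of base R.

```r
df <- data.frame(ID = c(1, 2, 3),
                 Name = c("John", "Jane", "Bob"),
                 Age = c(25, 30, 35))

str(df)                                       # structure: column names and types

subset_rows <- df[c(1, 3), c("Name", "Age")]  # rows 1 and 3, two chosen columns

df$Dept <- c("HR", "IT", "Sales")             # add a column (made-up values)

# Add a row by binding a one-row data frame with the same columns
df <- rbind(df, data.frame(ID = 4, Name = "Asha", Age = 28, Dept = "IT"))

older <- df[df$Age > 26, ]                    # keep rows that meet a condition
older
```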
Factors
In R, a factor is a type of data object that is used to categorize or
group data. Factors are created by taking a vector of values and
defining them as belonging to one or more categories. The categories
themselves are represented as levels.
Factors are useful for a variety of tasks, including data analysis and
visualization. For example, if you have data on the gender of
participants in a study, you can create a factor with two levels, "male"
and "female", and assign each participant to one of these levels.
Factors are especially useful when working with categorical data, such
as survey responses or demographic information. They can also be
used to represent ordinal or nominal data, such as Likert scales or
education levels.
Some common functions used for working with factors in R include
factor(), levels(), summary(), and table().

In R, a factor is a data object used for categorizing and storing data
values as levels. Factors are used to represent categorical variables in
statistical modeling and data analysis.
For example, if we have a dataset of students and their grades in a
subject, we can create a factor for the grade column with levels "A",
"B", "C", "D", and "F". This will make it easier to analyze the data and
create summary statistics for each grade level.
We can create a factor in R using the factor() function. Here is an
example:
r
# Create a vector of grade values
grades <- c("A", "B", "C", "A", "B", "F", "D", "C", "B", "A")

# Create a factor for the grades with levels in order
grades_factor <- factor(grades, levels = c("A", "B", "C", "D", "F"))

# Print the factor
grades_factor
Output:
 [1] A B C A B F D C B A
Levels: A B C D F
In this example, we created a factor for the grades vector with levels
in the order "A", "B", "C", "D", and "F". The factor() function
assigns the levels in the order they are given in the levels
argument; if the argument is omitted, it uses the sorted unique values
of the vector instead. Printing the factor shows the original grades
followed by the levels "A", "B", "C", "D", and "F".

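The helper functions mentioned in this section can be tried on the same grades vector. This is a small sketch; the choice to rank "F" lowest in the ordered factor is an assumption made for illustration.

```r
grades <- c("A", "B", "C", "A", "B", "F", "D", "C", "B", "A")
grades_factor <- factor(grades, levels = c("A", "B", "C", "D", "F"))

levels(grades_factor)                 # the category labels, in the order given
grade_counts <- table(grades_factor)  # how often each level occurs
grade_counts
as.integer(grades_factor)             # the integer codes stored underneath

# An ordered factor supports comparisons; here "F" is treated as the lowest grade
grades_ordered <- factor(grades, levels = c("F", "D", "C", "B", "A"),
                         ordered = TRUE)
at_least_b <- sum(grades_ordered >= "B")
at_least_b
```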
UNIT 3.
Data Reshaping,
Data Reshaping in R involves transforming the structure of data from
one format to another. It involves converting data from long to wide
format or wide to long format. This is useful for better analysis and
visualization of data.
In R, the reshape2 package provides functions for data reshaping. The
main functions used for data reshaping are:
1. melt(): This function is used to convert data from a wide format
to a long format.
2. dcast(): This function is used to convert data from a long format
to a wide format.
Base R's reshape() function can also be used for data reshaping, but
its interface is generally less convenient than melt() and dcast().
Data reshaping is useful in scenarios where you have data in one
format and need to transform it to another format that is more
suitable for analysis or visualization. For example, you may have
survey data where the responses are in a wide format with each
column representing a different question. To perform analysis, it may
be more suitable to have the data in a long format with each row
representing a response to a question.
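The survey scenario can be sketched with base R's reshape(), so that the example runs without installing anything. The data values and the column names id, q1, q2, question, and score are invented for this illustration; the reshape2 calls are shown only as comments.

```r
# Wide format: one row per respondent, one column per question (made-up data)
wide <- data.frame(id = 1:3, q1 = c(5, 3, 4), q2 = c(2, 4, 1))

# Wide to long: one row per respondent-question pair
long <- reshape(wide, direction = "long",
                varying = c("q1", "q2"), v.names = "score",
                timevar = "question", times = c("q1", "q2"),
                idvar = "id")
long

# Long back to wide: one score column per question
wide_again <- reshape(long, direction = "wide",
                      idvar = "id", timevar = "question", v.names = "score")
wide_again

# With the reshape2 package installed, the corresponding calls would be:
#   melted <- melt(wide, id.vars = "id")
#   dcast(melted, id ~ variable, value.var = "value")
```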

Object Oriented Programming


Object-oriented programming (OOP) is a programming paradigm that
focuses on the use of objects that contain data and methods. In OOP,
objects are instances of classes, which define their properties and
behaviors. OOP provides a way to organize code into reusable
modules and to model complex systems using simple abstractions.
In R, OOP is implemented through the use of S3, S4, and R6 classes.
S3 classes are the simplest form of classes in R, and are used for basic
object-oriented programming tasks. S4 classes are more advanced
and can be used to model more complex systems. R6 classes are a
newer addition to R and provide a more intuitive way to define
objects with properties and methods.
Overall, OOP in R allows for the creation of more modular and
maintainable code, which can be especially useful when working with
large, complex systems. However, it can also add additional
complexity and overhead to code, and may not be necessary for all
programming tasks.
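As a minimal sketch of the S3 system, the code below defines a class, a generic function, and two methods. The names new_student and describe are hypothetical and chosen only for this example.

```r
# Constructor: an S3 object is an ordinary list with a class attribute
new_student <- function(name, marks) {
  structure(list(name = name, marks = marks), class = "student")
}

# Generic function: dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# Method used for objects of class "student"
describe.student <- function(x, ...) {
  paste0(x$name, " scored ", x$marks, " marks")
}

# Fallback method for objects of any other class
describe.default <- function(x, ...) "no description available"

s <- new_student("Jane", 82)
describe(s)    # uses describe.student
describe(42)   # uses describe.default
```

S4 and R6 follow the same idea of classes and methods but use setClass() and R6::R6Class() respectively to declare them.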
R-Debugging,
Debugging is a process of finding and resolving errors in the code. R
provides several tools to help with debugging, including:
1. Traceback: After an error occurs, calling traceback() prints the
call stack leading up to the error. This can help you identify
where the error occurred in your code.
2. Debugging functions: R provides several functions for debugging,
including browser(), which pauses execution at the point where it
is called so that you can inspect variables and step through the
remaining code, and debug(), which marks a function so that it is
stepped through line by line each time it is called.
3. Debugging packages: There are several packages in R that
provide additional debugging tools, such as the debugme
package.
4. Profiling: Profiling is a technique for identifying performance
bottlenecks in your code. R provides several profiling tools,
including the built-in Rprof() function and the profvis package.
Overall, debugging in R involves using a combination of tools to
identify and resolve errors in your code.
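The sketch below shows how these tools might be used on a small function. The function safe_ratio is hypothetical, and tryCatch() is used here only so the example can run non-interactively; the interactive debugging calls are left as comments because they open a prompt.

```r
# A small function that stops with an error on invalid input
safe_ratio <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("both arguments must be numeric")
  }
  x / y
}

# Capture the error message instead of halting the script
result <- tryCatch(safe_ratio(10, "2"),
                   error = function(e) conditionMessage(e))
result

# Interactive tools (run these at the console):
#   safe_ratio(10, "2"); traceback()   # show the call stack after the error
#   debug(safe_ratio); safe_ratio(10, 2); undebug(safe_ratio)  # step line by line
#   browser()  # place inside a function body to pause execution there
```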

Data Interfaces: csv files, excel files, Binary, XML and JSON File
In R, there are several ways to interface with external data files,
including:
1. CSV files: CSV (Comma Separated Values) files are a common
way of storing tabular data, with each row of data separated by a
new line and each column separated by a comma.
2. Excel files: R can read Excel files using the "readxl" package and
both read and write them using the "openxlsx" package.
3. Binary files: R can read and write binary files using functions like
"readBin" and "writeBin".
4. XML files: XML (eXtensible Markup Language) is a format used
for storing and exchanging data on the web. R can read and write
XML files using the "XML" package.
5. JSON files: JSON (JavaScript Object Notation) is a lightweight data
interchange format that is easy for humans to read and write and
easy for machines to parse and generate. R can read and write
JSON files using the "jsonlite" package.
In general, R can interface with many other types of data files as well,
including databases and web APIs.

Data Interface functions in R:


R provides several packages and functions to read and write data in
various file formats. Some of the commonly used functions are:
1. CSV files: read.csv() and write.csv()
2. Excel files: readxl::read_excel() and writexl::write_xlsx()
3. Binary files: readBin() and writeBin()
4. XML files: XML::xmlParse() and XML::saveXML()
5. JSON files: jsonlite::fromJSON() and jsonlite::toJSON()
These functions can be used to read data from a file and store it in R
objects like data frames, matrices, or lists, or to write data from R
objects to a file. The specific function to use will depend on the file
format and the specific requirements of the data being read or
written.
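A CSV round trip needs only base R, so it makes a convenient sketch of the read/write pattern. The data frame scores and its values are made up, and tempfile() is used so that no real file path has to be assumed.

```r
# A small data frame to write out (illustrative values)
scores <- data.frame(id = 1:3,
                     name = c("John", "Jane", "Bob"),
                     score = c(71.5, 88, 64))

csv_path <- tempfile(fileext = ".csv")          # temporary file location
write.csv(scores, csv_path, row.names = FALSE)  # write without row numbers

scores_back <- read.csv(csv_path)               # read it back as a data frame
str(scores_back)

unlink(csv_path)                                # remove the temporary file
```

The package-based readers and writers listed above follow the same shape: one call takes a path and returns an R object, and the other takes an R object and a path.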

Data Visualization: Pie chart, Bar Chart, Boxplot, Histogram, Line
graphs and Scatter plot
Data visualization is an important aspect of data analysis, and R
provides various packages and functions to create different types of
visualizations. Some of the commonly used types of visualizations in R
are:
1. Pie chart: A pie chart is a circular chart that is divided into slices
to illustrate numerical proportions.
2. Bar chart: A bar chart is a chart that represents categorical data
with rectangular bars with heights or lengths proportional to the
values they represent.
3. Boxplot: A boxplot is a graph that represents the distribution of a
continuous variable through its quartiles, with outliers plotted as
points.
4. Histogram: A histogram is a graph that represents the
distribution of a continuous variable through its frequency
distribution, with the data being divided into intervals or bins.
5. Line graph: A line graph is a graph that represents the
relationship between two continuous variables, with the values
being plotted as points and joined by lines.
6. Scatter plot: A scatter plot is a graph that represents the
relationship between two continuous variables, with the values
being plotted as points and no lines joining them.
R provides various packages to create these visualizations, such as
ggplot2, plotly, lattice, and base graphics. These packages provide
functions that take input data and plot it into the desired
visualization.

Data Visualization Functions in R:


R has several built-in functions for data visualization. Here are some
commonly used functions for creating pie charts, bar charts, boxplots,
histograms, line graphs, and scatter plots:
1. Pie chart: The pie() function can be used to create a pie chart.
Here's an example:
R
x <- c(30, 20, 50)
labels <- c("A", "B", "C")
pie(x, labels = labels)
2. Bar chart: The barplot() function can be used to create a bar
chart. Here's an example:
r
x <- c(30, 20, 50)
bar_names <- c("A", "B", "C")
barplot(x, names.arg = bar_names)
3. Boxplot: The boxplot() function can be used to create a boxplot.
Here's an example:
R
x <- c(10, 20, 30, 40, 50)
boxplot(x)
4. Histogram: The hist() function can be used to create a histogram.
Here's an example:
R
x <- rnorm(100)
hist(x)
5. Line graph: The plot() function can be used to create a line graph.
Here's an example:
R
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)
plot(x, y, type = "l")
6. Scatter plot: The plot() function can also be used to create a
scatter plot. Here's an example:
R
x <- rnorm(100)
y <- rnorm(100)
plot(x, y)

UNIT 4.
Statistics with R: Mean,
Median and Mode
R provides functions for calculating mean, median, and mode of a
given set of data.
1. Mean: The mean function in R is mean(). It calculates the
arithmetic mean of a set of values. Example:
r
data <- c(10, 20, 30, 40, 50)
mean(data)
Output:
[1] 30
2. Median: The median function in R is median(). It calculates
the median of a set of values.
Example:
r
data <- c(10, 20, 30, 40, 50)
median(data)
Output:
[1] 30
3. Mode: R does not have a built-in function for calculating mode,
but it can be calculated using other functions. One way to
calculate the mode is by using the table() function to create a
frequency table and then selecting the value with the highest
frequency. Example:
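r
data <- c(10, 20, 20, 30, 40, 40, 40, 50)
freq_table <- table(data)
# Keep every value whose count equals the highest count;
# the name mode_value avoids masking R's built-in mode() function
mode_value <- as.numeric(names(freq_table)[freq_table == max(freq_table)])
mode_value
Output:
[1] 40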
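The table() approach can be wrapped in a small reusable function. The name stat_mode is a hypothetical choice for this sketch; it returns every value that ties for the highest frequency, so a vector with two modes yields two results.

```r
# Return all most-frequent values of a numeric vector
stat_mode <- function(values) {
  counts <- table(values)
  as.numeric(names(counts)[counts == max(counts)])
}

x <- c(3, 6, 9, 3, 1, 4, 11, 2, 3, 4, 5)
mean(x)                      # arithmetic mean
median(x)                    # middle value of the sorted data
stat_mode(x)                 # most frequent value

stat_mode(c(1, 1, 2, 2, 3))  # two values tie, so both are returned
```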

R-Regression
In R, regression analysis is performed using the lm() function, which
stands for linear model. This function takes a formula as its first
argument, which specifies the dependent variable and the
independent variables to be used in the model.
For example, the following code fits a linear regression model with y
as the dependent variable and x1 and x2 as the independent
variables:
r
model <- lm(y ~ x1 + x2, data = mydata)
The data argument specifies the data frame containing the variables
used in the model.
After fitting the model, various methods can be used to extract
information about the model, such as summary(model) which
provides a summary of the model including coefficients, standard
errors, t-statistics, and p-values.
Other regression models are fitted with the glm() function, which
handles generalized linear models: logistic regression uses family =
"binomial", Poisson regression uses family = "poisson", and other
families such as "Gamma" are also available.

R- Regression Functions and Algorithms Explained
R provides a wide range of regression algorithms and functions for
statistical analysis, including:
1. Linear Regression: It is used to model the relationship between a
dependent variable and one or more independent variables by
fitting a straight line to the data. In R, the "lm()" function is used
for linear regression.
2. Logistic Regression: It is used when the dependent variable is
binary (0 or 1). It models the relationship between the
dependent variable and one or more independent variables using
the logistic function. In R, the "glm()" function is used for logistic
regression.
3. Poisson Regression: It is used when the dependent variable is a
count variable. It models the relationship between the
dependent variable and one or more independent variables using
the Poisson distribution. In R, the "glm()" function is used for
Poisson regression.
4. Ridge Regression: It is used when there is multicollinearity in the
data. It shrinks the coefficients of the independent variables to
reduce their variance. In R, the glmnet() function from the
"glmnet" package is used for Ridge regression (alpha = 0).
5. Lasso Regression: It is used when there are a large number of
independent variables, and we want to select the most important
variables. It shrinks the coefficients of the less important
variables to zero. In R, the glmnet() function is used for Lasso
regression (alpha = 1).
6. Elastic Net Regression: It is a combination of Ridge and Lasso
regression. It balances between the strengths of Ridge and Lasso
regression. In R, the glmnet() function is used for Elastic Net
regression (alpha between 0 and 1).
7. Quantile Regression: It is used when we want to model a quantile
of the dependent variable, such as the median, rather than its
mean, for example when the data contain outliers. It models the
relationship at different quantiles of the dependent variable. In
R, the rq() function from the "quantreg" package is used for
quantile regression.
These are some of the most commonly used regression algorithms
and functions in R. There are also many other regression algorithms
and functions available in R depending on the specific needs of the
analysis.

Linear Regression
Linear regression is a statistical method used to model the
relationship between a dependent variable and one or more
independent variables. The goal of linear regression is to find the
best-fit line that represents the relationship between the variables. In
R, linear regression can be performed using the lm() function.
Here's an example of how to perform linear regression in R:
R
# Load the 'mtcars' dataset
data(mtcars)

# Fit a linear regression model to the data
model <- lm(mpg ~ wt, data = mtcars)

# Print the model summary
summary(model)
In this example, we're fitting a linear regression model to the mtcars
dataset, using the mpg variable as the dependent variable and the wt
variable as the independent variable. The lm() function returns an
object of class lm, which we've assigned to the variable model. We
can then use the summary() function to print out a summary of the
model, including the coefficients, standard errors, t-values, and p-
values.
The output of the summary() function might look something like this:
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10

The output shows the coefficients of the regression equation (in this
case, mpg = 37.2851 - 5.3445 * wt), as well as the standard errors, t-
values, and p-values for each coefficient. It also shows the residual
standard error, which is an estimate of the standard deviation of the
errors in the model, and the R-squared value, which measures the
proportion of variance in the dependent variable that is explained by
the independent variable.
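Once the model is fitted, the same object can be used to pull out individual quantities and to predict new values. The sketch below refits the mtcars model so that it stands alone; the weights in new_cars are arbitrary illustrative inputs.

```r
model <- lm(mpg ~ wt, data = mtcars)

coef(model)                            # intercept and slope as a named vector
r_squared <- summary(model)$r.squared  # proportion of variance explained

# Predict mpg for cars of a given weight (wt is in units of 1000 lbs)
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))
predicted_mpg <- predict(model, newdata = new_cars)
predicted_mpg
```

Each prediction is simply the fitted equation evaluated at the new weight, so the value for wt = 3 is about 37.2851 - 5.3445 * 3.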

Logistic Regression
Logistic regression is a statistical method used for predicting binary
outcomes, that is, outcomes that can only take two possible values. It
is a form of regression analysis that is widely used in machine
learning, statistics, and other fields.
In logistic regression, a logistic function is used to model the
probability of a certain outcome based on one or more predictor
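variables. The logistic function is an S-shaped curve that maps any
input value to a value between 0 and 1. The logistic regression model
estimates the coefficients of the predictor variables, typically by
maximum likelihood, so that the predicted probabilities match the
observed outcomes as closely as possible. An observation is then
assigned to one class or the other by comparing its predicted
probability with a threshold such as 0.5.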
The logistic regression model is widely used in classification problems
such as spam detection, fraud detection, and medical diagnosis. It is a
powerful tool for predicting binary outcomes and can be used in both
small and large data sets.
In R, logistic regression can be performed using the glm() function.
The glm() function fits a generalized linear model to the data, and in
the case of logistic regression, the family argument should be set to
"binomial".
For example, the following code fits a logistic regression model to a
data frame named diabetes_data, which is assumed to have been loaded
already (the datasets package that ships with R does not include a
"diabetes" data set) and to contain a binary column "diabetes" and
the columns "glucose", "age", and "bmi":
r
model <- glm(diabetes ~ glucose + age + bmi,
data = diabetes_data, family = "binomial")
summary(model)
In this example, the predictor variables are "glucose", "age", and
"bmi", and the outcome variable is "diabetes". The glm() function is
used to fit a logistic regression model to the data, and the family
argument is set to "binomial". The summary() function is then used to
display the results of the model.
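For a version that runs without any external data, the same glm() call pattern can be tried on the built-in mtcars data set. This is only an illustrative substitute: it models the binary column am (0 = automatic, 1 = manual transmission) from the car's weight wt.

```r
# Logistic regression: probability of a manual transmission given weight
model <- glm(am ~ wt, data = mtcars, family = binomial)
summary(model)

# Predicted probabilities for three illustrative weights (1000 lbs)
new_cars <- data.frame(wt = c(2, 3, 4))
prob_manual <- predict(model, newdata = new_cars, type = "response")
prob_manual

# Turn probabilities into class labels with a 0.5 threshold
predicted_class <- ifelse(prob_manual > 0.5, 1, 0)
```

Using type = "response" returns probabilities; without it, predict() returns values on the log-odds scale.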

Normal distribution,
Normal distribution, also known as Gaussian distribution, is a
continuous probability distribution that is widely used in statistics to
model random variables that have a symmetrical distribution around
the mean value. The distribution is characterized by its mean (μ) and
standard deviation (σ), and the probability density function (PDF) of a
normal distribution is given by:
f(x) = 1 / (σ√(2π)) * exp(-(x - μ)^2 / (2σ^2))
where x is a random variable, μ is the mean, σ is the standard
deviation, π is the mathematical constant pi, and exp is the
exponential function.
The normal distribution has some important properties, such as:
 It is a bell-shaped curve that is symmetric around its mean.
 About 68% of the data falls within one standard deviation of the
mean, about 95% of the data falls within two standard deviations
of the mean, and about 99.7% of the data falls within three
standard deviations of the mean.
 Many natural phenomena follow a normal distribution, such as
heights, weights, IQ scores, and errors in measurements.
In R, you can generate random numbers from a normal distribution
using the rnorm() function, calculate the probability density function
using the dnorm() function, and the cumulative distribution function
using the pnorm() function.
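A short sketch of these functions, assuming exam marks that are normally distributed with mean 70 and standard deviation 10 (illustrative parameters). It also uses qnorm(), the base R quantile function, which is the inverse of pnorm().

```r
mu <- 70
sigma <- 10

density_at_mean <- dnorm(mu, mean = mu, sd = sigma)   # height of the PDF at the mean

p_below_85   <- pnorm(85, mean = mu, sd = sigma)      # P(X <= 85)
p_85_or_more <- 1 - p_below_85                        # P(X >= 85)

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)

marks_90th <- qnorm(0.90, mean = mu, sd = sigma)      # mark at the 90th percentile

set.seed(1)                                           # make the random draw repeatable
simulated_marks <- rnorm(5, mean = mu, sd = sigma)

round(c(p_85_or_more = p_85_or_more, within_one_sd = within_one_sd), 4)
```

Here p_85_or_more comes out at roughly 0.0668, so about 6.7% of students would score 85 or more under these assumptions, and within_one_sd reproduces the 68% rule stated above.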

Binomial distribution,
In probability theory and statistics, the binomial distribution is a
discrete probability distribution that describes the number of
successes in a fixed number of independent Bernoulli trials, where
each trial has the same probability of success. The binomial
distribution is often used in hypothesis testing and statistical
inference.
In R, the dbinom(), pbinom(), qbinom(), and rbinom() functions are
used for computing and working with the binomial distribution. Here
is a brief description of these functions:
 dbinom(x, size, prob) computes the probability mass function
(PMF) of the binomial distribution for a given value of x (number
of successes), size (number of trials), and prob (probability of
success).
 pbinom(q, size, prob) computes the cumulative distribution
function (CDF) of the binomial distribution for a given value of q
(number of successes), size (number of trials), and prob
(probability of success).
 qbinom(p, size, prob) computes the quantile function of the
binomial distribution for a given probability p, size (number of
trials), and prob (probability of success).
 rbinom(n, size, prob) generates random samples from the
binomial distribution for a given n (sample size), size (number of
trials), and prob (probability of success).
Here's an example of using these functions in R:
Suppose we want to find the probability of getting exactly 3 heads in
5 tosses of a fair coin. We can use the dbinom() function as follows:
r
dbinom(3, 5, 0.5)
Output:
[1] 0.3125
This means the probability of getting exactly 3 heads in 5 tosses of a
fair coin is 0.3125.
We can also generate a random sample of size 10 from a binomial
distribution with 10 trials and a probability of success of 0.3 using the
rbinom() function as follows:
r
rbinom(10, 10, 0.3)
Example output (the values are random, so yours will differ):
[1] 2 5 2 5 5 5 5 5 1 5
This generates a vector of length 10 containing random samples from
a binomial distribution with 10 trials and a probability of success of
0.3.
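The remaining two functions, pbinom() and qbinom(), can be sketched with the same coin-toss setting. The call to set.seed() is an addition for this example so that the random draw is repeatable; the seed value itself is arbitrary.

```r
# 5 tosses of a fair coin
p_exactly_3  <- dbinom(3, size = 5, prob = 0.5)   # P(X = 3)
p_at_most_3  <- pbinom(3, size = 5, prob = 0.5)   # P(X <= 3)
p_at_least_4 <- 1 - p_at_most_3                   # P(X >= 4)

# Smallest number of heads k such that P(X <= k) is at least 0.9
q90 <- qbinom(0.9, size = 5, prob = 0.5)

set.seed(42)                                      # arbitrary seed for repeatability
draws <- rbinom(10, size = 10, prob = 0.3)        # ten simulated experiments

c(p_exactly_3 = p_exactly_3, p_at_most_3 = p_at_most_3, q90 = q90)
```

With these inputs p_at_most_3 is 26/32 = 0.8125, since there are 1 + 5 + 10 + 10 ways to get 0 to 3 heads out of 32 equally likely outcomes.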

Classification
Classification is a machine learning technique in which an algorithm is
trained to predict the class or category of a given input based on a set
of features or attributes. In R, there are several packages and
functions that can be used for classification tasks, including:
1. caret: The caret package provides a unified interface for many
different classification algorithms, such as k-nearest neighbors,
decision trees, random forests, and support vector machines. It
also includes functions for data preprocessing, model tuning, and
performance evaluation.
2. randomForest: The randomForest package implements the
random forest algorithm, which is an ensemble method that
combines multiple decision trees to improve accuracy and reduce
overfitting. It can handle both classification and regression tasks.
3. glm: The glm function in base R can be used for logistic
regression, which is a common classification algorithm for binary
outcomes. It models the log-odds of the outcome as a linear
function of the input variables.
4. nnet: The nnet package provides functions for neural network
models, which are another type of machine learning algorithm
commonly used for classification tasks. They are particularly well-
suited for complex nonlinear relationships between inputs and
outputs.
5. knn: The knn() function in the "class" package performs k-nearest
neighbors classification, which is a simple and intuitive algorithm
that assigns a new input to the class that is most common among its k
nearest neighbors in the training data.
These are just a few examples of the many classification algorithms
and packages available in R. The choice of algorithm will depend on
the specific problem and the characteristics of the data. It is often a
good idea to try multiple algorithms and compare their performance
to choose the best one.
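As one small illustration of a classification workflow, the sketch below runs k-nearest neighbors on the built-in iris data with the class package, which is a recommended package normally installed alongside R. The 100/50 split, the seed, and k = 5 are arbitrary choices made for this example.

```r
library(class)   # provides knn()

set.seed(123)                              # arbitrary seed for a repeatable split
train_idx <- sample(nrow(iris), 100)       # 100 rows for training, 50 for testing

train_x <- iris[train_idx, 1:4]            # the four numeric measurements
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]         # known class labels
test_y  <- iris$Species[-train_idx]

# Classify each test row by majority vote among its 5 nearest training rows
predicted <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

confusion <- table(Predicted = predicted, Actual = test_y)
confusion
accuracy <- mean(predicted == test_y)      # share of correct predictions
accuracy
```

The confusion table shows which species are mistaken for which, which is often more informative than the single accuracy figure.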

Time Series Analysis


Time series analysis is a statistical method used to analyze and extract
insights from time-series data. It involves studying data over time and
identifying patterns or trends in the data. R has many built-in
functions and packages that can be used to perform time series
analysis.
Some of the commonly used functions and packages for time series
analysis in R are:
1. ts() - This function is used to create a time series object in R.
2. forecast package - This package provides functions for
forecasting time series data. Some of the functions available in
this package are auto.arima() for automatically selecting an
ARIMA model, ets() for fitting exponential smoothing models,
and tbats() for fitting complex seasonal models.
3. tseries package - This package provides functions for analyzing
and modeling time series data, such as adf.test() for testing
whether a series is stationary. The acf() and pacf() functions for
analyzing the autocorrelation and partial autocorrelation of a
time series, and arima() for fitting an ARIMA model, are part of
the stats package that ships with R.
4. xts package - This package provides functions for working with
time series data in a matrix-like format, such as period.apply()
for applying a function to non-overlapping time periods. The
rollapply() function from the zoo package, on which xts builds,
applies a function to a moving window of time periods.
5. zoo package - This package provides functions for working with
irregularly spaced time series data. Some of the functions
available in this package are na.approx() for filling in missing
values with interpolated values, and rollmean() for calculating a
rolling mean of the data.
These functions and packages can be used to perform a variety of
tasks in time series analysis, such as data visualization, trend analysis,
forecasting, and model fitting.
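A minimal sketch using only functions that ship with R and the built-in AirPassengers series (monthly airline passenger totals). The model orders passed to arima() are one common choice for this series and are an assumption here, not a recommendation for other data.

```r
# A ts object stores the values together with their start time and frequency
small_series <- ts(c(12, 15, 14, 18), start = c(2023, 1), frequency = 4)

frequency(AirPassengers)      # 12 observations per year (monthly data)
start(AirPassengers)          # first time point of the series

# Extract a sub-period and aggregate months into yearly totals
first_two_years <- window(AirPassengers, start = c(1949, 1), end = c(1950, 12))
annual_totals <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)

# Autocorrelation values without drawing the plot
autocorr <- acf(AirPassengers, plot = FALSE)

# Fit a seasonal ARIMA model on the log scale and forecast 12 months ahead
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
forecast_passengers <- exp(predict(fit, n.ahead = 12)$pred)
forecast_passengers
```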

Basic Data Analysis with R


Basic Data Analysis with R typically involves importing, cleaning,
transforming, and summarizing data to extract useful insights and
knowledge. Here are some common steps involved in the data
analysis process using R:
1. Importing data: Data can be imported into R from various
sources such as CSV, Excel, databases, and APIs. Functions such as
read.csv() in base R, read_excel() from the readxl package, and
read_csv() from the readr package read data from these sources.
2. Data Cleaning: The data may contain missing values, duplicate
records, outliers, etc. which can affect the analysis. Cleaning
involves identifying and handling these issues using functions
such as complete.cases(), duplicated(), na.omit(), etc.
3. Data Exploration: It involves examining the data to understand
its characteristics such as data type, summary statistics,
distribution, correlations, etc. Functions such as str(), summary(),
hist(), cor(), etc. can be used to explore the data.
4. Data Transformation: Transformations such as scaling,
normalization, and variable creation can be performed to make
the data suitable for analysis. R provides several functions for
these tasks, such as scale() in base R and mutate() and select()
from the dplyr package.
5. Data Analysis: Analysis involves applying statistical techniques
such as regression, clustering, classification, and hypothesis
testing to extract insights from the data. R has a vast collection of
libraries and functions for statistical analysis such as lm(),
kmeans(), randomForest(), t.test(), etc.
6. Data Visualization: Data visualization is an essential step in data
analysis as it helps to represent the data in a meaningful way. R
provides several visualization libraries such as ggplot2, lattice,
plotly, etc. for creating visualizations such as scatter plots, line
charts, bar charts, heat maps, etc.
7. Reporting: Finally, the results of the analysis can be presented in
the form of reports, dashboards, or presentations. R provides
several tools such as R Markdown, Shiny, etc. for creating reports
that combine text, code, and visualizations.
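The first five steps can be sketched end to end with base R alone. The built-in airquality data set stands in for an imported file here, and the choice of Ozone and Temp as the variables of interest is purely illustrative.

```r
# 1. "Import": use a built-in data set in place of read.csv()
air <- airquality

# 2. Cleaning: count missing values, then keep only complete rows
colSums(is.na(air))
clean <- air[complete.cases(air), ]

# 3. Exploration: structure, summary statistics, and a correlation
str(clean)
summary(clean$Ozone)
ozone_temp_cor <- cor(clean$Ozone, clean$Temp)

# 4. Transformation: standardize temperature to mean 0 and sd 1
clean$Temp_scaled <- as.numeric(scale(clean$Temp))

# 5. Analysis: simple linear regression of ozone on temperature
model <- lm(Ozone ~ Temp, data = clean)
coef(model)
```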

Statistical Modelling in R
Statistical modelling in R is the process of using R software to create
models that describe the relationship between variables in a dataset.
Statistical modelling is an important part of data analysis and can be
used to predict future trends, identify patterns, and make data-driven
decisions.
There are many statistical modelling techniques that can be used in R,
including linear regression, logistic regression, time series analysis,
and machine learning algorithms. These techniques can be applied to
a wide range of data, from small datasets with only a few variables to
large datasets with many variables.
To perform statistical modelling in R, you first need to import your
data into R and prepare it for analysis. This may involve cleaning and
transforming the data, as well as selecting the variables that you want
to include in your model.
Once your data is prepared, you can then use R functions to create
and fit your model. The specific functions you use will depend on the
type of model you are creating, but some common functions include
lm() for linear regression, glm() for logistic regression, and arima() for
time series analysis.
After you have fitted your model, you can then use various R
functions to evaluate its performance and make predictions. These
functions may include summary() to view model statistics, predict() to
make predictions based on your model, and plot() to create
visualizations of your results.
Overall, R provides a powerful and flexible platform for statistical
modelling, allowing you to explore your data and create sophisticated
models that can help you make data-driven decisions.
Subject: IG Information Technology Batch: 2017
& 20 Paper: R Programming

Min. Marks: 32
Time Allowed: 3 Hours
Note: (Attempt any two questions from Section “A”, and all
questions from Section “B”)
Max. Marks: 80

Section A [Long Answer Type Questions. Attempt Any Two] 16 x 02 = 32 Marks
Q1. Explain the R-IDE in detail.
Q2. What is a data frame? Create a data frame emp in R:
Emp_id  Emp_name  Salary  Start_date
1       Jane      1000    2022-01-01
2       Ashton    5000    2021-09-23
3       Mike      6500    2020-11-15
4       Ryan      7290    2018-05-11
5       Gary      8410    2021-03-27

a. Get the structure of the data frame.
b. Extract the 2nd and 5th rows of the data frame with the 2nd and 4th
column data.
c. Add a column dept to this data frame and populate with
d. Add three more rows to this data frame.

Q3. Write a program in R to create a pie chart, showing the trend in
job market, 40% Coding, 30% Testing, 20% Design, 10% Security.
Perform following operations:
a. Show the data with corresponding labels.
b. Change radius as -2.
c. Draw chart anticlockwise and assign title to it.
d. Change the chart outer border color to red color.
e. Assign Coding green color. Testing yellow color, Design red color
and Security blue color.

Q4. What is linear regression? How is it implemented in R
programming?

Section B: Medium Type Answer (8 x 6 = 48 marks) Attempt all Questions

Q5. Write a program in R to display all prime numbers from 1 to 99.

Q6. Write a program in R to:
a. Concatenate two strings (‘R’ and ‘Programming’).
b. Count the number of characters in this string.
c. Change this string to upper and lower case.
d. Find the substring ‘Prog’.

Q7. Write a program in R to create a list containing strings, vector
and a matrix.

Q8. Write a program in R to create a vector with elements:
1,2,3,7,9,88,53, -9 and
a. Filter this vector for values greater than 5.
b. Display type of the given vector elements.
c. Number of total elements in the given vector

Q9. Illustrate any three bar plot functions.

Q10. What are the different data interfaces we can read and write
from in R Language?

Q11. Write a program in R to create a vector with elements:
3,6,9,3,1,4,11,2,3,4,5. Calculate and display its mean, median and
mode.

Q12. Write a program in R to compute normal distribution
percentage of a student scoring 85 or more marks in exam, assume
mean of test marks is 70 and standard deviation is 10.
Q1. Explain the R-IDE in detail.
R-IDE (R Integrated Development Environment) is a software
application that provides a user-friendly interface for writing, editing,
and executing R code. R is a powerful programming language and
environment for statistical computing and graphics. R-IDE provides a
comprehensive environment for statistical computing and
visualization.

Some of the features of R-IDE are:


Code editor: R-IDE provides a code editor that supports syntax
highlighting, auto-completion, and code folding. It also provides a
console window for running R commands and displaying their output.
Debugging: R-IDE provides tools for debugging R code, such as
breakpoints, variable inspection, and step-by-step execution.
Project management: R-IDE allows users to organize their code and
data into projects, making it easy to manage multiple scripts and data
files.
Package management: R-IDE provides tools for installing, updating,
and managing R packages, which are collections of R functions and
data sets that can be easily shared and reused.
Data visualization: R-IDE provides a wide range of tools for data
visualization, including basic plots, advanced graphics, and interactive
visualizations.
Collaboration: R-IDE supports collaboration features such as version
control, sharing code and data, and remote access to shared
resources.

Some popular R-IDEs are:
RStudio: RStudio is a popular open-source R-IDE that provides a user-
friendly interface for developing R code. It is available for Windows,
Mac, and Linux platforms.
Visual Studio Code: Visual Studio Code is a free and open-source
code editor that can be extended to support R development. It
provides features such as syntax highlighting, code completion, and
debugging.
Jupyter Notebook: Jupyter Notebook is an open-source web
application that allows users to create and share documents that
contain live code, equations, visualizations, and narrative text. It
supports a wide range of programming languages, including R.

In conclusion, R-IDE is a powerful tool for statistical computing and
data analysis that provides a user-friendly interface for writing,
editing, and executing R code. It is a crucial tool for data scientists,
statisticians, and analysts who work with R on a regular basis.
Q3. Write a program in R to create a pie chart, showing the
trend in job market, 40% Coding, 30% Testing, 20% Design,
10% Security. Perform following operations:
A. Show the data with corresponding labels.
B. Change radius as -2.
C. Draw chart anticlockwise and assign title to it.
D. Change the chart outer border color to red color.
E. Assign Coding green color. Testing yellow color,
Design red color and Security blue color.

Here is the R code to create a pie chart with the specifications
mentioned in the question:
R
# Data and corresponding labels
job_data <- c(40, 30, 20, 10)
job_labels <- c("Coding", "Testing", "Design", "Security")

# Colors for each sector
job_colors <- c("green", "yellow", "red", "blue")

# Create the pie chart
pie(job_data,
    labels = job_labels,
    col = job_colors,
    radius = 2,
    clockwise = FALSE,
    border = "red",
    main = "Job Market Trend")

# Add a legend
legend("topright",
       legend = job_labels,
       fill = job_colors)


Explanation:
 We start by defining the job data and labels as two separate
vectors.
 Next, we create a vector of colors to use for each sector of the
pie chart.
 We then call the pie() function with the job data, labels, and
colors as arguments. We also set the radius to 2, draw the chart
anticlockwise, and change the outer border color to red. Finally,
we assign a title to the chart using the main argument.
 Lastly, we add a legend to the chart using the legend() function.
The legend is positioned in the top right corner of the chart, and
displays the job labels and their corresponding colors.
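Part A of the question asks for the data to be shown with its labels. One possible reading, sketched here, is that each slice label should carry its percentage. The name pct_labels and the label format are choices made for this sketch, not part of the original answer, and the radius argument is left at its default.

```r
job_data <- c(40, 30, 20, 10)
job_labels <- c("Coding", "Testing", "Design", "Security")

# Build labels such as "Coding (40%)" (pct_labels is a hypothetical name)
pct_labels <- paste0(job_labels, " (", job_data, "%)")

# Draw the chart with the combined labels
pie(job_data,
    labels = pct_labels,
    col = c("green", "yellow", "red", "blue"),
    clockwise = FALSE,
    border = "red",
    main = "Job Market Trend")
```

With labels built this way the legend becomes optional, since every slice already names its category and share.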

Q4. What is linear regression? How is it implemented in R
programming?

Linear regression is a statistical method used to model the
relationship between a dependent variable (often denoted as "Y")
and one or more independent variables (often denoted as "X"). It
assumes that there is a linear relationship between the variables,
meaning that the change in Y can be explained by a linear change in
X.

In other words, linear regression is used to predict the value of the
dependent variable based on the values of one or more independent
variables. This makes it a useful tool for analyzing and understanding
the relationship between variables.

Linear regression can be implemented in R programming using the
lm() function. Here's an example of how to use lm():

# Create some example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 7, 8)

# Fit a linear regression model
model <- lm(y ~ x)

# Print the model summary
summary(model)
In this example, we have two vectors of data x and y representing the
independent and dependent variables, respectively. We then use the
lm() function to fit a linear regression model to the data. The formula
y ~ x specifies that we want to predict y based on x. The output of
lm() is a linear regression model object, which can be used to make
predictions or extract information about the model.

We can use the summary() function to print a summary of the model,
which includes information such as the coefficients of the regression
equation, the standard error of the estimates, the R-squared value,
and more. This information can be used to assess the goodness of fit
of the model and to make inferences about the relationship between
the variables.
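The text mentions that the model object can be used to make predictions or extract information. A minimal sketch of both follows, reusing the same x and y values; the data frame new_x and its values 6 and 7 are hypothetical inputs chosen for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 7, 8)
model <- lm(y ~ x)

# Extract the fitted intercept and slope
coefs <- coef(model)
print(coefs)

# Predict y for new x values; the column name must match the formula
new_x <- data.frame(x = c(6, 7))
predicted <- predict(model, newdata = new_x)
print(predicted)

# Pull a single statistic out of the summary object
r_squared <- summary(model)$r.squared
print(r_squared)
```

For this data the fitted line is y = 0.7 + 1.5x, so the prediction at x = 6 is 9.7.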

Q5. Write a program in R to display all prime numbers
from 1 to 99.
Here is the R code to display all prime numbers from 1 to 99:
R
# Define a function to check if a number is prime
is_prime <- function(n) {
  if (n <= 1) {
    return(FALSE)
  }
  if (n <= 3) {
    # 2 and 3 are prime; returning here also avoids the descending
    # sequence 2:1, which would wrongly test 2 against itself
    return(TRUE)
  }
  for (i in 2:floor(sqrt(n))) {
    if (n %% i == 0) {
      return(FALSE)
    }
  }
  return(TRUE)
}

# Loop over all numbers from 1 to 99 and print primes
for (i in 1:99) {
  if (is_prime(i)) {
    cat(i, " ")
  }
}
Explanation:
 We define a function is_prime() that takes a number as input and
returns TRUE if the number is prime, and FALSE otherwise.
 The function first rejects numbers less than or equal to 1, since
these are not prime, and accepts 2 and 3 directly. The direct
check matters because in R the sequence 2:1 counts downwards
(2, 1), so a loop over 2:(n-1) would test 2 against itself and
report it as not prime. For larger numbers the function loops from
2 to the integer part of the square root of n, checking if n is
divisible by any of them. If it is, the function returns FALSE. If
no factors are found, the function returns TRUE, indicating that
the number is prime.
 We then loop over all numbers from 1 to 99 using a for loop, and
print each number that is prime using the cat() function. The cat()
function is used to print the numbers on one line, without the
index prefix that print() adds.
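As an alternative to testing each number separately, the same list can be produced with the Sieve of Eratosthenes, which crosses out the multiples of each prime in turn. This is a sketch of that different technique, not the method used in the answer above; the function name primes_up_to is hypothetical.

```r
# Sieve of Eratosthenes: returns all primes up to `limit`
primes_up_to <- function(limit) {
  if (limit < 2) {
    return(integer(0))
  }
  is_candidate <- rep(TRUE, limit)
  is_candidate[1] <- FALSE              # 1 is not prime
  p <- 2
  while (p * p <= limit) {
    if (is_candidate[p]) {
      # Cross out the multiples of p, starting at p squared
      is_candidate[seq(p * p, limit, by = p)] <- FALSE
    }
    p <- p + 1
  }
  which(is_candidate)
}

primes <- primes_up_to(99)
cat(primes, "\n")
```

The sieve does its work on a whole logical vector at once, which suits R's vectorized style better than a nested loop.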

Q6. Write a program in R to: a. Concatenate two strings (‘R’
and ‘Programming’). b. Count the number of characters in this
string. c. Change this string to upper and lower case. d. Find
the substring ‘Prog’.
Here is the R code to perform the mentioned operations:
R
# Concatenate two strings
string1 <- "R"
string2 <- "Programming"
concatenated_string <- paste(string1, string2, sep = "")
cat("Concatenated string: ", concatenated_string, "\n")

# Count the number of characters in the string
num_characters <- nchar(concatenated_string)
cat("Number of characters: ", num_characters, "\n")

# Change the string to upper and lower case
upper_case_string <- toupper(concatenated_string)
lower_case_string <- tolower(concatenated_string)
cat("Uppercase string: ", upper_case_string, "\n")
cat("Lowercase string: ", lower_case_string, "\n")

# Find the substring 'Prog'
found_substring <- substr(concatenated_string, start = 2, stop = 5)
cat("Substring: ", found_substring, "\n")
Explanation:
 We start by defining two strings string1 and string2, and
concatenate them using the paste() function. We set the sep
argument to an empty string, so that the two strings are joined
together without any separator, giving "RProgramming". We then
print the concatenated string using the cat() function.
 We count the number of characters in the string using the
nchar() function, and print the result using cat().
 We change the concatenated string to uppercase and lowercase
using the toupper() and tolower() functions, respectively. We
print the resulting strings using cat().
 We find the substring "Prog" using the substr() function. We
specify the starting position as 2 (since "P" is the second
character in "RProgramming") and the stopping position as 5
(since "g" is the fifth character). We print the resulting
substring using cat(). The result is stored in found_substring
rather than substring, so that the base R function substring() is
not masked.
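Hard-coded positions only work when the location of the substring is already known. A possible alternative, sketched below, is to let regexpr() locate the pattern and then extract it; the variable names are hypothetical, and fixed = TRUE is used so the pattern is treated as plain text rather than a regular expression.

```r
text <- "RProgramming"
pattern <- "Prog"

# regexpr() returns the start position of the first match, or -1 if absent
match_info <- regexpr(pattern, text, fixed = TRUE)
start_pos <- as.integer(match_info)

if (start_pos > 0) {
  end_pos <- start_pos + attr(match_info, "match.length") - 1
  found <- substr(text, start_pos, end_pos)
  cat("Found", found, "at positions", start_pos, "to", end_pos, "\n")
} else {
  cat("Pattern not found\n")
}

# grepl() answers the simpler yes/no question
contains_pattern <- grepl(pattern, text, fixed = TRUE)
```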
Q7. Write a program in R to create a list containing
strings, vector and a matrix.
Here is the R code to create a list containing a string, a vector, and a
matrix:
R
# Create a list containing a string, a vector, and a matrix
my_list <- list("hello", c(1, 2, 3),
                matrix(c(4, 5, 6, 7, 8, 9), nrow = 2))

# Print the list
print(my_list)
Explanation:
 We create a list called my_list using the list() function. Inside the
function, we include three objects: a string "hello", a vector c(1,
2, 3), and a matrix created using the matrix() function. The
matrix() function takes a vector of elements and fills a matrix
with them column by column; with six elements and nrow = 2 the
result has 2 rows and 3 columns.
 We print the resulting list using the print() function. The output
shows the three elements in order under the indices [[1]], [[2]],
and [[3]].
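Once a list exists, its elements are usually retrieved with [[ ]] or, when the elements are named, with $. The sketch below assumes a named version of the same kind of list; the element names text, numbers, and grid are hypothetical.

```r
# A named list holding a string, a vector, and a matrix
my_list <- list(text = "hello",
                numbers = c(1, 2, 3),
                grid = matrix(4:9, nrow = 2))

greeting <- my_list$text               # access by name
second_number <- my_list[["numbers"]][2]
corner_value <- my_list$grid[2, 3]     # row 2, column 3 of the matrix

# str() gives a compact view of each element's type and size
str(my_list)
```

Single brackets, as in my_list[1], return a shorter list rather than the element itself, which is a common source of confusion.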
Q8. Write a program in R to create a vector with elements:
1,2,3,7,9,88,53, -9 and a. Filter this vector for values greater
than 5. b. Display type of the given vector elements. c.
Number of total elements in the given vector
Here is the R code to create a vector and perform the mentioned
operations:
R
# Create a vector with elements
my_vector <- c(1, 2, 3, 7, 9, 88, 53, -9)

# Filter the vector for values greater than 5
filtered_vector <- my_vector[my_vector > 5]
cat("Filtered vector: ", filtered_vector, "\n")

# Display the type of the vector elements
cat("Type of vector elements: ", class(my_vector), "\n")

# Display the number of elements in the vector
cat("Number of elements in the vector: ", length(my_vector), "\n")
Explanation:
 We create a vector called my_vector using the c() function. We
initialize the vector with the given elements.
 We filter the vector for values greater than 5 using logical
indexing. We create a new vector called filtered_vector by
selecting only the elements of my_vector that satisfy the
condition my_vector > 5. We print the resulting vector using the
cat() function.
 We display the type of the vector elements using the class()
function. An atomic vector stores all of its elements as a single
type, so one call on my_vector describes every element; here the
result is "numeric". We print it using the cat() function.
 We display the number of elements in the vector using the
length() function, which returns the number of elements in a
vector. We print the resulting value using the cat() function.

Q9. Illustrate any three bar plot functions.
Here are three different functions in R to create bar plots:
1. barplot() function:
The barplot() function is used to create a basic bar plot in R. It takes a
vector or matrix of values as input and displays them as vertical or
horizontal bars.
R
# Create a vector of values
my_vector <- c(10, 20, 30, 40)

# Create a basic bar plot
barplot(my_vector, main = "My Bar Plot",
        xlab = "Categories", ylab = "Values")
This code will create a vertical bar plot with four bars. Because
my_vector has no names, the bars themselves are unlabeled; the
names.arg argument of barplot() can be used to label them. The main,
xlab, and ylab arguments are used to add a title and axis labels to
the plot.
2. ggplot2 package:
The ggplot2 package is a popular package in R for creating high-
quality graphics, including bar plots. It provides a more flexible and
customizable approach to data visualization than the barplot()
function.
R
# Load the ggplot2 package
library(ggplot2)

# Create a data frame of values
my_df <- data.frame(categories = c("A", "B", "C", "D"),
                    values = c(10, 20, 30, 40))

# Create a bar plot using ggplot2
ggplot(my_df, aes(x = categories, y = values)) +
  geom_bar(stat = "identity") +
  labs(title = "My Bar Plot", x = "Categories", y = "Values")
This code will create a vertical bar plot with four bars, labeled with
the categories "A", "B", "C", and "D". The ggplot() function initializes a
plot object, and the geom_bar() function is used to create the bars.
The labs() function is used to add a title and axis labels to the plot.
3. plotly package:
The plotly package is another popular package in R for creating
interactive visualizations, including bar plots. It allows the user to
create interactive and customizable plots that can be zoomed,
panned, and annotated.
R
# Load the plotly package
library(plotly)

# Create a data frame of values
my_df <- data.frame(categories = c("A", "B", "C", "D"),
                    values = c(10, 20, 30, 40))

# Create an interactive bar plot using plotly
plot_ly(my_df, x = ~categories, y = ~values, type = "bar") %>%
  layout(title = "My Bar Plot",
         xaxis = list(title = "Categories"),
         yaxis = list(title = "Values"))
This code will create an interactive vertical bar plot with four bars,
labeled with the categories "A", "B", "C", and "D". The plot_ly()
function initializes a plot object, and the layout() function is used to
customize the plot title and axis labels. The resulting plot can be
zoomed, panned, and annotated using the plotly interface.
Q10. What are the different data interfaces we can read
and write from in R Language?
In R language, there are several data interfaces that we can read and
write from. Some of the commonly used data interfaces are:
1. CSV files: CSV (Comma Separated Values) files are a simple and
widely used data format that can be read and written using the
read.csv() and write.csv() functions in R.
2. Excel files: Excel files are a popular data format for storing
tabular data. They can be read in R using the read_excel() function
from the readxl package. readxl only reads files, so writing is
done with another package, for example write_xlsx() from writexl.
3. JSON files: JSON (JavaScript Object Notation) is a lightweight data
format that is used for storing and exchanging data on the web.
JSON files can be read and written in R using the jsonlite package
and the fromJSON() and toJSON() functions.
4. SQL databases: SQL (Structured Query Language) databases are
commonly used to store large amounts of structured data. R can
connect to SQL databases using the RODBC package or other
packages like DBI and RSQLite.
5. APIs: APIs (Application Programming Interfaces) provide a way to
access data from remote servers. R can make HTTP requests and
parse the returned JSON or XML data using packages like httr and
jsonlite.
6. Web scraping: Web scraping involves extracting data from web
pages using tools like rvest and xml2. This is useful for extracting
data that is not available in a structured format or from sources
that do not provide an API.
7. Other file formats: R can read and write from a variety of other
file formats, including XML, YAML, HDF5, and more. There are
packages available for each of these formats, such as xml2, yaml,
and hdf5r.
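As a small illustration of the first interface in the list, the sketch below writes a data frame to a CSV file and reads it back using only base R. The data frame scores and its contents are made up for the example, and a temporary file is used so that the code runs without any existing file.

```r
# Hypothetical sample data
scores <- data.frame(name = c("Asha", "Ben", "Chen"),
                     marks = c(82, 67, 91))

# Write to a temporary CSV file, without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back into a new data frame
scores_back <- read.csv(csv_path)
print(scores_back)

# Remove the temporary file
unlink(csv_path)
```

The other interfaces follow the same write-then-read pattern, with the package-specific functions named in the list taking the place of write.csv() and read.csv().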

Q11. Write a program in R to create a vector with
elements: 3,6,9,3,1,4,11,2,3,4,5. Calculate and display its
mean, median and mode.
Here's a program in R to create a vector with the given elements and
calculate its mean, median, and mode:
R
# create the vector
vec <- c(3, 6, 9, 3, 1, 4, 11, 2, 3, 4, 5)

# calculate the mean
mean_vec <- mean(vec)
cat("Mean of the vector:", mean_vec, "\n")

# calculate the median
median_vec <- median(vec)
cat("Median of the vector:", median_vec, "\n")

# calculate the mode (the most frequent value or values)
mode_vec <- names(table(vec))[table(vec) == max(table(vec))]
cat("Mode of the vector:", mode_vec, "\n")
Output:
Mean of the vector: 4.636364
Median of the vector: 4
Mode of the vector: 3
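The mode calculation can be wrapped in a small helper so that it is reusable and returns numbers rather than the character names produced by table(). This is a sketch under the assumption that all tied values should be returned; the function name stat_mode is hypothetical.

```r
# Return the most frequent value(s) of a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  modes <- names(counts)[counts == max(counts)]
  as.numeric(modes)
}

vec <- c(3, 6, 9, 3, 1, 4, 11, 2, 3, 4, 5)
cat("Mean:", mean(vec), "\n")        # 51 / 11
cat("Median:", median(vec), "\n")
cat("Mode:", stat_mode(vec), "\n")

# With a tie, every tied value is returned
tied <- stat_mode(c(1, 1, 2, 2, 5))
```

The elements of vec sum to 51 over 11 values, so the mean is 51/11, approximately 4.64.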

Q12. Write a program in R to compute normal
distribution percentage of a student scoring 85 or more
marks in exam, assume mean of test marks is 70 and
standard deviation is 10.
Here's a program in R to compute the percentage of students scoring
85 or more marks in an exam, assuming the mean of the test marks is
70 and the standard deviation is 10, using the normal distribution:
R
# calculate the probability of scoring 85 or more
prob_more_than_85 <- pnorm(85, mean = 70, sd = 10,
                           lower.tail = FALSE)

# convert the probability to a percentage
perc_more_than_85 <- prob_more_than_85 * 100

# display the result
cat("Percentage of students scoring 85 or more marks:",
    round(perc_more_than_85, 2), "%\n")

Output:
Percentage of students scoring 85 or more marks: 6.68 %
This means that about 6.68% of the students will score 85 or more
marks in the exam assuming normal distribution of marks with a
mean of 70 and a standard deviation of 10.
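The same probability can be checked by hand through the z-score, which expresses the cutoff in standard deviations above the mean. The sketch below is only a cross-check of the pnorm() call; the variable names are hypothetical.

```r
mean_marks <- 70
sd_marks <- 10
cutoff <- 85

# Standardize the cutoff: (85 - 70) / 10 = 1.5
z <- (cutoff - mean_marks) / sd_marks

# Upper-tail probability from the standard normal distribution
p_from_z <- pnorm(z, lower.tail = FALSE)

# Same probability computed directly on the original scale
p_direct <- pnorm(cutoff, mean = mean_marks, sd = sd_marks,
                  lower.tail = FALSE)

cat("z-score:", z, "\n")
cat("Percentage at or above cutoff:", round(p_from_z * 100, 2), "%\n")
```

A score of 85 lies 1.5 standard deviations above the mean, and the area of the normal curve beyond z = 1.5 is about 0.0668, or 6.68%.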

Question Paper 2:

Section A: Long Type Questions (16 Marks)

(Unit 1) Discuss the advantages and disadvantages of using R
programming language. Also, explain the installation process of R
and its integrated development environment (IDE).

(Unit 2) Differentiate between vectors, lists, arrays, and matrices in
R. Also, explain the concept of factors in R programming.

(Unit 3) Explain the concept of data reshaping in R. Also, discuss the
object-oriented programming approach and the debugging process
in R.

(Unit 4) Discuss the basics of statistical analysis in R programming.
Explain the concepts of mean, median, mode, and regression
analysis in detail.

Section B: Medium Type Questions (6 Marks)

(Unit 1) Write a program in R to check whether a given number is
even or odd using decision-making statements and loops. Also,
explain the different types of operators and keywords used in R
programming.

(Unit 1) Explain the concept of functions in R programming with
suitable examples. Also, discuss the importance of packages in R
programming.

(Unit 2) Write a program in R to count the number of characters in a
given string. Also, discuss the different methods of string
manipulation in R programming.

(Unit 2) Explain the concept of data frames in R programming. Also,
discuss the process of indexing and subsetting in R.

(Unit 3) Write a program in R to read data from a CSV file and
display it on the console. Also, discuss the different data interfaces
in R programming.

(Unit 3) Explain the concept of data visualization in R programming.
Discuss the different types of plots, such as pie charts, bar charts,
and histograms, with suitable examples.

(Unit 4) Write a program in R to perform linear regression analysis
on a given data set. Also, explain the concept of normal and
binomial distributions in R programming.

(Unit 4) Discuss the concept of time series analysis in R
programming. Also, explain the different techniques of classification
analysis in R.

(Unit 1) Discuss the advantages and disadvantages of
using R programming language. Also, explain the
installation process of R and its integrated development
environment (IDE).
Advantages of R programming language:
1. Open source: R is a free, open-source programming language.
This means that anyone can use and modify the code as per their
requirements.
2. Extensive libraries: R provides a wide range of statistical and
graphical techniques through its extensive library. There are
more than 10,000 packages available for R that can be used for
various purposes, including data visualization, data analysis,
machine learning, and statistical modeling.
3. Interactive environment: R provides an interactive environment
for data analysis and modeling. This means that you can run code
snippets and get immediate feedback on your results.
4. Cross-platform compatibility: R can be used on various platforms
like Windows, Linux, and Mac OS.
5. Collaboration: R provides a platform for collaboration through its
extensive library and online communities. Users can share their
code, packages, and data with others, making it easier for them
to collaborate on projects.
Disadvantages of R programming language:
1. Steep learning curve: R has a steep learning curve, and it can
take some time to get used to its syntax and concepts.
2. Memory management: R keeps the objects it works on in memory
(RAM), so memory use can become a constraint with large
datasets. Memory itself is reclaimed automatically by R's
garbage collector.
3. Static base graphics: plots made with base R are static, and
interactive graphics require add-on packages such as plotly.
4. Performance issues: R can be slow in certain cases, especially
when working with large datasets.
Installation process of R:
The installation process of R is straightforward. Follow the steps
below to install R on your system:
1. Go to the official website of R (https://www.r-project.org/) and
click on the "download R" link.
2. Choose a CRAN mirror close to your location.
3. Click on the "Download R for (your OS)" link for your operating
system.
4. Run the downloaded installer file and follow the instructions.
Integrated Development Environment (IDE) for R:
There are several IDEs available for R programming. Some popular
IDEs are:
1. RStudio: RStudio is a popular IDE for R programming. It provides
an interactive environment for data analysis and modeling and
has many features like code highlighting, debugging, and version
control.
2. Jupyter Notebook: Jupyter Notebook is an open-source web
application that allows you to create and share documents that
contain live code, equations, visualizations, and narrative text.
3. Eclipse: Eclipse is a widely used IDE for various programming
languages, including R. It provides many features like syntax
highlighting, code completion, and debugging.
4. Visual Studio Code: Visual Studio Code is a lightweight and
powerful IDE for various programming languages, including R. It
provides many features like code highlighting, debugging, and
version control.

(Unit 2) Differentiate between vectors, lists, arrays, and
matrices in R. Also, explain the concept of factors in R
programming.
In R programming, the following are the different data structures
available:
1. Vectors: A vector is a one-dimensional array-like object that can
hold elements of the same type. It is the simplest and most
common data structure used in R. A vector can be created using
the c() function.
2. Lists: A list is a collection of objects that can be of different types,
including other lists. It can be created using the list() function.
3. Arrays: An array is a multi-dimensional data structure that can
hold elements of the same type. It can be created using the
array() function.
4. Matrices: A matrix is a two-dimensional array-like object that can
hold elements of the same type. It can be created using the
matrix() function.
Factors are used to represent categorical data in R. A factor is a
vector that can take on one of a finite set of values called levels.
Factors are useful for representing variables that can take on a
limited number of possible values, such as gender or race. They can
be created using the factor() function. Factors are commonly used in
statistical analysis and machine learning.
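The differences between these structures are easiest to see side by side. The sketch below creates one of each with the functions named above; all values and variable names are made up for illustration.

```r
# Vector: one dimension, one type
marks <- c(10, 20, 30)

# Mixing types in c() coerces everything to a common type (character here)
mixed <- c(1, "two", TRUE)

# List: elements may differ in type and length
info <- list(name = "R", version = 4, tags = c("stats", "graphics"))

# Matrix: two dimensions, one type
grid <- matrix(1:6, nrow = 2, ncol = 3)

# Array: any number of dimensions, one type
cube <- array(1:24, dim = c(2, 3, 4))

# Factor: categorical data with a fixed set of levels
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))

print(levels(sizes))
print(table(sizes))
print(dim(cube))
```

The mixed vector shows why a list is needed for heterogeneous data: a vector silently converts its elements to one type.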

(Unit 3) Explain the concept of data reshaping in R. Also,
discuss the object-oriented programming approach and
the debugging process in R.
Data Reshaping in R: Data reshaping is an important concept in data
analysis where we transform data from one structure to another. It
involves changing the way data is organized or presented to make it
more suitable for analysis. R provides several functions to reshape
data, including reshape() in base R and melt() and dcast() from the
reshape2 package.
Object-oriented programming (OOP) in R: Object-oriented
programming is a programming paradigm that focuses on the use of
objects to represent and manipulate data. In R, OOP can be
implemented using the S3, S4, and R6 systems. OOP allows for the
creation of reusable code, increased modularity, and better
organization of code.
Debugging in R: Debugging is the process of finding and fixing errors
or bugs in code. In R, we can use several functions and techniques to
debug code, such as using the traceback() function to identify where
an error occurred, setting breakpoints, using the browser() function
to inspect variables, and using the debug() function to step through
code line-by-line. The use of an integrated development environment
(IDE) can also help with debugging by providing tools such as code
highlighting, syntax checking, and debugging tools.
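Two of the ideas above can be sketched with base R alone: reshaping a data frame from wide to long format with reshape(), and defining a simple S3 class with its own print method. The data, the column names, and the class name student are all hypothetical.

```r
# --- Data reshaping: wide to long with base reshape() ---
wide <- data.frame(id = 1:2,
                   score_1 = c(10, 20),
                   score_2 = c(30, 40))

long <- reshape(wide,
                direction = "long",
                varying = c("score_1", "score_2"),  # columns to stack
                v.names = "score",                  # name of stacked column
                timevar = "test",                   # records which column
                times = 1:2,
                idvar = "id")
print(long)

# --- Object-oriented programming: a minimal S3 class ---
# Constructor: attaches the class attribute to a list
new_student <- function(name, marks) {
  structure(list(name = name, marks = marks), class = "student")
}

# Method: print() dispatches here for objects of class "student"
print.student <- function(x, ...) {
  cat("Student:", x$name, "- marks:", x$marks, "\n")
  invisible(x)
}

s <- new_student("Asha", 82)
print(s)
```

In S3 a method is just a function named generic.class, which is why defining print.student is enough to change how print(s) behaves.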

(Unit 4) Discuss the basics of statistical analysis in R
programming. Explain the concepts of mean, median,
mode, and regression analysis in detail.
Statistical analysis is an essential part of data science, and R
programming provides powerful tools for data analysis and statistical
modelling. In this unit, we will discuss the basics of statistical analysis
in R programming.
Mean, Median, and Mode:
Mean is the average value of a dataset. In R, we can calculate the
mean using the mean() function. For example, if we have a vector of
numbers named "x," we can calculate the mean as follows:
R
x <- c(1, 2, 3, 4, 5)
mean(x)

This will return the value 3, which is the mean of the numbers in the
vector.
Median is the middle value of a dataset. In R, we can calculate the
median using the median() function. For example, if we have a vector
of numbers named "x," we can calculate the median as follows:
R
x <- c(1, 2, 3, 4, 5)
median(x)

This will return the value 3, which is the median of the numbers in the
vector.
Mode is the value that appears most frequently in a dataset. Base R
has no function for the statistical mode: the built-in mode() function
returns the storage mode (internal type) of an object, not its most
frequent value. One option is the mfv() (most frequent value)
function from the "modeest" package, which must be installed and
loaded first. For example, if we have a vector of numbers named "x,"
we can calculate the mode as follows:
R
library(modeest)
x <- c(1, 2, 3, 4, 4, 5)
mfv(x)

This will return the value 4, which is the mode of the numbers in the
vector.
Regression Analysis:
Regression analysis is a statistical method used to determine the
relationship between two or more variables. In R, we can perform
regression analysis using the lm() function, which stands for "linear
model." For example, if we have a dataset with two variables named
"x" and "y," we can perform a linear regression analysis as follows:
R
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
model <- lm(y ~ x)
summary(model)

This will output the summary of the linear regression model, which
includes information about the coefficients, standard errors, t-values,
p-values, and R-squared value.
In addition to linear regression, R provides various regression models,
such as logistic regression, Poisson regression, and nonlinear
regression, among others.
In conclusion, R programming provides a wide range of functions and
packages for statistical analysis, making it a popular choice among
data scientists and statisticians. By understanding the basics of
statistical analysis in R, we can perform various statistical modelling
tasks and gain insights from data.

(Unit 1) Write a program in R to check whether a given
number is even or odd using decision-making statements
and loops. Also, explain the different types of operators
and keywords used in R programming.
To check whether a given number is even or odd in R, we can use a
decision-making statement (if...else). Here's a sample code:
R
# Taking input from the user
num <- as.integer(readline(prompt = "Enter a number: "))
# Checking if the number is even or odd
if (num %% 2 == 0) {
  print(paste(num, "is even"))
} else {
  print(paste(num, "is odd"))
}

In the above code, we first take an integer input from the user using
the readline() function. We then use the modulo operator (%%) to
check if the number is divisible by 2 or not. If the remainder is 0, the
number is even, and we print the message "is even". If the remainder
is 1, the number is odd, and we print the message "is odd".
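The question also asks for loops. One way to bring a loop in, sketched below, is to apply the same test to several numbers in turn. The helper name classify_parity and the sample values are hypothetical, and fixed values are used instead of readline() so that the code also runs non-interactively.

```r
# Decision-making: classify one number as even or odd
classify_parity <- function(num) {
  if (num %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

# Looping: apply the check to each number in a vector
numbers <- c(4, 7, 10, 13)
results <- character(0)
for (num in numbers) {
  label <- classify_parity(num)
  results <- c(results, label)
  print(paste(num, "is", label))
}
```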
Operators and keywords are used to perform various operations and
control the flow of the program in R programming. Here are some
examples of operators and keywords:
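1. Arithmetic operators: +, -, *, /, ^, %% (remainder), %/% (integer
division)
2. Comparison operators: ==, !=, <, >, <=, >=
3. Logical operators: & and | (element-wise), && and || (single
values), !
4. Assignment operators: <-, =, ->, <<-
5. Control flow keywords: if, else, for, while, repeat, break, next,
function. return() is used together with these, although it is a
built-in function rather than a reserved word.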
Arithmetic operators are used for mathematical calculations,
comparison operators are used to compare values, and logical
operators are used to combine multiple conditions. Assignment
operators are used to assign values to variables, while control flow
keywords are used to control the flow of the program.
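A short sketch of these operators and keywords in action follows. The values of a and b are arbitrary, and the comments state what each expression evaluates to.

```r
a <- 17
b <- 5

# Arithmetic operators
quotient <- a %/% b      # integer division: 3
remainder <- a %% b      # remainder: 2
power <- b ^ 2           # exponent: 25

# Comparison and element-wise logical operators work on whole vectors
values <- c(1, 5, 10)
in_range <- values > 4 & values < 8    # FALSE TRUE FALSE

# && combines two single TRUE/FALSE values
both_true <- (a > b) && (b > 0)        # TRUE

# Control flow keywords: sum the odd numbers from 1 to 7
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next    # skip even numbers
  if (i > 7) break         # stop once past 7
  total <- total + i
}
print(total)               # 1 + 3 + 5 + 7 = 16
```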

(Unit 1) Explain the concept of functions in R
programming with suitable examples. Also, discuss the
importance of packages in R programming.
In R programming, functions are a set of instructions that perform a
specific task. Functions take inputs (arguments) and provide outputs,
which can be used in other parts of the program. Functions are
essential in programming as they help in reducing code complexity,
improving readability, and making the code more modular.
Here's an example of a simple function in R that takes two numbers
as input and returns their sum:
R
sum_numbers <- function(x, y) {
  total <- x + y
  return(total)
}

To use this function, we can call it with two numbers as arguments:
R
result <- sum_numbers(3, 5)
print(result)

This will output the result of the function, which is 8.
Functions can also have default values for their arguments, which can
be overwritten by the user if needed. Here's an example:
R
greet_user <- function(name = "User") {
  # paste0() joins the pieces without inserting extra spaces
  greeting <- paste0("Hello, ", name, "!")
  print(greeting)
}

# Call the function with default argument
greet_user()

# Call the function with a specific name
greet_user("John")

This will output:
[1] "Hello, User!"
[1] "Hello, John!"

Packages are collections of functions, data, and documentation that
can be easily imported into R. They allow programmers to extend the
functionality of R and reuse code written by others. Packages can be
installed and loaded into R using the install.packages() and library()
functions, respectively.
Here's an example of installing and loading the tidyverse package:
R
install.packages("tidyverse")
library(tidyverse)

Once loaded, the functions and data within the package can be used
in the R program. Packages are important in R programming because
they save time by providing pre-written code, and they help in
avoiding errors by providing reliable and tested functions.

(Unit 2) Write a program in R to count the number of
characters in a given string. Also, discuss the different
methods of string manipulation in R programming.
To count the number of characters in a given string in R
programming, we can use the nchar() function.
Example:
R
# program to count number of characters in a string
string <- "Hello World!"
count <- nchar(string)
print(paste("Number of characters in string:", count))

Output:
[1] "Number of characters in string: 12"

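There are several methods of string manipulation in R programming:
1. Subsetting: We can extract part of a string using the substr()
function. For example, substr(string, 1, 1) will give the first
character of the string. Square brackets index the elements of a
character vector, so string[1] returns the whole first string, not
its first character.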
2. Concatenation: We can concatenate two strings using the paste()
function. For example, paste("Hello", "World!") will give "Hello
World!".
3. Case conversion: We can convert the case of a string using the
toupper() and tolower() functions.
4. Search and replace: We can search for a particular pattern in a
string and replace it with another using the gsub() function.
5. Regular expressions: We can use regular expressions to
manipulate strings in more complex ways, such as extracting
certain patterns or removing unwanted characters. The stringr
package in R provides several functions for working with regular
expressions.
6. Other functions: R programming provides several other functions
for string manipulation, such as strsplit(), substring(), and
chartr().
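The functions listed above can be tried on a single example string. The sketch below uses only base R; the variable names are hypothetical and the comments show the value each call produces.

```r
string <- "Hello World!"

first_char <- substr(string, 1, 1)        # "H"
whole_elem <- string[1]                   # "Hello World!" (whole element)
joined <- paste("Hello", "World!")        # "Hello World!"
upper <- toupper(string)                  # "HELLO WORLD!"
replaced <- gsub("World", "R", string)    # "Hello R!"
words <- strsplit(string, " ")[[1]]       # "Hello" "World!"
tail_part <- substring(string, 7)         # "World!"
swapped <- chartr("lo", "01", string)     # "He001 W1r0d!"

print(words)
print(swapped)
```

strsplit() returns a list with one element per input string, which is why [[1]] is needed to get the character vector of words.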

(Unit 2) Explain the concept of data frames in R


programming. Also, discuss the process of indexing and
subsetting in R.
In R programming, a data frame is a two-dimensional structure that
contains rows and columns of data. It is a collection of vectors of
equal length. The columns of a data frame can have different data
types such as character, numeric, logical, or factor. The rows of a data
frame represent observations or cases.
Data frames can be created using the data.frame() function in R. For
example:
less
# Creating a data frame
student_data <- data.frame(
name = c("John", "Mary", "David"),
age = c(21, 23, 20),
grade = c("A", "B", "C"),
stringsAsFactors = FALSE
)

In the above example, we have created a data frame student_data


with three columns: name, age, and grade.
Indexing and subsetting are the processes of extracting specific values
or subsets of a data frame. In R, indexing is done using square
brackets []. For example:
python
# Indexing a data frame
student_data[1,] # Extracting the first row of the data frame
student_data[,2] # Extracting the second column of the data frame
student_data[1:2,3] # Extracting the third column for the first two rows of the data frame

Subsetting is a way of creating a subset of a data frame based on


certain conditions. For example:
python
# Subsetting a data frame
subset(student_data, age > 20) # Extracting rows where age is greater than 20
subset(student_data, grade == "A") # Extracting rows where grade is equal to "A"

The subset() function is used to subset the data frame based on
certain conditions.
Overall, indexing and subsetting are useful techniques in R for
extracting specific data from a data frame.
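Square brackets also accept column names and logical conditions, which gives the same result as subset(). The following is a small sketch that rebuilds the student_data frame from the earlier example; the names ages, age_df, older, and older_names are chosen here only for illustration.

```r
# Rebuild the example data frame so this block runs on its own
student_data <- data.frame(
  name  = c("John", "Mary", "David"),
  age   = c(21, 23, 20),
  grade = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# A single column as a plain vector, using $
ages <- student_data$age

# drop = FALSE keeps the result as a one-column data frame
age_df <- student_data[, "age", drop = FALSE]

# Logical indexing: keep the rows where the condition is TRUE
older <- student_data[student_data$age > 20, ]

# subset() can also choose columns through its select argument
older_names <- subset(student_data, age > 20, select = c(name, grade))

print(older)
print(older_names)
```

Both older and older_names contain the rows for John and Mary; the second keeps only the name and grade columns.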
(Unit 3) Write a program in R to read data from a CSV file
and display it on the console. Also, discuss the different
data interfaces in R programming.
To read data from a CSV file and display it on the console, we can use
the read.csv() function and print() function in R. Here's an example:
R
# Read data from CSV file
mydata <- read.csv("myfile.csv")

# Display data on console
print(mydata)

This code will read the data from the CSV file named "myfile.csv" and
store it in the variable mydata. The print() function is then used to
display the data on the console.
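Because "myfile.csv" may not exist on every machine, a self-contained variant can first write a small file and then read it back. This is only a sketch: the scores data frame and the temporary file path are made-up assumptions used so that the example runs anywhere.

```r
# Hypothetical sample data, written to a temporary CSV file
scores <- data.frame(
  name  = c("John", "Mary", "David"),
  score = c(82.5, 91, 77),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back and display it on the console
mydata <- read.csv(csv_path, stringsAsFactors = FALSE)
print(mydata)

# str() shows the column types that read.csv() inferred
str(mydata)
```

Setting row.names = FALSE in write.csv() avoids an extra column of row numbers appearing when the file is read again.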
In R programming, there are several data interfaces available to read
and write data from different file formats. Some of the commonly
used data interfaces are:
1. CSV files: Comma-separated values (CSV) files are a common file
format for storing and exchanging data in a tabular format.
2. Excel files: Excel files are used to store and exchange data in a
tabular format. Several contributed packages handle them: readxl
reads Excel files, while xlsx and openxlsx both read and write them.
3. Binary files: Binary files are used to store and exchange data in a
binary format. R provides several functions to read and write
data from binary files, including readBin(), writeBin(), and
serialize().
4. XML files: XML files are used to store and exchange data in a
structured format. The XML and xml2 packages read and write XML
data; rvest, which builds on xml2, is aimed mainly at extracting
data from HTML pages.
5. JSON files: JSON (JavaScript Object Notation) files are used to
store and exchange data in a structured format. R provides
several packages to read and write data from JSON files,
including jsonlite, RJSONIO, and rjson.
These data interfaces make it easy to read and write data from
different file formats in R programming.
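The binary functions named in the list can be illustrated with base R alone. The following is a minimal sketch; the values, the temporary file, and the small list being serialized are assumptions made up for the example.

```r
# Write three numbers to a binary file, then read them back
values <- c(1.5, 2.5, 3.5)
bin_path <- tempfile(fileext = ".bin")

con <- file(bin_path, "wb")   # open for binary writing
writeBin(values, con)
close(con)

con <- file(bin_path, "rb")   # open for binary reading
values_back <- readBin(con, what = "numeric", n = length(values))
close(con)

# serialize() turns an arbitrary R object into a raw vector;
# unserialize() restores it
obj <- list(id = 1L, tags = c("a", "b"))
raw_bytes <- serialize(obj, connection = NULL)
obj_back <- unserialize(raw_bytes)

print(values_back)
print(obj_back)
```

readBin() needs to be told the type and the number of items to read, because a binary file carries no description of its own contents.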

(Unit 3) Explain the concept of data visualization in R
programming. Discuss the different types of plots, such
as pie charts, bar charts, and histograms, with suitable
examples.
Data visualization is the graphical representation of data to
understand and analyze it. R programming language provides various
functions and packages for data visualization. There are different
types of plots available in R programming for data visualization, such
as pie charts, bar charts, histograms, line graphs, and scatter plots.
Pie Chart: A pie chart is used to represent the data in the form of a
circle, and each slice of the pie represents a specific category or
quantity. In R programming, we can create a pie chart using the pie()
function. Here is an example:
r
# Create a pie chart
data <- c(20, 30, 50)
labels <- c("Apples", "Oranges", "Bananas")
pie(data, labels = labels, main = "Fruits")

Bar Chart: A bar chart is used to represent the data in the form of
bars, where each bar represents a category or quantity. In R
programming, we can create a bar chart using the barplot() function.
Here is an example:
r
# Create a bar chart
data <- c(20, 30, 50)
names <- c("Apples", "Oranges", "Bananas")
barplot(data, names.arg = names, main = "Fruits")

Histogram: A histogram is used to represent the frequency
distribution of data. In R programming, we can create a histogram
using the hist() function. Here is an example:
r
# Create a histogram
data <- c(2, 3, 3, 4, 5, 5, 5, 6, 6, 7)
hist(data, breaks = 5, main = "Histogram")
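When breaks is a single number, hist() treats it as a suggestion and picks "pretty" break points itself. To see exactly how the values are binned, the break points can be given explicitly and the counts inspected without drawing anything. This is a small sketch; the break points 2:7 are an assumption chosen to match the range of the sample data.

```r
# Same sample data as in the histogram example
data <- c(2, 3, 3, 4, 5, 5, 5, 6, 6, 7)

# plot = FALSE returns the computed histogram instead of drawing it
h <- hist(data, breaks = 2:7, plot = FALSE)

h$breaks  # bin edges: 2 3 4 5 6 7
h$counts  # values per bin: 3 1 3 2 1
```

By default each bin includes its right edge, and the first bin also includes its left edge, so the first bin [2, 3] holds the values 2, 3, and 3.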

Line Graph: A line graph is used to represent the trend of data over
time. In R programming, we can create a line graph using the plot()
function. Here is an example:
r
# Create a line graph
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type = "o", main = "Line Graph")

Scatter Plot: A scatter plot is used to represent the relationship
between two variables. In R programming, we can create a scatter
plot using the plot() function. Here is an example:
r
# Create a scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, main = "Scatter Plot")

In addition to these, R programming provides many other types of
plots and visualizations, and we can choose the appropriate plot
based on the data and the analysis requirements.
(Unit 4) Write a program in R to perform linear
regression analysis on a given data set. Also, explain the
concept of normal and binomial distributions in R
programming.
To perform linear regression analysis in R, we can use the lm()
function. Here is an example program that performs linear regression
on a data set:
r
# Import the data set (assumed to contain numeric columns X and Y)
data <- read.csv("data.csv")

# Perform linear regression of Y on X
model <- lm(Y ~ X, data = data)

# Print the results
summary(model)

This program reads in a data set from a CSV file, performs linear
regression on the data using the lm() function, and prints out the
results using the summary() function.
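Since "data.csv" is not supplied, the same steps can be tried on generated data. The following is a sketch under stated assumptions: the data follow Y = 3 + 2X with a small alternating disturbance invented for the example, so the fitted coefficients should come out close to 3 and 2.

```r
# Made-up data: Y = 3 + 2 * X plus a small alternating disturbance
x <- 1:10
noise <- rep(c(0.1, -0.1), times = 5)
df <- data.frame(X = x, Y = 3 + 2 * x + noise)

# Fit the linear model and inspect it
model <- lm(Y ~ X, data = df)
print(coef(model))   # intercept and slope
summary(model)       # standard errors, R-squared, and so on

# Predict Y for new X values
new_points <- data.frame(X = c(11, 12))
predicted <- predict(model, newdata = new_points)
print(predicted)
```

The column name in newdata must match the predictor name used in the formula, here X.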
The normal distribution is a continuous probability distribution that
is symmetric about its mean and is described by two parameters, the
mean and the standard deviation. In R, dnorm() gives the density,
pnorm() gives the cumulative distribution function, qnorm() gives
the quantiles, and rnorm() draws random samples.
The binomial distribution is a discrete probability distribution for
the number of successes in a fixed number of independent trials, each
with the same probability of success. In R, dbinom() gives the
probability mass function, pbinom() gives the cumulative distribution
function, qbinom() gives the quantiles, and rbinom() draws random
samples.
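A few values that can be checked by hand show how these functions behave. This is a minimal sketch; the parameter choices (a standard normal, and 4 trials with success probability 0.5) are assumptions picked to keep the arithmetic simple.

```r
# Standard normal distribution (mean 0, standard deviation 1)
dens  <- dnorm(0)       # density at the mean: 1 / sqrt(2 * pi)
p_low <- pnorm(0)       # P(Z <= 0) = 0.5 by symmetry
q975  <- qnorm(0.975)   # about 1.96

# Binomial distribution with 4 trials and success probability 0.5
p2   <- dbinom(2, size = 4, prob = 0.5)    # P(X = 2)  = 6/16  = 0.375
cum2 <- pbinom(2, size = 4, prob = 0.5)    # P(X <= 2) = 11/16 = 0.6875
med  <- qbinom(0.5, size = 4, prob = 0.5)  # smallest x with P(X <= x) >= 0.5

# Random samples; set.seed() makes the draws reproducible
set.seed(1)
draws <- rbinom(5, size = 4, prob = 0.5)

print(c(dens, p_low, q975, p2, cum2, med))
print(draws)
```

The prefixes follow one pattern across R's distributions: d for density or mass, p for cumulative probability, q for quantile, and r for random generation.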

(Unit 4) Discuss the concept of time series analysis in R
programming. Also, explain the different techniques of
classification analysis in R.
Time Series Analysis in R:
Time series analysis is a statistical method of analyzing a series of
data points taken over time. Time series data can be used to identify
trends, patterns, and cycles in the data. R provides a variety of tools
for time series analysis, including the stats and forecast packages.
The following are some of the techniques used in time series analysis
in R:
1. Decomposition: Decomposition is the process of breaking down
a time series into its components, such as trend, seasonal, and
random. This can be done using the decompose() function in R.
2. Smoothing: Smoothing is the process of removing noise or
irregularities in the data to reveal underlying trends. This can be
done using techniques such as moving averages and exponential
smoothing.
3. Autocorrelation: Autocorrelation is the correlation of a time
series with a lagged version of itself. This can be used to identify
patterns and cycles in the data.
4. ARIMA Modeling: ARIMA (Autoregressive Integrated Moving
Average) modeling is a technique used to model time series data.
It involves identifying the order of differencing, autoregression,
and moving average components of the data.
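The four techniques above can be sketched with the stats package alone, using the built-in AirPassengers series. The choice of data set, the 12-month window, and the ARIMA orders are assumptions made for illustration, not recommendations from the course material.

```r
# Monthly airline passenger counts, 1949 to 1960 (built-in data set)
ap <- AirPassengers

# 1. Decomposition into trend, seasonal, and random components
parts <- decompose(ap)

# 2. Smoothing with a simple 12-month moving average
#    (stats:: avoids clashes with other packages that define filter())
ma12 <- stats::filter(ap, rep(1 / 12, 12), sides = 2)

# 3. Autocorrelation at successive lags, computed without plotting
ac <- acf(ap, plot = FALSE)

# 4. A seasonal ARIMA model on the log scale; the orders are assumed
fit <- arima(log(ap), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

print(names(parts))
print(head(ac$acf))
print(fit)
```

Calling plot(parts) or acf(ap) draws the corresponding charts. The forecast package mentioned above adds helpers for choosing ARIMA orders automatically.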
Classification Analysis in R:
Classification analysis is a statistical technique used to classify data
into predefined classes or categories based on a set of features. R
provides several packages for classification analysis, including caret
and e1071.
The following are some of the techniques used in classification
analysis in R:
1. Decision Trees: Decision trees are a tree-like model used to
classify data based on a set of rules. R provides several packages
for building decision trees, including rpart and tree.
2. Logistic Regression: Logistic regression is a statistical technique
used to model binary outcomes. It can be used for classification
analysis by setting a threshold for the predicted probabilities.
3. Random Forest: Random Forest is an ensemble learning
technique that combines multiple decision trees to improve the
accuracy of the model. R provides the randomForest package for
building random forest models.
4. Support Vector Machines: Support Vector Machines (SVM) is a
supervised learning technique used for classification analysis. R
provides the e1071 package for building SVM models.
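Of the techniques listed, logistic regression needs no extra packages, so it makes a convenient illustration. The following is a sketch only: the built-in mtcars data, the choice of weight (wt) to predict transmission type (am), and the 0.5 threshold are all assumptions made for the example.

```r
# Model the probability of a manual transmission (am = 1) from weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities for the same cars
prob <- predict(fit, type = "response")

# Classify by applying a threshold to the probabilities
pred <- ifelse(prob > 0.5, 1, 0)

# Compare predictions with the actual classes
confusion <- table(predicted = pred, actual = mtcars$am)
accuracy <- mean(pred == mtcars$am)

print(confusion)
print(accuracy)
```

The accuracy here is measured on the same data used to fit the model, so it is optimistic; in practice the data are split into training and test sets.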
Overall, both time series analysis and classification analysis are
important techniques in data analysis and machine learning, and R
provides a variety of tools for implementing these techniques.
