MOHAN BABU UNIVERSITY

Sree Sainath Nagar, Tirupati 517 102

EXPERIENTIAL LEARNING MANUAL


PROGRAM ELECTIVE

Course Code    Course Title     L  T  P  S  C

22CS105001     R PROGRAMMING    -  1  2  -  2

Pre-Requisite -

Anti-Requisite -
Co-Requisite -

COURSE DESCRIPTION: Introduction to R, R Programming Structures, Doing Math and
Simulation in R, Creating Graphs, Probability Distributions, Correlation and Regression, and
Random Forests.

COURSE OUTCOMES: After successful completion of this course, the students will be
able to:

CO1. Apply R programming constructs to store and manipulate datasets.

CO2. Develop modules using R programming constructs to solve statistical problems.

CO3. Perceive data models to perform descriptive and inferential statistical analysis to
identify trends, patterns in data.
CO4. Create effective visualization using Histograms, Bar plots, Box plots, Scatter
plots for exploratory data analysis.
CO5. Work independently to solve problems with effective communication.

CO-PO-PSO Mapping Table

Course Outcomes            | Program Outcomes                                            | Program Specific Outcomes
                           | PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12 | PSO1 PSO2 PSO3 PSO4
CO1                        | 3    -    -    2    3    -    -    -    -    -    -    -    | 3    -    -    -
CO2                        | 3    1    1    1    3    -    -    -    -    -    -    -    | 3    -    -    -
CO3                        | 3    3    2    3    3    -    -    -    -    -    -    -    | 3    -    -    -
CO4                        | 3    3    2    3    3    -    -    -    -    -    -    -    | 3    -    -    -
CO5                        | -    -    -    -    -    -    -    -    3    3    -    -    | -    -    -    -
Course Correlation Mapping | 3    2    2    2    3    -    -    -    3    3    -    -    | 3    -    -    -

Correlation Level: 3 - High, 2 - Medium, 1 - Low

COURSE CONTENT

Module1: INTRODUCTION TO R (08 Periods)


Introduction, How to run R, R Sessions and Functions, Basic Math, Variables, Data Types,
Vectors, Conclusion, Advanced Data Structures, Data Frames, Lists, Matrices, Arrays,
Classes.

Module2: R PROGRAMMING STRUCTURES (10 Periods)

R Programming Structures, Control Statements: Loops, Looping Over Nonvector Sets, If-Else;
Arithmetic and Boolean Operators and Values; Default Values for Arguments; Return
Values: Deciding Whether to Explicitly Call return(), Returning Complex Objects; Functions
Are Objects; No Pointers in R; Recursion: A Quicksort Implementation, Extended
Example: A Binary Search Tree.

Module3: DOING MATH AND SIMULATION IN R (10 Periods)

Doing Math and Simulation in R, Math Functions: Extended Example: Calculating a
Probability, Cumulative Sums and Products, Minima and Maxima, Calculus; Functions for
Statistical Distributions; Sorting; Linear Algebra Operations on Vectors and Matrices:
Extended Example: Vector Cross Product, Extended Example: Finding Stationary
Distributions of Markov Chains; Set Operations; Input/Output: Accessing the Keyboard and
Monitor, Reading and Writing Files.

Module4: GRAPHICS (8 Periods)

Graphics, Creating Graphs, The Workhorse of R Base Graphics: the plot() Function,
Customizing Graphs, Saving Graphs to Files.

Module5: PROBABILITY DISTRIBUTIONS AND REGRESSION MODELS (9 Periods)

Probability Distributions: Normal Distribution, Binomial Distribution, Poisson Distribution,
Other Distributions; Basic Statistics: Correlation and Covariance, T-Tests, ANOVA; Linear
Models: Simple Linear Regression, Multiple Regression; Generalized Linear Models:
Logistic Regression, Poisson Regression, Other Generalized Linear Models, Survival
Analysis; Nonlinear Models: Splines, Decision Trees, Random Forests.

Total Periods: 45

EXPERIENTIAL LEARNING:

Datatypes, Variables, Operators, Data structures – Vectors, Arrays, Matrices, Lists,
Data frames; Object oriented programming – S3, S4 classes; Selection statements –
if statement, if else statement, switch statement; Iterative statements – For loop,
While loop, Repeat loop, Nested loops; Functions – Creating functions, Default values
for arguments, Return values, Environment and scope issues, Recursion.

1. Create the vectors:
a) (1, 2, 3, . . . , 19, 20)
b) (20, 19, . . . , 2, 1)
c) (1, 2, 3, . . . , 19, 20, 19, 18, . . . , 2, 1)
d) (4, 6, 3) and assign it to the name tmp.
For parts (e), (f) and (g) look at the help for the function rep.
e) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3) where there are 10 occurrences of 4.
f) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3, 4) where there are 11 occurrences of 4, 10
occurrences of 6 and 10 occurrences of 3.
g) (4, 4, . . . , 4, 6, 6, . . . , 6, 3, 3, . . . , 3) where there are 10 occurrences
of 4, 20 occurrences of 6 and 30 occurrences of 3.
2. a) Write R code that will generate a vector with the following elements.
"aa" "ba" "ca" "da" "ea" "ab" "bb" "cb" "db" "eb" "ac" "bc" "cc" "dc"
"ec" "ad" "bd" "cd" "dd" "ed" "ae" "be" "ce" "de" "ee"

b) Write an R program to create a data frame which contains details of 5
employees and display a summary of the data.

3. a) Create a vector of a data set and treat it as an object. Using the vector
and object perform (.) dot product and (x) cross product. Take your own
data.

b) "Fizzbuzz" is a simple programming challenge often used at interviews to
test very basic programming skill. Your goal is the following: for the
numbers 1 to 100, print "fizz" if the number is a multiple of 3, "buzz" if
the number is a multiple of 5, "fizzbuzz" if the number is a multiple of
both 3 and 5, and simply print the number otherwise.

4. a) Imagine a high school with 1000 lockers all in a row, numbered 1 to 1000
in order. At the start, all of them are closed. 1000 students are sent, one
after the other, to change the state of a set of lockers (from open to
closed or closed to open). The first student changes the state of all
lockers. The second changes the state of every other one (2, 4, 6, 8, . . .
). The third changes the state of every third one (3, 6, 9, 12, . . . ). This
process continues until all 1000 students have gone. Write an R program to
determine which lockers are open at the end of this process.

b) Write a function chomp() that, given a string, removes from the string any
occurrence of the character &, as well as the character to the left of each
& character. So, for example, your function should return:
> chomp("a&c")
"c"
> chomp("a&")
""
> chomp("abc")
"abc"

5. a) Write a function which takes a single argument which is a matrix. The
function should return a matrix which is the same as the function
argument but every odd number is doubled.

b) Write a function that takes an array of numbers x and returns the
smallest number in the array.

Importance and applications of statistical learning, Types of data, Types of
variables, Frequency distributions, Measures of center – Mean, Median, Mode;
Measures of spread – Range, Percentile, Quartiles & Interquartile range,
Standard deviation, Variance; Correlation and Covariance.

6. a) Compute descriptive statistics for the data given below.


X: 14, 20, 22, 19, 15, 18, 30, 27
Y: 16, 25, 27, 20, 16, 18, 27, 23

b) Write an R script which will compute the mean and variance of the vector
x <- 1:100. Compare with R's internal mean() and var() functions.

7. Write a function to compute running medians. Running medians are a simple
smoothing method usually applied to time-series. For example, for the numbers
7, 5, 2, 8, 5, 5, 9, 4, 7, 8, the running medians of length 3 are 5, 5, 5, 5, 5, 5, 7,
7. The first running median is the median of the three numbers 7, 5, and 2; the
second running median is the median of 5, 2, and 8; and so on. Your function
should take two arguments: the data (say, x), and the number of observations
for each median (say, length).

8. Write an R program to perform data import/export (.csv, .xlsx) operations using
data frames in R.

9. Write an R program to create a bell curve of a random normal distribution.

10. Write an R program to build a correlation matrix for an appropriately chosen dataset.

Resources
TEXT BOOKS:

1. The Art of R Programming, Norman Matloff, Cengage Learning

2. R for Everyone, Lander, Pearson

REFERENCE BOOKS:

1. Sandip Rakshit, R for Beginners, McGraw Hill, 2017.

2. Seema Acharya, Data analytics using R, McGraw Hill, 2018.

VIDEO LECTURES:

1. https://www.classcentral.com/course/rprog-1713

2. https://www.youtube.com/playlist?list=PLVext98k2evi8mDNRo4MwIgVgSmwM3cS8

3. https://www.udemy.com/topic/r-programming-language/

WEB RESOURCES:

1. https://www.stats.ox.ac.uk/~evans/Rprog/LectureNotes.pdf

2. https://www.tutorialspoint.com/r/r_tutorial.pdf

3. https://www.tutorialsduniya.com/notes/r-programming-notes/
R – PROGRAMMING LAB MANUAL
1. Create the vectors:
a) (1, 2, 3, . . . , 19, 20)
b) (20, 19, . . . , 2, 1)
c) (1, 2, 3, . . . , 19, 20, 19, 18, . . . , 2, 1)
d) (4, 6, 3) and assign it to the name tmp.
For parts (e), (f) and (g) look at the help for the function rep.
e) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3) where there are 10 occurrences of 4.
f) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3, 4) where there are 11 occurrences of 4, 10
occurrences of 6 and 10 occurrences of 3.
g) (4, 4, . . . , 4, 6, 6, . . . , 6, 3, 3, . . . , 3) where there are 10 occurrences
of 4, 20 occurrences of 6 and 30 occurrences of 3.

AIM: R code to create each of the requested vectors (a to g), along with brief comments
explaining the logic.

R Script with Code and Output

# a) Vector from 1 to 20
vec_a <- 1:20
print("a) Vector from 1 to 20:")
print(vec_a)

# b) Vector from 20 to 1
vec_b <- 20:1
print("b) Vector from 20 to 1:")
print(vec_b)

# c) Vector from 1 to 20 and back to 1


vec_c <- c(1:20, 19:1)
print("c) Vector from 1 to 20 and back to 1:")
print(vec_c)

# d) Assign vector (4, 6, 3) to 'tmp'


tmp <- c(4, 6, 3)
print("d) Vector assigned to tmp:")
print(tmp)

# e) Repeat (4, 6, 3) so there are 10 occurrences of 4


vec_e <- rep(c(4, 6, 3), times = 10)
print("e) Vector with 10 repetitions of (4, 6, 3):")
print(vec_e)

# f) Cycle through (4, 6, 3) until there are 31 elements:
#    11 occurrences of 4, 10 of 6 and 10 of 3, ending on a 4
vec_f <- rep(tmp, length.out = 31)
print("f) Vector with 11 4s, 10 6s, 10 3s:")
print(vec_f)

# g) 10 4s, 20 6s, 30 3s in sequence
vec_g <- c(rep(4, 10), rep(6, 20), rep(3, 30))
print("g) Vector with 10 4s, 20 6s, 30 3s:")
print(vec_g)

Sample Output:

[1] "a) Vector from 1 to 20:"
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

[1] "b) Vector from 20 to 1:"
[1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

[1] "c) Vector from 1 to 20 and back to 1:"
[1] 1 2 3 ... 20 19 18 ... 2 1

[1] "d) Vector assigned to tmp:"
[1] 4 6 3

[1] "e) Vector with 10 repetitions of (4, 6, 3):"
[1] 4 6 3 4 6 3 ... (repeated)

[1] "f) Vector with 11 4s, 10 6s, 10 3s:"
[1] 4 6 3 4 6 3 ... 4 6 3 4

[1] "g) Vector with 10 4s, 20 6s, 30 3s:"
[1] 4 4 4 ... 6 6 ... 3 3 ...
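
The three ways of calling rep() that parts (e) to (g) rely on are easy to confuse, so it can help to count the elements of each result. The following is a minimal sketch; the object names pattern_e, pattern_f and pattern_g are illustrative and not part of the exercise.

```r
# Illustrative check of the rep() variants (object names are assumptions).
tmp <- c(4, 6, 3)

pattern_e <- rep(tmp, times = 10)             # whole vector repeated 10 times
pattern_f <- rep(tmp, length.out = 31)        # cycle through tmp until 31 elements exist
pattern_g <- rep(tmp, times = c(10, 20, 30))  # each element repeated its own number of times

# Count how often each value occurs in each vector
print(table(pattern_e))
print(table(pattern_f))
print(table(pattern_g))

# The first and last elements of (f) are both 4
print(c(head(pattern_f, 1), tail(pattern_f, 1)))
```

A single number for times repeats the whole vector, a vector for times repeats each element in place, and length.out cycles through the vector and stops at the requested length.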

2a) Write R code that will generate a vector with the following elements. "aa"
"ba" "ca" "da" "ea" "ab" "bb" "cb" "db" "eb" "ac" "bc" "cc" "dc" "ec" "ad"
"bd" "cd" "dd" "ed" "ae" "be" "ce" "de" "ee"

AIM: To generate the requested vector by combining every first letter with every second
letter, using the outer() and paste0() functions in R.

R Code to Generate the Desired Vector:

# Define first letters: a to e


first_letters <- c("a", "b", "c", "d", "e")

# Define second letters: a to e


second_letters <- c("a", "b", "c", "d", "e")

# Use outer() with paste0 to combine each element from first and second letters
result_vector <- as.vector(outer(first_letters, second_letters, paste0))

# Print the result


print(result_vector)

Output (result_vector)

[1] "aa" "ba" "ca" "da" "ea"


[6] "ab" "bb" "cb" "db" "eb"
[11] "ac" "bc" "cc" "dc" "ec"
[16] "ad" "bd" "cd" "dd" "ed"
[21] "ae" "be" "ce" "de" "ee"

This gives you a 25-element vector where each element is a two-letter combination,
with the first letter from a to e and the second letter from a to e. Because as.vector()
flattens the matrix returned by outer() column by column, the first letter varies fastest.
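
The ordering can also be reproduced without outer(), which makes it explicit that the first letter changes at every step while the second letter is held for five elements at a time. This is a sketch of one alternative; alt_vector and outer_vector are illustrative names.

```r
# Illustrative alternative to outer(); object names are assumptions.
first_letters  <- letters[1:5]
second_letters <- letters[1:5]

# First letter cycles a..e five times; second letter is held for five elements at a time
alt_vector <- paste0(rep(first_letters, times = 5),
                     rep(second_letters, each = 5))

# Same result as flattening the outer() matrix column by column
outer_vector <- as.vector(outer(first_letters, second_letters, paste0))
print(identical(alt_vector, outer_vector))
print(head(alt_vector, 6))
```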

2b) Write an R program to create a data frame which contains details of 5 employees and
display a summary of the data.

Aim: To write an R program to create a data frame which contains details of 5 employees
and display a summary of the data.

R Program: Employee Data Frame and Summary


# Create employee data
emp_id <- c(101, 102, 103, 104, 105)
emp_name <- c("Alice", "Bob", "Charlie", "Diana", "Ethan")
emp_age <- c(25, 30, 28, 35, 40)
emp_dept <- c("HR", "IT", "Finance", "IT", "HR")
emp_salary <- c(35000, 50000, 45000, 55000, 40000)

# Create the data frame


employee_df <- data.frame(
ID = emp_id,
Name = emp_name,
Age = emp_age,
Department = emp_dept,
Salary = emp_salary
)

# Print the employee data frame


print("Employee Data Frame:")
print(employee_df)

# Display summary of the data (print() makes the summary appear even when the script is run with source())
print("Summary of Employee Data:")
print(summary(employee_df))

Sample Output (R 4.0 or later, where text columns stay as character vectors; column spacing may differ):

[1] "Employee Data Frame:"
   ID    Name Age Department Salary
1 101   Alice  25         HR  35000
2 102     Bob  30         IT  50000
3 103 Charlie  28    Finance  45000
4 104   Diana  35         IT  55000
5 105   Ethan  40         HR  40000

[1] "Summary of Employee Data:"
       ID          Name                Age        Department            Salary
 Min.   :101   Length:5           Min.   :25.0   Length:5           Min.   :35000
 1st Qu.:102   Class :character   1st Qu.:28.0   Class :character   1st Qu.:40000
 Median :103   Mode  :character   Median :30.0   Mode  :character   Median :45000
 Mean   :103                      Mean   :31.6                      Mean   :45000
 3rd Qu.:104                      3rd Qu.:35.0                      3rd Qu.:50000
 Max.   :105                      Max.   :40.0                      Max.   :55000

3. a. Create a vector of a data set and treat it as an object. Using the vector and object
perform (.) dot product and (x) cross product. Take your own data.

Aim: R program that:

1. Creates two numeric vectors,


2. Treats them as objects,
3. Performs the dot product (.) and
4. Performs the cross product (×) — which is defined only for 3D vectors.

R Code: Dot Product and Cross Product Using Vectors

# Create two 3D vectors (objects)


vector_a <- c(2, 4, 6)
vector_b <- c(1, 3, 5)

# Display the vectors


print("Vector A:")
print(vector_a)

print("Vector B:")
print(vector_b)

# Dot Product: sum of element-wise products


dot_product <- sum(vector_a * vector_b)
print("Dot Product (A · B):")
print(dot_product)

# Cross Product: Only for 3D vectors


cross_product <- c(
vector_a[2]*vector_b[3] - vector_a[3]*vector_b[2],
vector_a[3]*vector_b[1] - vector_a[1]*vector_b[3],
vector_a[1]*vector_b[2] - vector_a[2]*vector_b[1]
)
print("Cross Product (A × B):")
print(cross_product)

Sample Output

[1] "Vector A:"


[1] 2 4 6

[1] "Vector B:"


[1] 1 3 5

[1] "Dot Product (A · B):"


[1] 44

[1] "Cross Product (A × B):"


[1] 2 -4 2

Note:

• Dot product formula: a1*b1 + a2*b2 + a3*b3
• Cross product result is a vector perpendicular to both A and B (in 3D space)
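
The perpendicularity property can be checked numerically: the dot product of the cross product with either input vector should be zero. The sketch below wraps the cross product formula in a helper called cross3, which is a hypothetical name and not a base R function, and computes the dot product with the %*% operator as an alternative to sum().

```r
# Illustrative helper; cross3 is a hypothetical name, not a base R function.
cross3 <- function(a, b) {
  stopifnot(length(a) == 3, length(b) == 3)
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}

vector_a <- c(2, 4, 6)
vector_b <- c(1, 3, 5)

# Dot product via matrix multiplication; drop() turns the 1 x 1 matrix into a number
dot_ab <- drop(vector_a %*% vector_b)

# A cross product is perpendicular to both inputs, so both dot products are zero
cp <- cross3(vector_a, vector_b)
print(dot_ab)
print(sum(cp * vector_a))
print(sum(cp * vector_b))
```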

b) "Fizzbuzz" is a simple programming challenge often used at interviews to test very
basic programming skill. Your goal is the following: for the numbers 1 to 100, print "fizz"
if the number is a multiple of 3, "buzz" if the number is a multiple of 5, "fizzbuzz" if the
number is a multiple of both 3 and 5, and simply print the number otherwise.

Aim: A complete R program to solve the classic FizzBuzz problem, printing the required
output for numbers 1 to 100:

R Code: FizzBuzz from 1 to 100

# Loop through numbers 1 to 100


for (i in 1:100) {
if (i %% 3 == 0 && i %% 5 == 0) {
print("fizzbuzz")
} else if (i %% 3 == 0) {
print("fizz")
} else if (i %% 5 == 0) {
print("buzz")
} else {
print(i)
}
}

Explanation:

• i %% 3 == 0 checks if i is divisible by 3.
• i %% 5 == 0 checks if i is divisible by 5.
• "fizzbuzz" is printed when the number is divisible by both 3 and 5. This test must come first, otherwise multiples of 15 would be caught by the "fizz" branch.

Sample Output (First few lines):

[1] 1
[1] 2
[1] "fizz"
[1] 4
[1] "buzz"
[1] "fizz"
[1] 7
[1] 8
[1] "fizz"
[1] "buzz"
...
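
Because R works on whole vectors, the same divisibility tests can be applied to all 100 numbers at once instead of in a loop. The following is a sketch of that idea using nested ifelse() calls; fizzbuzz_vec is an illustrative name. Being divisible by both 3 and 5 is the same as being divisible by 15.

```r
# Vectorized sketch; fizzbuzz_vec is an illustrative name.
n <- 1:100
fizzbuzz_vec <- ifelse(n %% 15 == 0, "fizzbuzz",
                ifelse(n %% 3 == 0, "fizz",
                ifelse(n %% 5 == 0, "buzz", as.character(n))))

# Print one value per line, without the [1] prefix or quotes
cat(fizzbuzz_vec, sep = "\n")
```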

4. a. Imagine a high school with 1000 lockers all in a row, numbered 1 to 1000 in order. At
the start, all of them are closed. 1000 students are sent, one after the other, to change
the state of a set of lockers (from open to closed or closed to open). The first student
changes the state of all lockers. The second changes the state of every other one (2, 4,
6, 8, . . .). The third changes the state of every third one (3, 6, 9, 12, . . . ). This
process continues until all 1000 students have gone. Write an R program to determine
which lockers are open at the end of this process.

Aim: This is a classic logic puzzle known as the "1000 lockers problem". The key
insight is that a locker ends up open only if it is toggled an odd number of times. Locker n
is toggled once for each divisor of n, and divisors normally pair up as d and n/d. Only
perfect squares (1, 4, 9, 16, 25, etc.) have an unpaired divisor, the square root, so only
they have an odd number of divisors.

R Program to Determine Which Lockers Are Open


# Total number of lockers
num_lockers <- 1000

# Initialize all lockers as FALSE (closed)


lockers <- rep(FALSE, num_lockers)

# Simulate 1000 students toggling locker states
for (student in 1:num_lockers) {
  # Lockers visited by this student: student, 2 * student, 3 * student, ...
  idx <- seq(student, num_lockers, by = student)
  lockers[idx] <- !lockers[idx]
}

# Find which lockers are open (TRUE)


open_lockers <- which(lockers)

# Print the result


cat("Lockers that remain open:\n")
print(open_lockers)

Explanation:

• lockers is a logical vector; TRUE = open, FALSE = closed.
• Each student toggles every nth locker (seq(student, num_lockers, by = student)).
• At the end, we find indices where lockers are TRUE.

Sample Output:

Lockers that remain open:


[1] 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 400
[21] 441 484 529 576 625 676 729 784 841 900 961

These are perfect squares from 1² to 31² — lockers that are toggled an odd number of
times.
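
The divisor argument can be checked directly, without simulating the students, by counting the divisors of every locker number and keeping those with an odd count. This is a brute-force sketch for illustration; count_divisors is a hypothetical helper name.

```r
# Count the divisors of every locker number (count_divisors is an illustrative name).
count_divisors <- function(n) sum(n %% seq_len(n) == 0)

num_lockers <- 1000
divisor_counts <- vapply(seq_len(num_lockers), count_divisors, integer(1))

# Lockers toggled an odd number of times end up open
open_by_divisors <- which(divisor_counts %% 2 == 1)

# Compare with the perfect squares that do not exceed 1000
perfect_squares <- (1:floor(sqrt(num_lockers)))^2
print(identical(open_by_divisors, as.integer(perfect_squares)))
```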

4. b. Write a function chomp() that, given a string, removes from the string any occurrence
of the character &, as well as the character to the left of each & character. So, for
example, your function should return:

> chomp("a&c")
"c"
> chomp("a&")
""
> chomp("abc")
"abc"

AIM: To write a custom chomp() function in R that behaves exactly as described:

R Function: chomp()

chomp <- function(input) {


chars <- unlist(strsplit(input, split = "")) # Split string into characters
result <- character(0) # Initialize empty vector to build result

for (i in seq_along(chars)) {
if (chars[i] == "&") {
if (length(result) > 0) {
result <- result[-length(result)] # Remove last added character
}
# Do not add the "&" itself
} else {
result <- c(result, chars[i]) # Add current character
}
}

return(paste(result, collapse = ""))


}

Test Cases:

chomp("a&c") # Returns "c"


chomp("a&") # Returns ""
chomp("abc") # Returns "abc"
chomp("ab&&c") # Returns "c"
chomp("a&b&c") # Returns "c"

Sample Output:

> chomp("a&c")
[1] "c"

> chomp("a&")
[1] ""
> chomp("abc")
[1] "abc"

> chomp("ab&&c")
[1] "c"

5.a. Write a function which takes a single argument which is a matrix. The function
should return a matrix which is the same as the function argument but every odd
number is doubled.

AIM: Here's an R function that takes a matrix as input, and returns a new matrix where
every odd number is doubled, while even numbers remain unchanged.

R Function: Double Odd Numbers in a Matrix

double_odds <- function(mat) {


# Create a copy of the matrix to modify
result <- mat

# Find elements that are odd (remainder when divided by 2 is 1)


odd_positions <- result %% 2 == 1

# Double the odd values


result[odd_positions] <- result[odd_positions] * 2

return(result)
}

Example Usage

# Sample matrix
m <- matrix(1:9, nrow = 3, byrow = TRUE)
print("Original matrix:")
print(m)

# Call the function


modified_m <- double_odds(m)
print("Matrix after doubling odd numbers:")
print(modified_m)

Sample Output

[1] "Original matrix:"
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

[1] "Matrix after doubling odd numbers:"
     [,1] [,2] [,3]
[1,]    2    2    6
[2,]    4   10    6
[3,]   14    8   18

b. Write a function that takes an array of numbers x and returns the smallest
number in the array.

AIM: Here's a simple R function that takes an array (or vector) of numbers x and returns
the smallest number in that array.

R Function: Find the Smallest Number

find_min <- function(x) {


return(min(x))
}

Example Usage

# Sample numeric array


numbers <- c(5, 3, 8, 1, 9)

# Call the function


smallest <- find_min(numbers)

# Print the result


print(paste("The smallest number is:", smallest))

Sample Output
[1] "The smallest number is: 1"
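
The function above delegates all the work to min(). If the exercise is meant to practise loops, the minimum can also be found by scanning the values one at a time. The following is a sketch under that assumption; find_min_manual is an illustrative name, and the input is assumed to contain no missing values.

```r
# Loop-based minimum; find_min_manual is an illustrative name.
# Assumes x is numeric and contains no NA values.
find_min_manual <- function(x) {
  if (length(x) == 0) {
    stop("x must contain at least one number")
  }
  smallest <- x[1]
  for (value in x[-1]) {
    # Keep the smaller of the current candidate and the next value
    if (value < smallest) {
      smallest <- value
    }
  }
  smallest
}

numbers <- c(5, 3, 8, 1, 9)
print(find_min_manual(numbers))
print(find_min_manual(numbers) == min(numbers))
```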

6.a. Compute descriptive statistics for the data given below.


X: 14, 20, 22, 19, 15, 18, 30, 27

Y: 16, 25, 27, 20, 16, 18, 27, 23


AIM: Here's how to compute descriptive statistics (mean, median, mode, variance,
standard deviation, min, max, range, etc.) for the given variables X and Y in R.

R Code to Compute Descriptive Statistics

# Define the data vectors


X <- c(14, 20, 22, 19, 15, 18, 30, 27)
Y <- c(16, 25, 27, 20, 16, 18, 27, 23)

# Load the 'modeest' package for computing mode (if needed)


if (!require(modeest)) install.packages("modeest")
library(modeest)

# Create a function to compute descriptive statistics


compute_stats <- function(data) {
stats <- list(
Mean = mean(data),
Median = median(data),
Mode = mfv(data), # Most frequent value
Variance = var(data),
Std_Deviation = sd(data),
Minimum = min(data),
Maximum = max(data),
Range = range(data),
IQR = IQR(data),
Summary = summary(data)
)
return(stats)
}

# Compute and print statistics for X


cat("Descriptive Statistics for X:\n")
print(compute_stats(X))

# Compute and print statistics for Y


cat("\nDescriptive Statistics for Y:\n")
print(compute_stats(Y))

Sample Output

You will get output like:

Descriptive Statistics for X:
$Mean
[1] 20.625

$Median
[1] 19.5

$Mode
[1] 14 15 18 19 20 22 27 30

$Variance
[1] 30.83929

$Std_Deviation
[1] 5.553313

$Minimum
[1] 14

$Maximum
[1] 30

$Range
[1] 14 30

$IQR
[1] 6

$Summary
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.00   17.25   19.50   20.62   23.25   30.00

Every value in X occurs exactly once, so mfv() reports all eight values as modes.

And similarly for Y.
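
For Y the mode is more informative, because two values occur twice. If installing modeest is not an option, the mode can also be found with base R by tabulating the values. This is a minimal sketch; stat_mode is a hypothetical helper name, not a built-in function.

```r
# Base R mode helper; stat_mode is an illustrative name, not a built-in function.
stat_mode <- function(v) {
  counts <- table(v)
  # Keep every value whose count equals the highest count
  as.numeric(names(counts)[counts == max(counts)])
}

Y <- c(16, 25, 27, 20, 16, 18, 27, 23)

print(mean(Y))
print(median(Y))
print(stat_mode(Y))
```

Like mfv(), this helper returns every value that ties for the highest frequency, so it reports both 16 and 27 for Y.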

6.b. Write an R script which will compute the mean and variance of the vector
x <- 1:100. Compare with R's internal mean() and var() functions.

AIM: Here's a complete R script that:

1. Creates a vector x from 1 to 100,


2. Computes the mean and variance manually,
3. Compares the results with R’s built-in mean() and var() functions.

R Script: Manual vs Built-in Mean and Variance

# Create the vector


x <- 1:100

# Manually compute mean


manual_mean <- sum(x) / length(x)

# Manually compute variance (sample variance, denominator n - 1)


mean_x <- manual_mean
manual_variance <- sum((x - mean_x)^2) / (length(x) - 1)

# Using R's built-in functions


builtin_mean <- mean(x)
builtin_variance <- var(x)

# Print results
cat("Manual Mean: ", manual_mean, "\n")
cat("Built-in Mean: ", builtin_mean, "\n\n")

cat("Manual Variance: ", manual_variance, "\n")


cat("Built-in Variance: ", builtin_variance, "\n")

Expected Output
Manual Mean: 50.5
Built-in Mean: 50.5

Manual Variance: 841.6667


Built-in Variance: 841.6667

Note:

• The manual mean is just the sum divided by count.
• The manual variance uses sample variance (denominator n - 1, not n), same as
var().
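
The choice of denominator matters when comparing a hand calculation with var(). The sketch below computes both the sample variance (divide by n - 1) and the population variance (divide by n) for the same vector; the object names are illustrative.

```r
# Sample versus population variance for x = 1:100 (object names are illustrative).
x <- 1:100
n <- length(x)

sample_var     <- sum((x - mean(x))^2) / (n - 1)  # what var() computes
population_var <- sum((x - mean(x))^2) / n        # divides by n instead

# The two differ only by the factor (n - 1) / n
print(sample_var)
print(population_var)
print(all.equal(population_var, var(x) * (n - 1) / n))
```

A manual result that matches population_var rather than var(x) usually means the sum of squares was divided by n.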

7. Write a function to compute running medians. Running medians are a simple
smoothing method usually applied to time-series. For example, for the numbers 7, 5, 2,
8, 5, 5, 9, 4, 7, 8, the running medians of length 3 are 5, 5, 5, 5, 5, 5, 7, 7. The first
running median is the median of the three numbers 7, 5, and 2; the second running
median is the median of 5, 2, and 8; and so on. Your function should take two
arguments: the data (say, x), and the number of observations for each median (say,
length).

AIM: Here's an R function to compute the running medians of a given vector x with a
specified window length length. This function uses a sliding window to compute the
median for each window of the specified size.

R Function: Running Medians

# Function to compute running medians


running_median <- function(x, length) {
# Initialize an empty vector to store the running medians
result <- numeric(0)

# Loop through the vector and compute the median for each window
for (i in seq_along(x)) {
# If we've reached the required window size, compute the median
if (i >= length) {
window <- x[(i - length + 1):i]
result <- c(result, median(window))
}
}
return(result)
}

Example Usage

# Example data
x <- c(7, 5, 2, 8, 5, 5, 9, 4, 7, 8)

# Call the function with a window length of 3


result <- running_median(x, 3)

# Print the result


print(result)

Sample Output

[1] 5 5 5 5 5 5 7 7

Explanation:

• First Running Median: Median of [7, 5, 2] is 5.
• Second Running Median: Median of [5, 2, 8] is 5.
• And so on...
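
The sliding windows can also be built in one step with embed(), which avoids growing the result vector inside a loop. This is a sketch of an alternative, not the manual's method; running_median_embed is an illustrative name, and the window size is called k here so that it does not share a name with the base function length().

```r
# Loop-free sketch using embed(); running_median_embed and k are illustrative names.
running_median_embed <- function(x, k) {
  if (k > length(x)) {
    return(numeric(0))  # no complete window exists
  }
  # embed() builds a matrix whose rows are the consecutive windows of x
  # (each row lists its window in reverse order, which does not affect the median)
  windows <- embed(x, k)
  apply(windows, 1, median)
}

x <- c(7, 5, 2, 8, 5, 5, 9, 4, 7, 8)
print(running_median_embed(x, 3))
```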

8. Write an R program to perform data import/export (.csv, .xlsx) operations using data
frames in R.
AIM: Here's a complete R program that demonstrates how to import and export data
using data frames with both .csv and .xlsx file formats.

R Program for Data Import/Export in R

# Load required package for Excel (.xlsx) operations


if (!require("openxlsx")) install.packages("openxlsx")
library(openxlsx)

# --------------------------------------------
# Step 1: Create a sample data frame
# --------------------------------------------
student_data <- data.frame(
ID = 1:5,
Name = c("Alice", "Bob", "Charlie", "Diana", "Ethan"),
Score = c(85, 90, 78, 92, 88)
)

# --------------------------------------------
# Step 2: Export to CSV
# --------------------------------------------
write.csv(student_data, "student_data.csv", row.names = FALSE)
cat("Data exported to 'student_data.csv'\n")

# --------------------------------------------
# Step 3: Export to XLSX
# --------------------------------------------
write.xlsx(student_data, "student_data.xlsx")
cat("Data exported to 'student_data.xlsx'\n")

# --------------------------------------------
# Step 4: Import from CSV
# --------------------------------------------
csv_data <- read.csv("student_data.csv")
cat("Data imported from 'student_data.csv':\n")
print(csv_data)

# --------------------------------------------
# Step 5: Import from XLSX
# --------------------------------------------
xlsx_data <- read.xlsx("student_data.xlsx")
cat("Data imported from 'student_data.xlsx':\n")
print(xlsx_data)

Packages Used

• openxlsx: For reading/writing .xlsx Excel files.

Install it once with: install.packages("openxlsx")

Files Created:
• student_data.csv
• student_data.xlsx

These will be created in your working directory. You can check it with:

getwd() # Get current working directory
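
To confirm that an export followed by an import preserves the data, the imported data frame can be compared with the original. The sketch below does this for the CSV case with base R only and writes to a temporary file so that the working directory is left untouched; the object names csv_path and csv_back are illustrative.

```r
# Round-trip sketch with base R only; the data frame mirrors the example above.
student_data <- data.frame(
  ID    = 1:5,
  Name  = c("Alice", "Bob", "Charlie", "Diana", "Ethan"),
  Score = c(85, 90, 78, 92, 88)
)

# tempfile() gives a throwaway path, so the working directory stays clean
csv_path <- tempfile(fileext = ".csv")
write.csv(student_data, csv_path, row.names = FALSE)
csv_back <- read.csv(csv_path, stringsAsFactors = FALSE)

# Compare column by column; storage types may differ (85 is read back as an integer)
print(all(csv_back$ID == student_data$ID))
print(all(csv_back$Name == student_data$Name))
print(all(csv_back$Score == student_data$Score))

unlink(csv_path)  # remove the temporary file
```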

9. Write an R program to create bell curve of a random normal distribution.

AIM: Here's a complete R program to generate a bell curve (normal distribution curve)
from random data drawn from a normal distribution, plotted either with ggplot2 or with
the base R hist() and lines() functions.

Option 1: Using ggplot2 (Recommended for better visuals)

# Load ggplot2 package


if (!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)

# Generate random data from normal distribution


set.seed(123) # For reproducibility
data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))

# Create bell curve plot


ggplot(data, aes(x)) +
geom_density(fill = "lightblue", color = "darkblue") +
labs(title = "Bell Curve - Normal Distribution",
x = "Value",
y = "Density") +
theme_minimal()

Option 2: Using Base R

# Generate random data


set.seed(123)
x <- rnorm(1000, mean = 50, sd = 10)

# Create histogram with density


hist(x, breaks = 30, probability = TRUE,
main = "Bell Curve - Normal Distribution",
xlab = "Value", col = "lightgray", border = "white")

# Add density line (bell curve)


lines(density(x), col = "blue", lwd = 2)

Note:

• rnorm(n, mean, sd) generates n random values from a normal distribution.
• density() estimates the distribution's shape.
• set.seed() ensures the result is reproducible.
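
Both options above estimate the curve from random samples, so the shape is only approximately bell-like. For comparison, the exact curve can be drawn from the density function dnorm(). This is a sketch using the same mean and standard deviation as the examples; the object names are illustrative.

```r
# Theoretical bell curve from dnorm(); object names are illustrative.
mu    <- 50
sigma <- 10

# Evaluate the density on an evenly spaced grid of 201 points
grid <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 201)
density_values <- dnorm(grid, mean = mu, sd = sigma)

# Draw the smooth curve
plot(grid, density_values, type = "l", lwd = 2, col = "blue",
     main = "Theoretical Normal Density", xlab = "Value", ylab = "Density")

# The peak sits at the mean and has height 1 / (sigma * sqrt(2 * pi))
print(grid[which.max(density_values)])
print(max(density_values))
```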

10. Write an R program to build a correlation matrix for an appropriately chosen dataset.

AIM: Here is a complete R program to create and visualize a correlation matrix using a
built-in dataset. For demonstration, we’ll use the mtcars dataset, which contains numeric
data on fuel consumption and 10 aspects of automobile design and performance.

R Program: Correlation Matrix with Visualization

# Load necessary packages


if (!require(corrplot)) install.packages("corrplot")
library(corrplot)

# Step 1: Load the dataset


data(mtcars)

# Step 2: View the structure of the dataset


print("Structure of mtcars dataset:")
str(mtcars)

# Step 3: Compute correlation matrix


cor_matrix <- cor(mtcars)

# Step 4: Print correlation matrix


cat("Correlation matrix of mtcars dataset:\n")
print(round(cor_matrix, 2))

# Step 5: Visualize the correlation matrix


corrplot(cor_matrix, method = "color", type = "upper",
tl.col = "black", tl.srt = 45,
addCoef.col = "black", # add correlation coefficients
number.cex = 0.7,
title = "Correlation Matrix of mtcars",
mar = c(0,0,1,0))

Explanation

• cor() computes the Pearson correlation between all pairs of numeric columns.
• corrplot() gives a colorful visual of the correlation strengths.
• mtcars is used as an example because all columns are numeric and suitable for
correlation analysis.
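
The correlation matrix can also be inspected without any extra package, for example to find which pair of variables is most strongly related. The following is a base R sketch; upper_only and strongest are illustrative names.

```r
# Base R sketch: find the most strongly correlated pair of mtcars columns.
cor_matrix <- cor(mtcars)

# Ignore the diagonal (each variable with itself) and the duplicated lower triangle
upper_only <- cor_matrix
upper_only[lower.tri(upper_only, diag = TRUE)] <- NA

# Position of the largest absolute correlation
strongest <- which(abs(upper_only) == max(abs(upper_only), na.rm = TRUE),
                   arr.ind = TRUE)
print(rownames(cor_matrix)[strongest[1, "row"]])
print(colnames(cor_matrix)[strongest[1, "col"]])
print(upper_only[strongest[1, "row"], strongest[1, "col"]])
```

Masking the lower triangle is needed because the matrix is symmetric, so every pair would otherwise appear twice.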

Required Package

If corrplot is not installed, R will install it automatically using:

install.packages("corrplot")

Output

You will see:

• A printed correlation matrix in the console.
• A graphical heatmap-style correlation plot.

ooOOoo
