11/14/23, 2:56 PM Functions and Packages
Functions and Packages
AUTHOR
Dr. Mohammad Nasir Abdullah
1. Introduction to Functions
A function is a self-contained block of code that encapsulates a specific task or a related group of tasks.
Functions take some inputs, perform their task, and then return an output. R comes with numerous built-in
functions, and it also allows you to create your own, known as user-defined functions.
What is a function?
A function in R is a piece of code written to carry out a specified task. It takes some input, processes it,
and returns a result. Functions help reduce redundancy, make code more readable, and make debugging
easier.
Types of Functions:
1. Built-in Functions: Pre-defined functions that are included in R.
– Example: sum(), mean(), print()
2. User-Defined Functions: Functions created by the user for a specific task.
3. Anonymous Functions: Functions without a name, used for short tasks.
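A small sketch contrasting the three kinds, using base R only. The name `square` is a hypothetical example, and the `\(x)` shorthand for anonymous functions assumes R 4.1.0 or later.

```r
# 1. Built-in function: sum() ships with base R
total <- sum(c(1, 2, 3))

# 2. User-defined function: we create it and give it a name
square <- function(x) {
  x^2
}

# 3. Anonymous function: passed straight to sapply() without a name
squares <- sapply(1:3, function(x) x^2)

# Since R 4.1.0, \(x) is a shorthand for function(x)
cubes <- sapply(1:3, \(x) x^3)

print(total)      # [1] 6
print(square(4))  # [1] 16
print(squares)    # [1] 1 4 9
print(cubes)      # [1]  1  8 27
```

Anonymous functions are typically used where a function is needed only once, as the argument of sapply(), lapply() or similar.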
Why use functions?
Modularity: Break down complex tasks into smaller, manageable parts.
Reusability: Write once, use many times.
Clarity: Make your code more understandable and easier to maintain.
Anatomy of a function
A typical function in R has the following components:
• Name: Identifies the function and is used to call it.
• Parameters: Inputs that are passed into the function.
• Body: The code block that performs the task.
• Return Value: The output of the function.
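In R, these components can be inspected directly. The sketch below uses a hypothetical function `greet()`; `formals()` and `body()` are base R functions that return a function's parameters and its body. Strictly speaking, the name is not stored inside the function object: it is the variable the function is assigned to.

```r
# hypothetical example: one parameter (name) with a default value
greet <- function(name = "student") {
  greeting <- paste("Hello,", name)  # body: the work the function does
  return(greeting)                   # return value
}

names(formals(greet))  # parameters: [1] "name"
body(greet)            # prints the code between the braces
greet("world")         # [1] "Hello, world"
greet()                # default used: [1] "Hello, student"
```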
Creating a simple function
Creating a function in R involves specifying the function name, parameters, the operations to be
performed, and the return value. Below, we delve deeper into creating simple functions, providing
various examples and explaining each step in detail.
https://sta334.s3.ap-southeast-1.amazonaw s.com/FunctionandPackage/Function+and+Packages+-+Lecture.html 1/7
Here is the basic syntax for creating a function in R:
function_name <- function(parameters) {
# Body of the function
return(result)
}
• function_name: The name used to call the function.
• parameters: The variables that are input to the function.
• result: The output of the function.
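The return() call in this template is optional: an R function returns the value of the last expression it evaluates, so return() is mainly useful for leaving a function early. A sketch with hypothetical function names:

```r
# implicit return: the value of the last expression is returned
add_implicit <- function(a, b) {
  a + b
}

# explicit early return: exit as soon as the answer is known
safe_divide <- function(a, b) {
  if (b == 0) {
    return(NA)  # division by zero: leave the function immediately
  }
  a / b         # otherwise the last expression is the result
}

add_implicit(2, 6)  # [1] 8
safe_divide(10, 2)  # [1] 5
safe_divide(1, 0)   # [1] NA
```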
Example 1: Adding two numbers
add_numbers <- function(a, b) {
total <- a + b # 'total' avoids reusing the name of the built-in sum()
return(total)
}
#to use the function
add_numbers(2,6)
[1] 8
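Arguments can be matched by position, as in add_numbers(2, 6), or by name, and parameters can have default values. A sketch using a hypothetical variant `add_numbers2()` whose `b` defaults to 10:

```r
# hypothetical variant of add_numbers() with a default value for b
add_numbers2 <- function(a, b = 10) {
  a + b
}

add_numbers2(2)             # b takes its default: [1] 12
add_numbers2(2, 6)          # positional matching: [1] 8
add_numbers2(b = 6, a = 2)  # named matching, order does not matter: [1] 8
```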
Example 2: Calculating mean, median and mode
# create a data vector
data <- c(2,3,4,5,6,7,8,9,10,11, 11, 11, 12,11)
#calculate mean
mean_data <- mean(data)
cat("mean of the data: ", mean_data, "\n")
mean of the data: 7.857143
#calculate median
median_data <- median(data)
cat("median of the data: ", median_data, "\n")
median of the data: 8.5
# calculate mode (base R has no statistical mode function; mode() returns an object's storage mode)
get_mode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
mode_data <- get_mode(data)
cat("mode of the data: ", mode_data, "\n")
mode of the data: 11
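get_mode() returns only one value even when several values are tied for most frequent, because which.max() returns the index of the first maximum. A variant can return all tied values. A sketch, with the hypothetical name `get_modes()`:

```r
# return every value that shares the highest frequency
get_modes <- function(v) {
  uniqv <- unique(v)                   # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))  # how often each distinct value occurs
  uniqv[counts == max(counts)]         # keep all values with the top count
}

get_modes(c(2, 3, 3, 5, 5, 7))  # two modes: [1] 3 5
get_modes(c(1, 1, 2))           # one mode:  [1] 1
```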
Example 3: Calculating mean, median, and mode for the mtcars dataset
#function to calculate mode
get_mode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# main function: mean, median and mode for every numeric variable in a data frame
stat_mtcars <- function(dataset) {
# keep only the numeric columns
numeric_data <- dataset[sapply(dataset, is.numeric)]
# apply each summary function to every column
means <- sapply(numeric_data, mean, na.rm = TRUE)
medians <- sapply(numeric_data, median, na.rm = TRUE)
modes <- sapply(numeric_data, get_mode)
# combine the results into a data frame
results <- data.frame(
mean = means,
median = medians,
mode = modes
)
return(results)
}
#use the function
stat_mtcars(mtcars)
mean median mode
mpg 20.090625 19.200 21.00
cyl 6.187500 6.000 8.00
disp 230.721875 196.300 275.80
hp 146.687500 123.000 110.00
drat 3.596563 3.695 3.92
wt 3.217250 3.325 3.44
qsec 17.848750 17.710 17.02
vs 0.437500 0.000 0.00
am 0.406250 0.000 0.00
gear 3.687500 4.000 3.00
carb 2.812500 2.000 4.00
Example 4: Convert numeric data to factor variables
convert_factor <- function(dataset, column_names) {
# convert each requested column to a factor
for (col in column_names) {
dataset[[col]] <- as.factor(dataset[[col]])
}
return(dataset)
}
# test the function (note: this overwrites mtcars in the workspace, so later examples see the factor columns)
mtcars <- convert_factor(mtcars, c("cyl", "vs", "am", "gear", "carb"))
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
$ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
$ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
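The loop in convert_factor() can also be written without an explicit for loop by applying as.factor() to several columns at once with lapply(). A sketch; `convert_factor2()` and `mt_factors` are hypothetical names, and it starts from a fresh copy of `datasets::mtcars` because the workspace mtcars was already modified above:

```r
# convert several columns to factors in a single assignment
convert_factor2 <- function(dataset, column_names) {
  dataset[column_names] <- lapply(dataset[column_names], as.factor)
  dataset
}

mt_factors <- convert_factor2(datasets::mtcars, c("cyl", "vs", "am", "gear", "carb"))
sapply(mt_factors, class)  # cyl, vs, am, gear and carb are now "factor"
```

Selecting a column that does not exist with dataset[column_names] raises an "undefined columns selected" error, so typos in column names are caught early.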
Example 5: Calculating mean, variance, standard deviation, median, and IQR
stat_cal <- function(dataset){
#filter out the non-numerical variables
numeric_data <- dataset[sapply(dataset, is.numeric)]
#Apply function to each column
means <- sapply(numeric_data, mean, na.rm = TRUE)
variance <- sapply(numeric_data, var, na.rm = TRUE)
sds <- sapply(numeric_data, sd, na.rm = TRUE)
medians <- sapply(numeric_data, median, na.rm = TRUE)
iqrs <- sapply(numeric_data, IQR, na.rm = TRUE)
#create a data.frame from results
result <- data.frame(
Means = means,
Variances = variance,
StandardDeviation = sds,
Medians = medians,
IQRs = iqrs
)
return(result)
}
#test the function
stat_cal(mtcars)
Means Variances StandardDeviation Medians IQRs
mpg 20.090625 3.632410e+01 6.0269481 19.200 7.37500
disp 230.721875 1.536080e+04 123.9386938 196.300 205.17500
hp 146.687500 4.700867e+03 68.5628685 123.000 83.50000
drat 3.596563 2.858814e-01 0.5346787 3.695 0.84000
wt 3.217250 9.573790e-01 0.9784574 3.325 1.02875
qsec 17.848750 3.193166e+00 1.7869432 17.710 2.00750
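A statistic that has no built-in function can be passed to sapply() as an anonymous function. As a sketch (not part of stat_cal()), the coefficient of variation (standard deviation divided by mean) of each numeric column of a small hypothetical data frame `toy`; `Filter(is.numeric, ...)` is shown as an alternative way to keep only the numeric columns:

```r
# toy data frame: columns a and b are numeric, g is not
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), g = c("x", "y", "z"))

numeric_cols <- Filter(is.numeric, toy)  # drops the non-numeric column g
cv <- sapply(numeric_cols, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))
cv  # a = 0.5, b = 0.5
```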
Example 6: Detect Missing Values
detect_missing_values <- function(dataset) {
missing_counts <- numeric(ncol(dataset))
names(missing_counts) <- colnames(dataset)
for (i in seq_len(ncol(dataset))) {
missing_counts[i] <- sum(is.na(dataset[[i]]))
}
# keep only the columns that have at least one missing value
missing_counts <- missing_counts[missing_counts > 0]
return(missing_counts)
}
# to test this function
sample_data <- data.frame(
A = c(1,2,NA, 4,5),
B = c(NA, 2,3,4,5),
C = c(1,2,3,4,5)
)
detect_missing_values(sample_data)
A B
1 1
detect_missing_values(mtcars)
named numeric(0)
# create missing values in a copy of mtcars: rows 1 to 4 of columns 2 to 5 (cyl, disp, hp, drat)
data1 <- mtcars
data1[c(1,2,3,4), c(5,4,3,2)] <- NA
detect_missing_values(data1)
cyl disp hp drat
4 4 4 4
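The loop is not strictly needed: is.na() applied to a data frame returns a logical matrix, and colSums() counts the TRUE values in each column. A vectorized sketch with the hypothetical name `detect_missing_vec()`:

```r
detect_missing_vec <- function(dataset) {
  counts <- colSums(is.na(dataset))  # number of NAs in each column
  counts[counts > 0]                 # keep only columns with missing values
}

sample_data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, 4, 5)
)
detect_missing_vec(sample_data)  # A = 1, B = 1
```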
Introduction to Packages
A package in R is a collection of functions, sample data, and documentation bundled together. By
using packages, you can leverage the work of others to perform complex tasks with just a few lines of
code.
Why Use Packages?
• Enhanced Functionality: Packages provide additional functions to perform a wide variety of tasks.
• Efficiency: Save time and effort by using pre-written and tested code.
• Community Support: Benefit from the extensive and vibrant R community.
Installing packages
You can install packages from CRAN (the Comprehensive R Archive Network), from other repositories, or
from local files.
#installing the 'dplyr' package from CRAN
install.packages("dplyr")
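Calling install.packages() every time a script runs reinstalls packages needlessly. A common pattern is to install only what is missing. A sketch, where `install_if_missing()` is a hypothetical helper built on the base functions `requireNamespace()` and `install.packages()`:

```r
# install only the packages that cannot already be loaded
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if the package is installed and loadable
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)  # report which packages had to be installed
}

# stats and utils ship with R, so nothing is installed here
install_if_missing(c("stats", "utils"))
```

For a package stored as a local source file, install.packages() accepts a file path together with repos = NULL, for example install.packages("mypkg_1.0.tar.gz", repos = NULL, type = "source"), where the file name is hypothetical.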
Loading packages
After installing a package, you need to load it into the R environment to use its functions.
#loading the 'dplyr' package
library(dplyr)
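library() stops with an error if the package is not installed, whereas require() returns FALSE with a warning, which makes it usable inside an if statement. A sketch; `notapkg123` is a deliberately non-existent, hypothetical package name:

```r
# require() warns and returns FALSE instead of stopping
loaded <- suppressWarnings(
  require("notapkg123", character.only = TRUE, quietly = TRUE)
)
print(loaded)  # [1] FALSE

# library() signals an error, which tryCatch() can capture
res <- tryCatch(
  library("notapkg123", character.only = TRUE),
  error = function(e) "library() raised an error"
)
print(res)  # [1] "library() raised an error"
```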
Using package functions
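After loading a package, you can call its functions like any other function in R. You can also use the
package::function syntax, which works even without library() and makes clear which package a function
comes from. This matters for filter(), because loading dplyr masks the base function stats::filter().
# keep only the rows of mtcars where mpg is greater than 20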
dplyr::filter(mtcars, mpg > 20)
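For comparison, the same rows can be selected in base R without any package, using subset() or logical indexing. A sketch on a fresh copy of `datasets::mtcars` (the name `mt` is illustrative):

```r
mt <- datasets::mtcars

# base R equivalents of dplyr::filter(mtcars, mpg > 20)
fast_subset <- subset(mt, mpg > 20)
fast_index  <- mt[mt$mpg > 20, ]

nrow(fast_subset)  # [1] 14
```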
To see the list of all objects (exported and internal) in a package's namespace
ls(getNamespace("dplyr"))
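ls(getNamespace(...)) also lists internal helpers that are not meant to be called directly. To list only the exported, user-facing objects, base R provides getNamespaceExports(), or ls("package:name") once the package is attached. A sketch using the always-available stats package:

```r
all_objects <- ls(getNamespace("stats"))     # exported and internal objects
exported    <- getNamespaceExports("stats")  # exported objects only

length(exported) < length(all_objects)  # [1] TRUE: internals are excluded
"sd" %in% exported                      # [1] TRUE
```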
To see the documentation of the package
help(package="dplyr")
Exercise
1. Basic Statistics:
a. Load the iris dataset. Compute the mean, median and standard deviation for the
Sepal.Length and Sepal.Width columns.
b. Using the mtcars dataset, determine which car model has the highest miles per gallon (mpg).
2. Data manipulation:
a. From the mtcars dataset, filter only those rows where the number of cylinders (cyl) is 4.
b. Using the iris dataset, group the data by Species and compute the average Sepal.Length for
each group.
3. Custom Functions:
a. Write a function that takes a data frame and a column name as input and returns the range
(min to max) of that column.
b. Develop a function that accepts a data frame and returns a list of the columns that have missing
values, along with the count of missing values in each.
4. Data Cleaning:
a. Identify and replace any negative values in the Sepal.Length column of the iris dataset with
the mean value of the column.
b. Using any dataset of your choice with missing values, impute the missing values using the
median of the respective columns.