Unit 2: Data Science with R - Study Notes for MCA Students
These notes cover the basics of R programming, data types, objects, and data input/output
operations for a student pursuing an MCA course in Data Science. The content is structured to
meet the objectives of understanding R programming fundamentals, exploring data analysis
principles, and addressing emerging issues in data science.
1. R Programming Basics: Overview of R
Definition
R is a free, open-source programming language and environment designed for statistical
computing, data analysis, and graphical visualization. It is widely used in data science for tasks like
data manipulation, statistical modeling, and creating visualizations.
Key Features of R
* Statistical Analysis: Built-in functions for statistical tests and models.
* Data Visualization: Libraries like ggplot2 for high-quality graphs.
* Open Source: Free to use with a large community for support.
* Extensibility: Thousands of packages available via CRAN (Comprehensive R Archive Network).
* Cross-Platform: Runs on Windows, macOS, and Linux.
Why R for Data Science?
* Handles large datasets efficiently.
* Supports reproducible research with scripts.
* Integrates with other tools like Python, SQL, and Hadoop.
Getting Started with R
1. Installation:
* Download R from CRAN.
* Install RStudio, a user-friendly IDE for R, from RStudio's website.
2. R Environment:
* R Console: For executing commands.
* R Scripts: For writing reusable code.
* R Markdown: For creating reports with code and output.
Basic Syntax
* R is case-sensitive (myVar is not the same as MyVar).
* Use <- or = for assignment.
* Comments start with #.
x <- 10
y = 20
print(x + y) # Output: [1] 30
R Packages
Packages extend R’s functionality. Install and load them using:
# Install a package
install.packages("ggplot2")
# Load a package
library(ggplot2)
2. R Data Types and Objects
Definition
Data types define the kind of data stored in R, while objects are structures that hold data, such as
vectors, matrices, or data frames.
Basic Data Types
R supports the following primary data types:
1. Numeric: Real numbers (whole or decimal), stored as double precision by default.
* Example: 5, 3.14
2. Integer: Whole numbers (explicitly defined with the L suffix).
* Example: 10L
3. Character: Text or strings.
* Example: "Hello", 'R'
4. Logical: Boolean values (TRUE or FALSE).
* Example: TRUE, FALSE
5. Complex: Numbers with real and imaginary parts.
* Example: 3 + 2i
Checking Data Type:
Use typeof() or class() to check the type.
x <- 3.14
typeof(x) # Output: "double"
class(x) # Output: "numeric"
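The five basic types listed above can be checked in one pass. The following is a minimal sketch; the list name values and its element names are chosen here only for illustration.

```r
# Inspect the type and class of one value of each basic data type.
# The list "values" and its element names are illustrative only.
values <- list(
  numeric   = 3.14,
  integer   = 10L,      # the L suffix requests an integer
  character = "Hello",
  logical   = TRUE,
  complex   = 3 + 2i
)

types <- sapply(values, typeof)    # internal storage type of each value
classes <- sapply(values, class)   # class reported for each value
print(types)
print(classes)

# A whole number written without L is still stored as a double
print(typeof(10))
```

Note that typeof() reports "double" for ordinary numbers, while class() reports "numeric" for the same values.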
R Objects
R organizes data into objects. The most common objects are:
1. Vector:
* A one-dimensional collection of elements of the same data type.
* Created using c() (combine function).
* Syntax: vector_name <- c(element1, element2, ...)
* Example:
num_vec <- c(1, 2, 3, 4)
print(num_vec) # Output: [1] 1 2 3 4
char_vec <- c("Apple", "Banana", "Orange")
print(char_vec) # Output: [1] "Apple" "Banana" "Orange"
2. Matrix:
* A two-dimensional array with rows and columns, containing elements of the same data type.
* Created using matrix().
* Syntax: matrix(data, nrow, ncol)
* Example:
# Create a 2x3 matrix
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(mat)
# Output:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
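By default matrix() fills values column by column, which can be surprising at first. A short sketch comparing the default with byrow = TRUE (the variable names by_col and by_row are illustrative):

```r
# Compare column-wise (default) and row-wise filling of a matrix
values <- 1:6

by_col <- matrix(values, nrow = 2, ncol = 3)                # fills down each column
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)  # fills across each row

print(by_col)
print(by_row)

# dim() returns the number of rows and columns
print(dim(by_row))
```

With byrow = TRUE the first row holds 1, 2, 3, whereas the default layout puts 1, 3, 5 in the first row.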
3. Array:
* A multi-dimensional extension of a matrix.
* Created using array().
* Syntax: array(data, dim)
* Example:
arr <- array(1:12, dim = c(2, 3, 2))
print(arr)
4. Data Frame:
* A table-like structure where columns can have different data types.
* Created using data.frame().
* Syntax: data.frame(column1 = values, column2 = values, ...)
* Example:
# Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Cathy"),
  Age = c(20, 22, 21),
  Score = c(85.5, 90.0, 88.5)
)
print(df)
5. List:
* A collection of objects that can have different data types and structures.
* Created using list().
* Syntax: list(element1, element2, ...)
* Example:
# Create a list
my_list <- list(name = "Alice", age = 20, scores = c(85, 90, 88))
print(my_list)
# Output:
# $name
# [1] "Alice"
#
# $age
# [1] 20
#
# $scores
# [1] 85 90 88
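Because a list can hold mixed structures, it helps to see how single and double brackets differ when pulling elements out. This is a small sketch that reuses the list from the example; the added city element is hypothetical.

```r
# Single brackets keep the list wrapper; double brackets extract the element
my_list <- list(name = "Alice", age = 20, scores = c(85, 90, 88))

sub_list <- my_list["scores"]    # still a list (of length 1)
scores <- my_list[["scores"]]    # the numeric vector itself

print(class(sub_list))
print(class(scores))
print(mean(my_list$scores))      # $ behaves like [[ ]] for named elements

# Modify an existing element and add a new one (city is a made-up field)
my_list$age <- 21
my_list$city <- "Pune"
print(names(my_list))
```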
Operations on Objects
* Accessing Elements:
  * Vectors: Use [] (e.g., num_vec[2]).
  * Matrices: Use [row, col] (e.g., mat[1, 2]).
  * Data Frames: Use $ or [] (e.g., df$Name, df[1, 1]).
  * Lists: Use $ or [[]] (e.g., my_list$name, my_list[[2]]).
* Modifying Elements:
  * Assign new values using <-.
  * Example: num_vec[1] <- 10
3. Reading and Writing Data
Definition
Reading data involves importing datasets into R from external sources (e.g., CSV, Excel, databases),
while writing data involves exporting R objects to files.
Common File Formats
* CSV: Comma-separated values, widely used for tabular data.
* Excel: Spreadsheet files (requires the readxl or openxlsx package).
* Text: Plain text files.
* JSON/XML: Structured data formats (requires packages like jsonlite or XML).
Reading Data
1. Reading CSV Files:
* Use read.csv() or read.table() for CSV files.
* Syntax: read.csv("file_path", header = TRUE, sep = ",")
* Example:
# Read a CSV file
data <- read.csv("students.csv", header = TRUE)
head(data) # Display first 6 rows
2. Reading Excel Files:
* Use the readxl package.
* Syntax: read_excel("file_path", sheet = 1)
* Example:
# Install and load readxl
install.packages("readxl")
library(readxl)
# Read an Excel file
data <- read_excel("students.xlsx", sheet = 1)
head(data)
3. Reading Text Files:
* Use read.table() or readLines().
* Example:
# Read a text file
text_data <- read.table("data.txt", header = TRUE)
print(text_data)
Writing Data
1. Writing to CSV Files:
* Use write.csv() or write.table().
* Syntax: write.csv(data, "file_path", row.names = FALSE)
* Example:
# Write data frame to CSV
write.csv(df, "output.csv", row.names = FALSE)
2. Writing to Excel Files:
* Use the openxlsx package.
* Example:
# Install and load openxlsx
install.packages("openxlsx")
library(openxlsx)
# Write data frame to Excel
write.xlsx(df, "output.xlsx")
3. Writing to Text Files:
* Use write.table() or writeLines().
* Example:
# Write data to text file
write.table(df, "output.txt", row.names = FALSE)
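The text-file functions mentioned above can be tried without touching real files by writing to a temporary path. The sketch below assumes nothing beyond base R; the sample lines and the small data frame are invented for illustration.

```r
# Round trip 1: plain lines with writeLines() and readLines()
path <- tempfile(fileext = ".txt")          # temporary file, removed at the end
lines_out <- c("Alice,20", "Bob,22", "Cathy,21")
writeLines(lines_out, path)
lines_in <- readLines(path)
print(lines_in)

# Round trip 2: a data frame with write.table() and read.table()
df <- data.frame(Name = c("Alice", "Bob"), Age = c(20, 22))
table_path <- tempfile(fileext = ".txt")
write.table(df, table_path, row.names = FALSE)
df_in <- read.table(table_path, header = TRUE)
print(df_in)

# Clean up the temporary files
unlink(c(path, table_path))
```

readLines() returns one character string per line, while read.table() parses the file back into columns.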
Handling Missing Data
* Missing values in R are represented by NA.
* Check for missing values: is.na(data)
* Remove rows with missing values: na.omit(data)
* Example:
# Check for missing values
data <- data.frame(
  Name = c("Alice", "Bob", NA),
  Age = c(20, NA, 21)
)
print(is.na(data))
# Output:
#       Name   Age
# [1,] FALSE FALSE
# [2,] FALSE  TRUE
# [3,]  TRUE FALSE
# Remove rows with NA
clean_data <- na.omit(data)
print(clean_data)
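Dropping rows is not the only option. As a sketch under the same sample data, missing values can also be counted per column or skipped inside a calculation with na.rm = TRUE:

```r
# Same sample data as in the example above
data <- data.frame(
  Name = c("Alice", "Bob", NA),
  Age = c(20, NA, 21)
)

# Count missing values in each column
na_per_column <- colSums(is.na(data))
print(na_per_column)

# Flag rows that have no missing values at all
complete_rows <- complete.cases(data)
print(complete_rows)

# Compute a mean while ignoring NA instead of deleting rows
mean_age <- mean(data$Age, na.rm = TRUE)
print(mean_age)
```

Without na.rm = TRUE, mean(data$Age) would itself be NA.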
Emerging Issues in Data Science
* Ensure datasets comply with regulations like GDPR.
* Handling large datasets requires optimized reading/writing (e.g., using the data.table package).
* Data Quality: Missing or inconsistent data can affect analysis.
* Reproducibility: Use scripts and version control for consistent results.
Summary
* R Programming Basics: R is a powerful tool for statistical computing and visualization, with an
easy-to-learn syntax for data science tasks.
* Data Types and Objects: R supports numeric, character, logical, and other data types, organized
into vectors, matrices, arrays, data frames, and lists.
* Reading and Writing Data: R provides functions like read.csv(), write.csv(), and packages
like readxl for handling various file formats.
Data Science with R - Unit 2 Part 2 Study Notes
These notes cover Control Structures, Functions, Scoping Rules, and Dates and Times in R
programming for MCA students studying Data Science. The content is designed to be simple,
comprehensive, and self-sufficient, aligning with the objectives of understanding data science
principles, exploring data analysis, and learning R basics.
1. Control Structures
Definition
Control structures in R allow you to control the flow of execution of a program. They help make
decisions, repeat tasks, or skip certain operations based on conditions.
Types of Control Structures
1. Conditional Statements (if, else, ifelse)
2. Loops (for, while, repeat)
3. Other Utilities (break , next )
1.1 Conditional Statements
Definition
Conditional statements execute code based on whether a condition is TRUE or FALSE.
Syntax
if Statement:
if (condition) {
  # Code to execute if condition is TRUE
}
if-else Statement:
if (condition) {
  # Code to execute if condition is TRUE
} else {
  # Code to execute if condition is FALSE
}
ifelse Function (vectorized):
ifelse(test, yes, no)
Examples
1. if Statement:
x <- 10
if (x > 5) {
  print("x is greater than 5")
}
# Output: [1] "x is greater than 5"
2. if-else Statement:
x <- 3
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}
# Output: [1] "x is less than or equal to 5"
3. ifelse Function:
x <- c(1, 6, 3, 8)
result <- ifelse(x > 5, "Big", "Small")
print(result)
# Output: [1] "Small" "Big" "Small" "Big"
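When there are more than two outcomes, if and else can be chained with else if. The sketch below classifies a number as positive, negative, or zero; describe_sign is a hypothetical helper name, and the nested ifelse() shows the vectorized counterpart.

```r
# Hypothetical helper: classify a single number with an else-if chain
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(describe_sign(-4))

# Vectorized version for a whole vector, using nested ifelse()
values <- c(-2, 0, 7)
signs <- ifelse(values > 0, "positive",
                ifelse(values < 0, "negative", "zero"))
print(signs)
```

The if form tests one value at a time, while ifelse() handles every element of a vector in a single call.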
1.2 Loops
Definition
Loops allow you to repeat a block of code multiple times.
Types of Loops
1. for Loop: Iterates over a sequence.
2. while Loop: Repeats as long as a condition is TRUE.
3. repeat Loop: Repeats indefinitely until a break statement is encountered.
Syntax
* for Loop:
for (variable in sequence) {
  # Code to execute
}
* while Loop:
while (condition) {
  # Code to execute
}
* repeat Loop:
repeat {
  # Code to execute
  if (condition) break
}
Examples
1. for Loop:
for (i in 1:5) {
  print(i)
}
# Output: [1] 1
#         [1] 2
#         [1] 3
#         [1] 4
#         [1] 5
2. while Loop:
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
# Output: Same as for loop
3. repeat Loop:
i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i > 5) break
}
# Output: Same as for loop
1.3 Other Utilities
break
Stops a loop immediately.
for (i in 1:10) {
  if (i == 4) break
  print(i)
}
# Output: [1] 1
#         [1] 2
#         [1] 3
next
Skips the current iteration and moves to the next.
for (i in 1:5) {
  if (i == 3) next
  print(i)
}
# Output: [1] 1
#         [1] 2
#         [1] 4
#         [1] 5
2. Functions
Definition
Functions are reusable blocks of code that perform a specific task. They take inputs (arguments),
process them, and return an output.
Why Use Functions?
* Improve code reusability.
* Make code modular and easier to maintain.
* Reduce repetition.
Syntax
function_name <- function(arg1, arg2, ...) {
  # Code to execute
  return(value) # Optional
}
* function_name: Name of the function.
* arg1, arg2: Arguments (inputs).
* return(value): Specifies the output (optional; last evaluated expression is returned by default).
Examples
1. Simple Function:
square <- function(x) {
  return(x * x)
}
result <- square(5)
print(result)
# Output: [1] 25
2. Function with Multiple Arguments:
add_numbers <- function(a, b) {
  total <- a + b
  return(total)
}
result <- add_numbers(3, 7)
print(result)
# Output: [1] 10
3. Default Arguments:
greet <- function(name = "Guest") {
  paste("Hello,", name)
}
print(greet()) # Output: [1] "Hello, Guest"
print(greet("Alice")) # Output: [1] "Hello, Alice"
4. Returning Multiple Values (using a list):
stats <- function(x) {
  return(list(mean = mean(x), sum = sum(x)))
}
result <- stats(c(1, 2, 3, 4, 5))
print(result)
# Output: $mean
#         [1] 3
#
#         $sum
#         [1] 15
Anonymous Functions
Functions without a name, often used in apply-like functions.
lapply(1:3, function(x) x * 2)
# Output: [[1]] [1] 2
#         [[2]] [1] 4
#         [[3]] [1] 6
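Two points from this section combine naturally: return() is optional, and anonymous functions can be passed straight to other functions. A minimal sketch follows (cube is an illustrative name; Filter() and sapply() are base R functions):

```r
# No return(): the value of the last evaluated expression is returned
cube <- function(x) x^3
print(cube(2))

# Anonymous function used as a filter condition
evens <- Filter(function(x) x %% 2 == 0, 1:10)
print(evens)

# Anonymous function applied to each element, simplified to a vector
doubled <- sapply(1:3, function(x) x * 2)
print(doubled)
```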
3. Scoping Rules
Definition
Scoping rules determine how R looks up the value of a variable. R uses lexical scoping, meaning the
value of a variable is searched in the environment where the function was defined, not where it is
called.
Key Concepts
1. Environments: A collection of variable-value pairs.
2. Global Environment: Where variables defined outside functions reside.
3. Local Environment: Created when a function is called and normally discarded afterward, unless something such as a closure still refers to it.
4. Parent Environment: The environment in which a function was defined.
How Scoping Works
* R first looks for a variable in the current environment.
* If not found, it searches the parent environment, and so on, up to the global environment.
* If still not found, it searches the attached packages and finally the base environment.
1. Global vs Local Variables:
x <- 10 # Global variable
my_function <- function() {
x <- 5 # Local variable
print (x)
3
my_function() # Output: [1] 5
print (x) # Output: [1] 10
2. Accessing Global Variable:
R X Collapse
x <- 10
my function <- function() {
print(x) # Uses global x
$
my_function() # Output: [1] 10
= Wrap
© Copy
O Copy3. Lexical Scoping:
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1 # <<- modifies variable in parent environment
    return(count)
  }
}
counter <- make_counter()
print(counter()) # Output: [1] 1
print(counter()) # Output: [1] 2
<<- Operator
* Used to assign a value to a variable in a parent environment.
* Useful in closures (functions that retain state).
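Each closure keeps its own private copy of the state that <<- updates. As a sketch, two counters built from the same factory do not interfere with each other (counter_a and counter_b are illustrative names):

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()
counter_a()
counter_b()

# environment() exposes the state each closure carries
print(environment(counter_a)$count)
print(environment(counter_b)$count)
```

After these calls, counter_a holds 2 and counter_b holds 1.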
4. Dates and Times
Definition
R provides tools to handle dates and times for data analysis, such as calculating time differences,
formatting dates, or extracting components (day, month, year).
Key Classes
1. Date: Stores dates (e.g., "2025-05-08").
2. POSIXct: Stores date-time as the number of seconds since 1970-01-01 UTC (e.g., "2025-05-08 14:30:00").
3. POSIXlt: Stores date-time as a list of components (day, month, year, etc.).
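The difference between the classes becomes clearer when the stored values are inspected directly. This is a minimal sketch; the time zone is fixed to UTC here as an assumption so the result does not depend on the machine's settings.

```r
# POSIXct: a single number (seconds since 1970-01-01 UTC)
ct <- as.POSIXct("2025-05-08 14:30:00", tz = "UTC")
print(as.numeric(ct))

# POSIXlt: named components; year counts from 1900 and mon from 0
lt <- as.POSIXlt(ct)
print(lt$year + 1900)
print(lt$mon + 1)
print(lt$mday)
print(lt$hour)

# Date: a number of days since 1970-01-01
print(as.numeric(as.Date("1970-01-10")))
```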
Key Packages
* Base R: Functions like as.Date(), Sys.time().
* lubridate: Simplifies date-time operations (install using install.packages("lubridate")).
4.1 Working with Dates
Creating Dates
* Use as.Date() to convert strings to Date objects.
* Default format: "YYYY-MM-DD".
Syntax
as.Date("YYYY-MM-DD")
Examples
1. Creating a Date:
my_date <- as.Date("2025-05-08")
print(my_date)
# Output: [1] "2025-05-08"
2. Custom Format:
my_date <- as.Date("08/05/2025", format = "%d/%m/%Y")
print(my_date)
# Output: [1] "2025-05-08"
3. Current Date:
today <- Sys.Date()
print(today)
# Output: [1] "2025-05-08" (assuming current date)
4.2 Working with Date-Times
Creating Date-Times
* Use as.POSIXct() or as.POSIXlt() for date-time objects.
* Default format: "YYYY-MM-DD HH:MM:SS".
Syntax
as.POSIXct("YYYY-MM-DD HH:MM:SS")
Examples
1. Creating a Date-Time:
# tz = "UTC" fixes the time zone; without it, the system time zone is used
my_datetime <- as.POSIXct("2025-05-08 14:30:00", tz = "UTC")
print(my_datetime)
# Output: [1] "2025-05-08 14:30:00 UTC"
2. Current Date-Time:
now <- Sys.time()
print(now)
# Output: [1] "2025-05-08 14:30:00 UTC" (example; the time zone shown depends on your system)
4.3 Extracting Components
Use functions like weekdays(), months(), or lubridate functions.
Examples
1. Base R:
my_date <- as.Date("2025-05-08")
print(weekdays(my_date)) # Output: [1] "Thursday"
print(months(my_date)) # Output: [1] "May"
2. Using lubridate:
library(lubridate)
my_date <- ymd("2025-05-08")
print(year(my_date)) # Output: [1] 2025
print(month(my_date)) # Output: [1] 5
print(day(my_date)) # Output: [1] 8
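If lubridate is not installed, the same numeric components can be pulled out with base R by formatting the date and converting the text to a number. A short sketch (the variable names are illustrative):

```r
my_date <- as.Date("2025-05-08")

# format() returns text, so convert each piece to an integer
year_num <- as.integer(format(my_date, "%Y"))
month_num <- as.integer(format(my_date, "%m"))
day_num <- as.integer(format(my_date, "%d"))

print(year_num)
print(month_num)
print(day_num)
```

This gives 2025, 5, and 8, matching the lubridate results above.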
4.4 Date Arithmetic
Perform calculations like adding days or finding differences.
Examples
1. Adding Days:
my_date <- as.Date("2025-05-08")
new_date <- my_date + 7
print(new_date)
# Output: [1] "2025-05-15"
2. Time Difference:
date1 <- as.Date("2025-05-08")
date2 <- as.Date("2025-06-08")
diff <- date2 - date1
print(diff)
# Output: Time difference of 31 days
3. Using lubridate:
library(lubridate)
my_date <- ymd("2025-05-08")
new_date <- my_date + days(7)
print(new_date)
# Output: [1] "2025-05-15"
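Date differences can also be expressed in other units or turned into plain numbers, and seq() can generate regular date sequences. The following base R sketch reuses the two dates from the time-difference example:

```r
date1 <- as.Date("2025-05-08")
date2 <- as.Date("2025-06-08")

# Difference as a plain number of days
gap_days <- as.numeric(date2 - date1)
print(gap_days)

# Same difference expressed in weeks via difftime()
gap_weeks <- as.numeric(difftime(date2, date1, units = "weeks"))
print(gap_weeks)

# Three dates, one month apart, starting at date1
monthly <- seq(date1, by = "month", length.out = 3)
print(monthly)
```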
4.5 Formatting Dates
Use format() to display dates in desired formats.
Example
my_date <- as.Date("2025-05-08")
formatted <- format(my_date, "%d-%b-%Y")
print(formatted)
# Output: [1] "08-May-2025"
Key Takeaways
* Control Structures: Use if, for, while, etc., to control program flow. Example: Check if a
number is positive or negative.
* Functions: Create reusable code blocks with function(). Example: Calculate the square of a
number.
* Scoping Rules: Understand lexical scoping and environments. Example: Use <<- to modify
variables in parent environments.
* Dates and Times: Handle dates with as.Date(), date-times with as.POSIXct(), and simplify
tasks with lubridate. Example: Calculate days between two dates.
Practice Questions
1. Write a function to check if a number is even or odd using if-else.
2. Create a for loop to print squares of numbers from 1 to 10.
3. Use a closure to create a function that tracks how many times it’s called.
4. Calculate the number of days between today and your birthday using as.Date().
Data Science Using R - Unit 2 Part 2 Study Notes
These notes cover Loop Functions and Debugging Tools in R, tailored for MCA students learning
Data Science using R. The content is designed to be simple, comprehensive, and self-sufficient,
aligning with the objectives of understanding data science principles, exploring data analysis, and
learning R programming basics.
1. Loop Functions in R
Definition
Loop functions in R are specialized functions that simplify repetitive tasks by applying operations
over data structures (like vectors, lists, or matrices) without writing explicit loops. They are efficient,
reduce code complexity, and align with R’s functional programming style.
Why Use Loop Functions?
* Avoid writing repetitive for or while loops.
* Improve code readability and, in some cases, performance.
* Handle large datasets effectively in data science tasks.
Common Loop Functions
R provides several loop functions, including lapply, sapply, apply, tapply, and mapply. Below,
each is explained with definitions, syntax, and examples.
1.1 lapply
* Definition: Applies a function to each element of a list or vector and returns a list.
* Syntax: lapply(X, FUN, ...)
* X: A list or vector.
* FUN: Function to apply.
* ...: Additional arguments for FUN.
* Example:
# Calculate square of numbers in a list
numbers <- list(1, 2, 3, 4)
squares <- lapply(numbers, function(x) x^2)
print(squares)
# Output: [[1]] [1] 1
#         [[2]] [1] 4
#         [[3]] [1] 9
#         [[4]] [1] 16
1.2 sapply
* Definition: Similar to lapply, but simplifies the output to a vector or matrix if possible.
* Syntax: sapply(X, FUN, ..., simplify = TRUE)
* simplify: If TRUE, simplifies output; if FALSE, returns a list.
* Example:
# Calculate square roots of numbers
numbers <- c(4, 9, 16)
roots <- sapply(numbers, sqrt)
print(roots)
# Output: [1] 2 3 4
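The simplification step is easier to follow when FUN returns more than one value. In the sketch below sapply() builds a matrix, simplify = FALSE keeps a list, and vapply() (a stricter base R relative, not covered in these notes) checks the result type explicitly.

```r
numbers <- c(4, 9, 16)

# FUN returns two values per element, so sapply() simplifies to a matrix
as_matrix <- sapply(numbers, function(x) c(root = sqrt(x), square = x^2))
print(as_matrix)

# simplify = FALSE keeps the result as a list, like lapply()
as_list <- sapply(numbers, sqrt, simplify = FALSE)
print(class(as_list))

# vapply() requires each result to be exactly one numeric value
checked <- vapply(numbers, sqrt, numeric(1))
print(checked)
```

The matrix has one column per input element and one row per value returned by FUN.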
1.3 apply
* Definition: Applies a function over the margins (rows or columns) of a matrix or array.
* Syntax: apply(X, MARGIN, FUN, ...)
* X: Matrix or array.
* MARGIN: 1 for rows, 2 for columns.
* Example:
mat <- matrix(1:6, nrow = 2)
row_sums <- apply(mat, 1, sum)
print(row_sums)
# Output: [1]  9 12
1.4 tapply
* Definition: Applies a function to subsets of a vector, defined by a factor.
* Syntax: tapply(X, INDEX, FUN, ...)
* X: Vector.
* INDEX: Factor or list of factors to group X.
* Example:
scores <- c(85, 90, 78, 92, 88)
groups <- c("A", "B", "A", "B", "A")
group_means <- tapply(scores, groups, mean)
print(group_means)
# Output:
#        A        B
# 83.66667 91.00000
1.5 mapply
* Definition: Applies a function to multiple lists or vectors element-wise.
* Syntax: mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE)
* ...: Multiple lists or vectors.
* MoreArgs: Additional arguments for FUN.
* Example:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
sums <- mapply(sum, vec1, vec2)
print(sums)
# Output: [1] 5 7 9
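MoreArgs passes values that stay fixed across all calls, while the vectors in ... are walked through element by element. A hedged sketch with made-up inputs (bases, exponents, and offset are illustrative names):

```r
bases <- c(1, 2, 3)
exponents <- c(2, 2, 2)

# offset is the same for every call, so it goes in MoreArgs
powers_plus <- mapply(
  function(b, e, offset) b^e + offset,
  bases, exponents,
  MoreArgs = list(offset = 1)
)
print(powers_plus)

# Results of different lengths cannot be simplified, so a list is returned
repeated <- mapply(rep, 1:3, 3:1)
print(repeated)
```

The first call yields 2, 5, and 10; the second returns a list whose elements have lengths 3, 2, and 1.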
Key Points
* Loop functions wrap the loop in a single call. They are usually more concise than explicit loops, though not always faster; fully vectorized operations (such as x^2 on a whole vector) are typically the fastest option.
* Choose the appropriate function based on input (list, matrix, vector) and desired output (list,
vector, etc.).
* Use anonymous functions (function(x)) for simple operations within loop functions.
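To compare the styles side by side, the sketch below computes the same squares with an explicit for loop, with sapply(), and with a plain vectorized expression. The variable names are chosen for illustration.

```r
x <- 1:5

# 1. Explicit for loop with a pre-allocated result vector
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. Loop function with an anonymous function
squares_sapply <- sapply(x, function(v) v^2)

# 3. Vectorized arithmetic on the whole vector at once
squares_vec <- x^2

print(squares_loop)
print(squares_sapply)
print(squares_vec)
```

All three produce 1, 4, 9, 16, 25; the choice is mostly about readability and the shape of the input.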
Practical Example in Data Science
Suppose you have a dataset of sales across regions and want to calculate average sales per region:
sales <- c(100, 150, 200, 120, 180)
regions <- c("North", "South", "North", "South", "North")
avg_sales <- tapply(sales, regions, mean)
print(avg_sales)
# Output: North South
#           160   135
2. Debugging Tools in R
Definition
Debugging tools in R help identify and fix errors (bugs) in code, ensuring programs run correctly. In
data science, debugging is crucial for ensuring data analysis scripts produce accurate results.
Why Debugging Matters?
* Errors in code can lead to incorrect data analysis.
* Debugging tools save time by pinpointing issues quickly.
* They help understand how code executes, improving learning.
Common Debugging Tools
R provides several built-in tools for debugging, including browser(), debug(), trace(), and
error-handling functions like try() and tryCatch(). Each is explained below with syntax and
examples.
2.1 browser()
* Definition: Pauses code execution and allows interactive inspection of variables and code flow.
* Syntax: Insert browser() in the code where you want to pause.
* Example:
my_function <- function(x) {
  y <- x^2
  browser() # Pause here
  z <- y + 10
  return(z)
}
my_function(5)
* When executed, R pauses at browser(), letting you inspect x and y. Type n to run the next line or
c to continue.
2.2 debug()
* Definition: Enables step-by-step execution of a function.
* Syntax: debug(function_name)
* Example:
my_function <- function(x) {
  y <- x^2
  z <- y + 5
  return(z)
}
debug(my_function)
my_function(3)
* R enters debug mode, allowing you to step through each line. Use n (next), c (continue), or
Q (quit).
* Use undebug(my_function) to turn debug mode off again.
2.3 trace()
* Definition: Modifies a function to print information when it’s called, useful for tracking function
execution.
* Syntax: trace(function_name, tracer)
* tracer: Specifies what to run on entry (e.g., a quoted print() call).
* Example:
my_function <- function(x) {
  y <- x * 3
  return(y)
}
trace(my_function, quote(print(x)))
my_function(4)
* Output shows the input x each time my_function is called.
* Use untrace(my_function) to stop tracing.
2.4 try()
* Definition: Attempts to run code and prevents it from stopping due to errors.
* Syntax: try(expr, silent = FALSE)
* expr: Code to execute.
* silent: If TRUE, suppresses error messages.
* Example:
# log() of a non-numeric value raises an error.
# (log(-1) only gives NaN with a warning, so try() would not flag it.)
result <- try(log("a"), silent = TRUE)
if (inherits(result, "try-error")) {
  print("Error: Invalid input")
} else {
  print(result)
}
# Output: [1] "Error: Invalid input"
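A common use of try() is to keep a batch running when a single input fails. The sketch below applies log10() to a small list containing one bad value; the inputs are invented for illustration.

```r
# One of these inputs is not numeric and will raise an error
inputs <- list(10, "a", 100)

# try() lets the remaining elements be processed after a failure
results <- lapply(inputs, function(x) try(log10(x), silent = TRUE))

# Identify which elements failed
failed <- sapply(results, inherits, what = "try-error")
print(failed)

# Keep only the successful results
good <- unlist(results[!failed])
print(good)
```

Only the second element is flagged as failed; the other two yield 1 and 2.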
2.5 tryCatch()
* Definition: Provides advanced error handling by specifying actions for errors, warnings, or
messages.
* Syntax:
tryCatch(expr, error = function(e) {}, warning = function(w) {}, finally = {})
* expr: Code to execute.
* error, warning: Functions to handle errors or warnings.
* finally: Code to run regardless of success or failure.
* Example:
result <- tryCatch({
  log(-1) # Produces NaN with a warning, not an error
}, warning = function(w) {
  "Warning: Cannot compute log of negative number"
}, error = function(e) {
  "Error: Invalid input"
})
print(result)
# Output: [1] "Warning: Cannot compute log of negative number"
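The handlers can be wrapped in a reusable function. The following is a sketch only: safe_log is a hypothetical helper, and returning NA on failure is one possible design choice among several. It relies on log(-1) signalling a warning and log("a") signalling an error.

```r
# Hypothetical helper: returns NA instead of stopping or warning
safe_log <- function(x) {
  tryCatch({
    log(x)
  }, warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    NA_real_
  }, error = function(e) {
    message("Error caught: ", conditionMessage(e))
    NA_real_
  }, finally = {
    # Runs whether or not the computation succeeded
    message("Finished log attempt")
  })
}

print(safe_log(1))     # valid input
print(safe_log(-1))    # triggers the warning handler
print(safe_log("a"))   # triggers the error handler
```

conditionMessage() extracts the text of the warning or error, which is useful for logging what went wrong.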
Debugging Workflow
1. Identify the Error: Run the code and note any error messages.
2. Use print() or cat(): Add these to check variable values at different points.
3. Use browser() or debug(): Step through code to find where it fails.
4. Handle Errors: Use try() or tryCatch() for robust scripts.
5. Test Fixes: Run the corrected code to ensure it works.
Practical Example in Data Science
Suppose you're analyzing a dataset and a function calculating averages returns an unexpected result:
calculate_avg <- function(data) {
  browser() # Inspect data
  result <- mean(data)
  return(result)
}
data <- c(10, 20, NA, 30)
calculate_avg(data) # Returns NA because of the missing value
* Use browser() to check data.
* Fix by adding na.rm = TRUE:
calculate_avg <- function(data) {
  result <- mean(data, na.rm = TRUE)
  return(result)
}
print(calculate_avg(data)) # Output: [1] 20
Common Debugging Tips
* Check for NA or missing values in data.
* Ensure correct data types (e.g., numeric vs. character).
* Use str() to inspect object structures.
* Test small parts of code before running the entire script.
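These tips can be combined into a quick inspection routine. The sketch below uses a made-up data frame (raw) in which the Age column was read as text, a frequent cause of confusing results.

```r
# Hypothetical data: Age was imported as character, with one bad entry
raw <- data.frame(
  Name = c("Alice", "Bob", "Cathy"),
  Age = c("20", "unknown", "21"),
  stringsAsFactors = FALSE
)

# Inspect structure and column classes
str(raw)
column_classes <- sapply(raw, class)
print(column_classes)

# Convert to numeric; entries that are not numbers become NA
age_num <- suppressWarnings(as.numeric(raw$Age))
print(anyNA(age_num))

# Compute the mean while skipping the missing value
print(mean(age_num, na.rm = TRUE))
```

Checking classes first shows why mean(raw$Age) would not work as expected, and anyNA() confirms that the conversion introduced a missing value.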
Connection to Data Science Objectives
* Emerging Issues: Loop functions handle large datasets efficiently, addressing scalability in data
science. Debugging ensures reliable analysis, critical for real-world applications.
* Underlying Principles: Loop functions demonstrate functional programming, a key concept in
data analysis. Debugging tools teach error handling, ensuring robust data pipelines.
* R Programming Basics: Mastery of loop functions and debugging builds a strong foundation for
writing efficient, error-free R code.
Summary
* Loop Functions: Use lapply, sapply, apply, tapply, and mapply to simplify repetitive tasks.
They're efficient for data manipulation in data science.
* Debugging Tools: Use browser(), debug(), trace(), try(), and tryCatch() to find and fix
errors, ensuring accurate data analysis.
* Practice: Apply these tools to datasets, such as calculating summaries or handling errors in
real-world data.
These notes provide a complete guide for Unit 2 Part 2, enabling you to understand and apply loop
functions and debugging tools in R for data science tasks.