
Detailed Explanations For Programming in R. Wrangling Data

The document provides detailed instructions for processing data in R, including setting file paths, reading datasets, adding and merging columns, and handling missing data. It covers various data manipulation techniques such as filtering, sorting, and summarizing data, as well as saving processed data to files. Additionally, it suggests using tools like ChatGPT for troubleshooting and clarifying concepts.

Uploaded by

soloviovalada


Ex 5 Detailed explanations

2. Setting File Path

R
# Folder that contains the script open in RStudio, with a trailing "/"
file_path <- paste0(dirname(rstudioapi::getSourceEditorContext()$path), "/")

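● Builds the path to the folder that contains the current R script and stores it in file_path. It does not change the working directory (setwd() would do that).
● Because the path is derived from the script itself, it points to the correct folder wherever the project is stored. It only works while the script is open in RStudio, since rstudioapi::getSourceEditorContext() queries the RStudio editor.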
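Once file_path is known, it is combined with a file name in step 3. A minimal sketch comparing that paste0() approach with base R's file.path(), which inserts the "/" separator itself so no trailing slash is needed; the folder name is a hypothetical placeholder:

```r
# Hypothetical script folder; inside RStudio it could come from
# dirname(rstudioapi::getSourceEditorContext()$path)
script_dir <- "/home/user/ex5"

# paste0() relies on the trailing "/" that step 2 adds by hand
path_paste <- paste0(paste0(script_dir, "/"), "BagsOfApples.csv")

# file.path() joins the components with "/" itself
path_join <- file.path(script_dir, "BagsOfApples.csv")

print(path_join)
```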

3. Reading Data
R
BOA <- read.csv(paste0(file_path, "BagsOfApples.csv"), sep=";")
BOO <- read_excel(paste0(file_path, "BagsOfOrangesNA.xlsx"))
Geo <- read_excel(paste0(file_path, "Geo_dim.xlsx"))

● Reads three datasets into R:

○ BOA (Bags of Apples):
■ Reads a semicolon-delimited CSV file using read.csv().
■ Assumes the file is located in the directory set by file_path.
○ BOO (Bags of Oranges):
■ Reads an Excel file using read_excel() from the readxl package.
○ Geo (Geo-dimensions):
■ Reads an Excel file, also with read_excel().
■ Serves as a lookup table; judging by steps 7 and 11, it has a Country column and a Region column.
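read.csv() with sep=";" handles semicolon-separated files. If a file also uses decimal commas (common in European locales), read.csv2() sets sep=";" and dec="," together. A small sketch with inline text standing in for the real file; the column names are borrowed from later steps and the values are made up:

```r
# Inline text stands in for BagsOfApples.csv (values are hypothetical)
apples_txt <- "bagNo;origin;weight;prize
1;Spain;2.5;3.10
2;Italy;1.5;2.40"
boa_demo <- read.csv(text = apples_txt, sep = ";")
str(boa_demo)

# Same layout with decimal commas: read.csv2() covers both settings
apples_txt2 <- "bagNo;origin;weight;prize
1;Spain;2,5;3,10"
boa_demo2 <- read.csv2(text = apples_txt2)
print(boa_demo2$weight)
```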

4. Adding Columns
R
BOA$fruits <- "Apples"
BOO$fruits <- "Oranges"

● Adds a new column fruits to each dataframe:

○ For BOA, all rows are labeled as "Apples".
○ For BOO, all rows are labeled as "Oranges".
5. Combining Data (Elaborated)
R
BOF <- rbind(BOO, BOA)

How It Works:

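● rbind():
○ Combines two dataframes (BOO and BOA) by stacking their rows, BOO first.
○ Matches columns by name, so their order may differ, but both dataframes must have exactly the same column names; otherwise rbind() stops with an error. Column types should also agree, since mismatched types are silently coerced (for example, numbers to text).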

Additional Use Cases:

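1. If Columns Don’t Match Exactly:

○ bind_rows(BOO, BOA) from the dplyr package also matches columns by name but allows differing columns: a column missing from one dataframe is filled with NA. Columns with the same name must still have compatible types; numeric in one and character in the other raises an error. With unnamed inputs, adding .id = "source" labels the rows "1" and "2".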
2. Adding an Identifier for Data Source:

By using the .id parameter in bind_rows, you can add an extra column indicating the
source of each row:
R
BOF <- bind_rows(BOO = BOO, BOA = BOA, .id = "source")

○ Here, rows from BOO would have source = "BOO", and rows from BOA
would have source = "BOA".
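The difference between rbind() and bind_rows() shows up as soon as one input has an extra column. A sketch with two tiny made-up dataframes standing in for BOO and BOA; the variety column is hypothetical:

```r
library(dplyr)

# Hypothetical stand-ins: only the oranges table has a "variety" column
boo_demo <- data.frame(origin = c("Spain", "Italy"), weight = c(2, 3),
                       variety = c("Navel", "Valencia"))
boa_demo <- data.frame(origin = "Poland", weight = 1.5)

# rbind() refuses because the column sets differ
rbind_ok <- tryCatch({ rbind(boo_demo, boa_demo); TRUE },
                     error = function(e) FALSE)

# bind_rows() matches by name, fills the gap with NA and labels each row
bof_demo <- bind_rows(BOO = boo_demo, BOA = boa_demo, .id = "source")
print(bof_demo)
```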

6. Replacing Values
R
# Replace "California" with "United States" in the origin column (stringr)
BOF$origin <- str_replace_all(BOF$origin, "California", "United States")

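● Uses str_replace_all() from the stringr package to replace "California" with "United States" in the origin column, presumably so that the values match the country names in Geo before the join in step 7.
● The pattern is treated as a regular expression and replaced wherever it occurs, so a value such as "Baja California" would also be changed.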
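To see why matching anywhere in the string matters, here is a base R sketch using gsub(), which, like str_replace_all(), replaces every match of the pattern, next to an exact-match replacement. The origin values are hypothetical, and fixed = TRUE makes gsub() treat the pattern literally:

```r
origin <- c("California", "Spain", "Baja California")

# Replaces "California" wherever it appears, also inside longer values
partial <- gsub("California", "United States", origin, fixed = TRUE)
print(partial)  # "United States" "Spain" "Baja United States"

# Replaces only values that are exactly "California"
exact <- ifelse(origin == "California", "United States", origin)
print(exact)    # "United States" "Spain" "Baja California"
```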

7. Merging Datasets
R
BOF <- left_join(BOF, Geo, by = c("origin" = "Country"))

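● Joins BOF with Geo by matching the origin column in BOF to the Country column in Geo; left_join() comes from the dplyr package.
● Every row of BOF is kept. Where an origin has no matching Country, the columns taken from Geo are filled with NA; where a Country appears more than once in Geo, the matching BOF rows are duplicated.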
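After a left join it is worth checking which origins found no match, since those rows carry NA in every Geo column and are later removed by na.omit(). A sketch with made-up data; anti_join() from dplyr returns the rows of the first table that have no match in the second:

```r
library(dplyr)

# Hypothetical data; "Atlantis" has no entry in the lookup table
bof_demo <- data.frame(origin = c("United States", "Spain", "Atlantis"),
                       price = c(3, 2, 5))
geo_demo <- data.frame(Country = c("United States", "Spain"),
                       Region = c("North America", "Europe"))

# All rows of bof_demo are kept; the key column is kept as "origin"
joined <- left_join(bof_demo, geo_demo, by = c("origin" = "Country"))
print(joined)

# Origins that found no matching Country
unmatched <- anti_join(bof_demo, geo_demo, by = c("origin" = "Country"))
print(unmatched)
```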

8. Renaming Columns
R
BOF <- rename(BOF, price = prize)

● Renames the prize column to price; rename() takes the form new_name = old_name.

9. Removing Columns
R
BOF <- select(BOF, -bagNo)

● Removes the bagNo column using select().
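Steps 8 and 9 can be tried on a small made-up table with the same column names; note the new_name = old_name order in rename() and the minus sign that drops a column in select():

```r
library(dplyr)

# Hypothetical columns named as in the exercise data
demo <- data.frame(bagNo = 1:2, prize = c(3.1, 2.4), weight = c(2.5, 1.5))

demo <- rename(demo, price = prize)  # new_name = old_name
demo <- select(demo, -bagNo)         # minus sign drops the column
print(names(demo))
```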

10. Handling Missing Data

R
anyNA(BOF)
BOF <- na.omit(BOF)

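● anyNA(BOF):
○ Returns TRUE if the dataframe contains at least one NA (missing value) and FALSE otherwise.
● na.omit(BOF):
○ Removes every row that has an NA in any column, including NAs introduced by the join in step 7 for origins missing from Geo.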
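Because na.omit() looks at every column, it can remove more rows than expected. A base R sketch with hypothetical data that counts missing values per column and, as an alternative, keeps only rows that are complete in the columns actually needed:

```r
# Hypothetical data with gaps in different columns
df <- data.frame(price = c(3, NA, 2, 4), weight = c(1, 2, NA, 2),
                 note = c(NA, "a", "b", "c"))

anyNA(df)           # TRUE
colSums(is.na(df))  # one NA each in price, weight and note

# na.omit() drops every row with an NA anywhere: only row 4 survives
complete <- na.omit(df)

# Keep rows that are complete in price and weight only (rows 1 and 4)
needed <- df[complete.cases(df[, c("price", "weight")]), ]
print(needed)
```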

11. Filtering Data

R
BOF_europe <- filter(BOF, Region == "Europe")

● Creates a subset of BOF where the Region column equals "Europe".
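A short sketch of filter() on made-up data, showing that rows where the condition evaluates to NA are dropped and that %in% selects several regions at once:

```r
library(dplyr)

# Hypothetical joined data; "Atlantis" had no match, so Region is NA
demo <- data.frame(origin = c("Spain", "Chile", "Italy", "Atlantis"),
                   Region = c("Europe", "South America", "Europe", NA))

# Rows where the condition is NA are dropped, not kept
europe <- filter(demo, Region == "Europe")

# %in% keeps rows matching any of several values
eu_or_sa <- filter(demo, Region %in% c("Europe", "South America"))
print(eu_or_sa)
```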

12. Adding Calculated Columns

R
BOF <- mutate(BOF, ppk = price / weight)

● mutate():
○ Adds a new column ppk (price per kilo), calculated as price divided by
weight.
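A minimal mutate() sketch with hypothetical prices and weights (weights assumed to be in kilograms). A column created earlier in the same mutate() call can already be used by the next argument:

```r
library(dplyr)

demo <- data.frame(price = c(3, 2.4), weight = c(2.5, 1.5))

demo <- mutate(demo,
               ppk = price / weight,          # price per kilo
               ppk_rounded = round(ppk, 2))   # uses the ppk just created
print(demo)
```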

13. Sorting Data

R
arrange(BOF, desc(ppk))

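● arrange():
○ Sorts the dataframe by the ppk column in descending order (desc() reverses the default ascending order).
○ The result is only printed here; BOF itself keeps its original order unless the result is assigned, e.g. BOF <- arrange(BOF, desc(ppk)).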
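Since arrange() returns a new dataframe, the sorted order survives only if the result is assigned. A sketch with made-up values:

```r
library(dplyr)

demo <- data.frame(foodLabel = c("a", "b", "c"), ppk = c(1.2, 3.5, 2.0))

arrange(demo, desc(ppk))                 # prints a sorted copy
demo_sorted <- arrange(demo, desc(ppk))  # keeps the sorted version

print(demo$foodLabel)         # "a" "b" "c" (unchanged)
print(demo_sorted$foodLabel)  # "b" "c" "a"
```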

14. Saving Processed Data (Elaborated)

R
write.table(BOF, file="bagsoffruits_price.txt", sep="\t",
row.names=FALSE)

How It Works:

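● write.table():
○ Writes the dataframe BOF to a file named bagsoffruits_price.txt. Because no folder is given, the file is created in the current working directory, which is not necessarily file_path.
○ The file uses tab (\t) as the delimiter.
○ row.names=FALSE leaves out R's row names, which would otherwise be written as an extra first column.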

Additional Use Cases:

1. Changing the Delimiter:

You can change sep to any other delimiter, such as commas for a CSV file:
R
write.table(BOF, file="bagsoffruits_price.csv", sep=",",
row.names=FALSE)

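2. Adding Row Names:
○ Set row.names=TRUE (the default) to write the row names, usually row numbers, as an extra first column. After na.omit() these keep the original numbering, so gaps can appear.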
3. Controlling Quotes:

write.table() already wraps character and factor fields in double quotes by default (quote=TRUE). To write text fields without quotes:

R
write.table(BOF, file="bagsoffruits_price.txt", sep="\t",
row.names=FALSE, quote=FALSE)

4. Using write.csv() for Simplicity:

For CSV files, write.csv() can be used as a shortcut:

R
write.csv(BOF, file="bagsoffruits_price.csv", row.names=FALSE)
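Reading a saved file back is a quick way to confirm that the delimiter and quoting settings work together. A sketch with a hypothetical stand-in for BOF, written to a temporary file so nothing is left behind; read.delim() expects tab-separated input with a header row by default:

```r
# Hypothetical stand-in for the processed BOF table
bof_demo <- data.frame(foodLabel = c("Apples", "Oranges"), ppk = c(1.2, 1.6))

# Write to a temporary file with the same settings as above
out_file <- tempfile(fileext = ".txt")
write.table(bof_demo, file = out_file, sep = "\t", row.names = FALSE)

# Read it back and inspect the structure
check <- read.delim(out_file)
str(check)
```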

15. Counting and Grouping Data

R
count(BOF, foodLabel)

● count() from dplyr counts the number of rows for each unique value in the foodLabel column and returns the counts in a new column named n.

R
arrange(count(BOF, foodLabel), desc(n))

● Arranges the counts in descending order by frequency.
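count() also has a sort argument that does the arranging in one step. A sketch on a made-up foodLabel column, including the two-call version written with the %>% pipe that dplyr re-exports:

```r
library(dplyr)

demo <- data.frame(foodLabel = c("Apples", "Oranges", "Apples",
                                 "Pears", "Apples", "Oranges"))

# Two calls: count, then sort by frequency
by_hand <- arrange(count(demo, foodLabel), desc(n))

# One call: sort = TRUE orders by n, largest first
built_in <- count(demo, foodLabel, sort = TRUE)

# The two-call version as a pipeline
piped <- demo %>% count(foodLabel) %>% arrange(desc(n))
print(built_in)
```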

16. Summarizing Data
R
BOFg <- group_by(BOF, foodLabel)
BOFgn <- summarise(BOFg, meanppk = mean(ppk))

● group_by():
○ Groups data by the foodLabel column.
● summarise():
○ Calculates the mean ppk for each group.
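mean() returns NA for any group that still contains a missing value, so na.rm = TRUE is a useful safeguard if the data has not been through na.omit(). A sketch of the same grouping written as a pipeline with made-up values; n() adds the number of rows in each group:

```r
library(dplyr)

demo <- data.frame(foodLabel = c("Apples", "Apples", "Oranges", "Oranges"),
                   ppk = c(1.0, 3.0, 1.5, NA))

summary_tbl <- demo %>%
  group_by(foodLabel) %>%
  summarise(meanppk = mean(ppk, na.rm = TRUE),  # NA values are skipped
            bags = n())                          # rows in each group
print(summary_tbl)
```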

17. Troubleshooting and Assistance

● The document recommends using tools like ChatGPT or Copilot to troubleshoot errors or get clarification on concepts.
