Data Manipulation in R

Uploaded by

Tina Parker

Data manipulation in R involves a series of tasks to clean, transform, and analyze data. Some of the most common techniques include filtering, sorting, summarizing, and reshaping data. Below are some key techniques and functions in R for data manipulation, with examples:

1. Loading Required Libraries

To manipulate data in R, you often need packages such as dplyr, tidyr, and data.table. Install each package once, then load it at the start of every session:

# Install the packages (only needed once per R installation)
install.packages("dplyr")
install.packages("tidyr")
install.packages("data.table")

# Load dplyr for data manipulation
library(dplyr)

# Load tidyr for reshaping data
library(tidyr)

# Load data.table for fast manipulation
library(data.table)
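Because calling install.packages() on every run re-downloads packages that are already present, a script may prefer to install only what is missing. The following is a minimal sketch of that idea; the variable names pkgs and to_install and the choice of CRAN mirror URL are assumptions made for illustration.

```r
# Packages used in this document
pkgs <- c("dplyr", "tidyr", "data.table")

# requireNamespace() returns TRUE when a package is installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install only the packages that are not yet available
# (the mirror URL is an assumed choice; any CRAN mirror works)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

After this runs, the library() calls can be used as usual.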

2. Creating Data

You can create data frames in R using data.frame() or tibble() (from the tibble package). Example:

# Example data frame
data <- data.frame(
  ID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Score = c(85, 90, 88, 95, 89)
)

# View the data
print(data)
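The text mentions tibble() as an alternative constructor. As a rough sketch, the same records could be built as a tibble like this; the variable name people is chosen here for illustration so that it does not overwrite data.

```r
library(tibble)

# Same illustrative records, built with tibble() instead of data.frame()
people <- tibble(
  ID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Score = c(85, 90, 88, 95, 89)
)

# A tibble is still a data frame, so data frame functions accept it
print(people)
is.data.frame(people)
```

Tibbles print more compactly than base data frames and show each column's type, which can help when inspecting larger data.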

3. Selecting Columns and Rows


You can select specific rows and columns using various techniques:

a. Select Columns

# Select specific columns
data %>%
  select(Name, Score)

b. Select Rows

# Filter rows based on conditions
data %>%
  filter(Age > 30)

c. Select Both

# Select specific rows and columns
data %>%
  filter(Age > 30) %>%
  select(Name, Age)

4. Adding/Modifying Columns

You can add or modify columns with mutate():

# Add a logical column flagging scores above 90
# (the comparison already returns TRUE/FALSE, so ifelse() is not needed)
data <- data %>%
  mutate(Score_Above_90 = Score > 90)
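mutate() can also overwrite an existing column, and it can create several columns in one call. The sketch below illustrates both on a small made-up data frame; the name people and the column Age_In_Months are hypothetical.

```r
library(dplyr)

# Small illustrative data frame (values are made up)
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 95)
)

people <- people %>%
  mutate(
    Score = Score / 100,        # overwrite an existing column (rescale to 0-1)
    Age_In_Months = Age * 12    # add a new column derived from Age
  )

print(people)
```

Reusing an existing column name on the left-hand side replaces that column; any other name appends a new one.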

5. Summarizing Data

You can summarize your data using summarize() (or summarise() in British spelling).

# Get the average score by Age
data %>%
  group_by(Age) %>%
  summarize(Average_Score = mean(Score))
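In the example data every Age value is unique, so each group holds a single row. Grouping is more informative when groups contain several rows. The sketch below assumes a hypothetical Team column to show a grouped summary with more than one statistic.

```r
library(dplyr)

# Illustrative data with a hypothetical grouping column
scores <- data.frame(
  Team = c("A", "A", "B", "B", "B"),
  Score = c(85, 90, 88, 95, 89)
)

team_summary <- scores %>%
  group_by(Team) %>%
  summarize(
    Members = n(),                 # number of rows in each group
    Average_Score = mean(Score),   # mean score per team
    Best_Score = max(Score)        # highest score per team
  )

print(team_summary)
```

The result has one row per team, with one column for each statistic named inside summarize().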

6. Arranging (Sorting) Data

Use arrange() to sort data by one or more variables:

# Sort data by Score in descending order
data %>%
  arrange(desc(Score))
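Sorting by more than one variable works by listing the columns in priority order; later columns break ties in earlier ones. A small sketch, using a made-up data frame with a hypothetical Team column:

```r
library(dplyr)

# Illustrative data containing a tie in Score within team A
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Team = c("B", "A", "B", "A"),
  Score = c(85, 90, 95, 90)
)

# Sort by Team (ascending), then Score (descending), then Name to break ties
sorted <- scores %>%
  arrange(Team, desc(Score), Name)

print(sorted)
```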

7. Reshaping Data

You can reshape data using pivot_longer() and pivot_wider() from tidyr.

a. Pivot Longer

Converting wide format data into long format:

# Example: pivoting data from wide to long format
wide_data <- data.frame(
  ID = 1:3,
  Math = c(90, 85, 80),
  Science = c(88, 92, 79)
)

# Store the result so it can be pivoted back to wide format later
long_data <- wide_data %>%
  pivot_longer(cols = c(Math, Science), names_to = "Subject", values_to = "Score")

print(long_data)

b. Pivot Wider

Converting long format data back to wide format:

# Example: pivoting data from long to wide format
long_data %>%
  pivot_wider(names_from = "Subject", values_from = "Score")

8. Handling Missing Data

You can handle missing data by dropping incomplete rows with na.omit() or by replacing NA values, either with base R indexing (shown here) or with mutate() and ifelse().

# Removing rows with missing values
clean_data <- na.omit(data)

# Impute missing values (example: replace NAs with 0)
data$Score[is.na(data$Score)] <- 0
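The mutate() with ifelse() approach mentioned above could look like the following sketch. Replacing missing values with the column mean is an assumption chosen for illustration, and whether it is appropriate depends on the data; the data frame scores is made up.

```r
library(dplyr)

# Illustrative data with one missing score
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(85, NA, 95)
)

# Replace each NA with the mean of the observed scores
imputed <- scores %>%
  mutate(Score = ifelse(is.na(Score), mean(Score, na.rm = TRUE), Score))

print(imputed)
```

Unlike the base R assignment, this leaves the original scores data frame unchanged and returns a new one.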

9. Merging Data Frames

To merge data frames, use left_join(), right_join(), or inner_join() from dplyr:

# Example: Merging two data frames by a common column
data1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
data2 <- data.frame(ID = 1:3, Score = c(85, 90, 88))

merged_data <- left_join(data1, data2, by = "ID")

print(merged_data)
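The three join functions only differ when some keys have no match in the other table. The sketch below uses made-up data frames, names_df and scores_df, whose IDs overlap only partly, to show which rows each join keeps.

```r
library(dplyr)

# IDs 2 and 3 appear in both tables; ID 1 and ID 4 appear in only one
names_df  <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
scores_df <- data.frame(ID = 2:4, Score = c(90, 88, 70))

# Keep every row of names_df; unmatched scores become NA
left_result <- left_join(names_df, scores_df, by = "ID")

# Keep every row of scores_df; unmatched names become NA
right_result <- right_join(names_df, scores_df, by = "ID")

# Keep only IDs present in both tables
inner_result <- inner_join(names_df, scores_df, by = "ID")

print(left_result)
print(right_result)
print(inner_result)
```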

10. Data Table for Fast Manipulation

You can use data.table for faster operations, especially on large datasets:

# Convert data frame to data.table
dt <- as.data.table(data)

# Example: Filtering data
dt[Age > 30]

# Example: Summarizing data
dt[, .(Average_Score = mean(Score)), by = Age]
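data.table can also add columns and sort rows inside the same bracket syntax. The following sketch uses a made-up table, dt_people, and a hypothetical column name, Score_Above_89, to show the := operator and order().

```r
library(data.table)

# Illustrative table (values are made up)
dt_people <- data.table(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 88, 95)
)

# Add a column by reference with := (the table is modified in place)
dt_people[, Score_Above_89 := Score > 89]

# Sort by Score in descending order
top_first <- dt_people[order(-Score)]

print(top_first)
```

Because := modifies the table in place, there is no need to assign the result back to dt_people.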

11. Other Useful Functions

• arrange(): Sort data.

• group_by() and summarize(): Group data and calculate summary statistics.

• mutate(): Create new columns or modify existing ones.

• filter(): Subset rows based on conditions.

• pivot_wider() and pivot_longer(): Reshape data. They replace the older spread() and gather(), which tidyr has superseded.

Example Workflow:

library(dplyr)

# Example data manipulation pipeline
data %>%
  filter(Age > 30) %>%
  mutate(Score_Category = ifelse(Score > 90, "High", "Low")) %>%
  group_by(Score_Category) %>%
  summarize(Average_Age = mean(Age), Average_Score = mean(Score))

Conclusion

These are some basic techniques for manipulating data in R. You can combine these operations to
perform more complex data cleaning and transformation tasks. The dplyr and tidyr packages provide a
powerful, readable, and consistent syntax for these operations.
