R Tutorial2

This document is a tutorial on handling metadata using R and CSV files, detailing steps from data retrieval from the World Bank to data cleaning and analysis in R. It includes instructions on importing datasets, removing unnecessary rows and columns, converting data types, and preparing data for analysis. The tutorial concludes with a simple analysis and visualization of the cleaned data.

R-Tutorial-2

Çağatay Ünal

2024-10-09
How to Handle Metadata
Step 1

Figure 1: Open Google :)

Step 2

Figure 2: Go to World Bank data

Step 3

Figure 3: Go to World Development Indicators

Step 4

Figure 4: Choose One or Multiple Country/Countries (We use one here)

Step 5

Figure 5: Choose One or Multiple Indicator/Indicators (We use one indicator here)
Step 6

Figure 6: Choose the years of interest (we choose the last 20 years here)

Step 7

Figure 7: Apply
Step 8

Figure 8: Download as Excel Document or CSV (We prefer CSV here)

Step 8 Addition

At first, working with the CSV format may seem complicated. You
can fix trivial issues in Excel or Google Docs. But once you are
comfortable with CSV, every other document format will be
easier for you.
Step 9 (Going R)

Open a new file. In the right-hand pane of RStudio there is an
“Import Dataset” button. Click it and choose the readr option for CSV files.

Figure 9: Import
Step 10

Take a look at the import options. Sometimes you need to change
the delimiter, for example for TUIK data. We are also going to
change the name of the dataset. Let's call it “data”.

Figure 10: Import Options
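
Changing the delimiter in the import dialog corresponds to passing a separator argument in code. The sketch below uses base R's read.csv() with a semicolon separator; the file contents and column names are made up for illustration. In readr, read_delim() with its delim argument plays the same role.

```r
# Hypothetical semicolon-delimited file, written to a temporary path
# so that the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("year;value", "2004;70000", "2005;72500"), path)

# sep = ";" tells read.csv to split fields on semicolons instead of commas.
data <- read.csv(path, sep = ";")

# Show the column names and types that were detected.
str(data)
```

If the columns come out as a single text column, the delimiter is most likely the thing to check first.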

Step 11
Let's remove the unnecessary rows and columns from the data.
library(readr)

data <- read_csv("185cae9e-1880-46c9-ba5b-db7b3428af26_Series - Metadata.csv")

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)

## Rows: 8 Columns: 24
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (19): Country Name, Country Code, Series Name, Series Code, 2004 [YR2004...
## dbl (5): 2012 [YR2012], 2013 [YR2013], 2014 [YR2014], 2015 [YR2015], 2016 [...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(data)

# Now we can see which rows are unnecessary.

data <- data[-c(2:9), ]

# Remember: the index before the comma selects rows, the one after it selects columns.
# A minus sign ("-") removes the selected rows or columns.

View(data)
Step 12

data <- data[ , -c(1:4)]

View(data)
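
The row and column removal in Steps 11 and 12 can be tried on a small data frame first. This is only a sketch: the toy data frame and its column names are invented and merely stand in for the imported World Bank file.

```r
# Toy data frame (made-up values) standing in for the imported file.
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  code  = c("x", "y", "z", "w"),
  y2004 = c(1, 2, 3, 4),
  y2005 = c(5, 6, 7, 8)
)

# Before the comma: rows. Drop rows 2 to 4, keeping only row 1.
toy <- toy[-c(2:4), ]

# After the comma: columns. Drop the first two label columns.
toy <- toy[, -c(1:2)]

print(toy)
```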
Step 13

# If you have already installed the "dplyr" package:

library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

# Otherwise, please install it with this code: install.packages("dplyr")

Step 14

Next, let's convert all the columns to numeric, because we can
only do MATH or PLOT with numeric data.
data <- data %>% mutate_all(as.numeric)

## Warning: There were 7 warnings in `mutate()`.
## The first warning was:
## i In argument: `2017 [YR2017] = .Primitive("as.double")(`2017 [YR2017]`)`.
## Caused by warning:
## ! NAs introduced by coercion
## i Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.

View(data)

is.numeric(data$`2004 [YR2004]`)

## [1] TRUE
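
The "NAs introduced by coercion" warning appears when a text cell cannot be read as a number. World Bank exports typically mark missing values with "..", which is assumed to be the cause here. A minimal base R sketch with made-up values shows the effect:

```r
# Toy data frame with text columns; ".." marks a missing value (assumed).
toy <- data.frame(
  y2016 = c("78000.5"),
  y2017 = c(".."),
  stringsAsFactors = FALSE
)

# Convert every column to numeric. suppressWarnings() hides the
# "NAs introduced by coercion" warning that ".." triggers.
toy[] <- lapply(toy, function(col) suppressWarnings(as.numeric(col)))

print(toy)

# Check that every column is numeric now.
print(sapply(toy, is.numeric))
```

The text "78000.5" becomes a number, while ".." becomes NA. The warning is therefore expected here and does not mean the conversion failed.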
Step 15

Now we want to remove the columns that contain NA values from our dataset.

data_clean <- data[, colSums(is.na(data)) == 0]

# I know, it looks a bit horrible, but let's dive in.

# What does it actually say?

# 1- We assign the result to a new name, data_clean. That part is easy.

# 2- We start from our main dataset, which is already called data.
# 3- As mentioned before: rows go before the comma, columns go after it.
# 4- We want to remove the columns with NA, so we work after the comma.
# 5- colSums is new to us. It computes the sum of each column.
# 6- We know how is.na works from last week: it returns TRUE for each missing value.
# 7- == tests equality, so we keep only the columns whose NA count equals zero.

View(data_clean)
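
To see each piece of that expression at work, here is a small sketch on an invented data frame (the column names and values are not from the World Bank file):

```r
# Toy data frame (made-up values) with one NA in the middle column.
toy <- data.frame(
  y2015 = c(10, 20),
  y2016 = c(NA, 30),
  y2017 = c(40, 50)
)

# is.na(toy) is a TRUE/FALSE matrix; colSums counts the TRUEs per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the columns whose NA count is zero.
# drop = FALSE keeps the result a data frame even if one column remains.
toy_clean <- toy[, na_counts == 0, drop = FALSE]
print(names(toy_clean))
```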
Step 16

We are very close to the analysis. For the sake of IDEAL DATA, the DATA
GODS want more clarity. Our data needs one more touch.
Do you have any idea what we need to do here? Everything
looks clean, so what tiny little touch is still missing?
Step 16 Cont.

transposed_data <- t(data_clean)

View(transposed_data)

# Give a name to our first column

colnames(transposed_data)[1] <- "Values"

View(transposed_data)
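
The effect of t() is easier to see on a tiny example. The sketch below uses an invented one-row data frame whose column names imitate the World Bank year labels:

```r
# One-row toy data frame; check.names = FALSE keeps the labels unchanged.
toy <- data.frame(
  "2004 [YR2004]" = 70000,
  "2005 [YR2005]" = 75000,
  "2006 [YR2006]" = 80000,
  check.names = FALSE
)

# t() swaps rows and columns and returns a matrix:
# the three columns become three rows of a single column.
toy_t <- t(toy)
colnames(toy_t) <- "Values"

print(dim(toy_t))
print(rownames(toy_t))
```

After transposing, each year is a row, so the values form one column that functions such as mean() and plot() can work on directly.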
Step 17

HAPPY ENDING
mean(transposed_data)

## [1] 78332.76

Step 18

plot(transposed_data)

[Plot: transposed_data (y-axis, roughly 70000 to 100000) against Index (x-axis)]
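
Calling plot() on a one-column matrix puts the row position (Index) on the x-axis. As an optional refinement, the year can be recovered from the row names and used instead. This is only a sketch with made-up values, and it assumes the row names start with a four-digit year, as in "2004 [YR2004]".

```r
# Made-up values standing in for the transposed data.
values <- matrix(
  c(70000, 75000, 80000),
  ncol = 1,
  dimnames = list(
    c("2004 [YR2004]", "2005 [YR2005]", "2006 [YR2006]"),
    "Values"
  )
)

# Take the first four characters of each row name as the year.
years <- as.numeric(substr(rownames(values), 1, 4))

# Plot the values against the years instead of the default row index.
plot(years, values[, "Values"], type = "b", xlab = "Year", ylab = "Value")
```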
End
