R Tutorial2

This document is a tutorial on handling metadata using R and CSV files, detailing steps from data retrieval from the World Bank to data cleaning and analysis in R. It includes instructions on importing datasets, removing unnecessary rows and columns, converting data types, and preparing data for analysis. The tutorial concludes with a simple analysis and visualization of the cleaned data.

R-Tutorial-2

Çağatay Ünal

2024-10-09
How to Handle Metadata
Step 1

Figure 1: Open Google :)


Step 2

Figure 2: Go to World Bank data


Step 3

Figure 3: Go to World Development Indicators


Step 4

Figure 4: Choose one or multiple countries (we use one here)


Step 5

Figure 5: Choose one or multiple indicators (we use one indicator here)

Step 6

Figure 6: Choose the years you are interested in (we choose the last 20 years here)


Step 7

Figure 7: Apply
Step 8

Figure 8: Download as Excel Document or CSV (We prefer CSV here)


Step 8 Addition

At first, the CSV format may seem very complicated, and you can
fix trivial issues in Excel or Google Docs instead. But once you
are comfortable with CSV, every other document format will be
easier for you.
Step 9 (Going R)

Open a new file. In the right-hand pane of RStudio there is an
“Import Dataset” button. Click it and use the readr option for CSV files.

Figure 9: Import
Step 10

Take a look at the import options. Sometimes you need to change the
delimiter, for example for TUIK data. We are also going to
change the name of the data set. Let's call it “data”.

Figure 10: Import Options
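
The delimiter option from the import dialog can also be set in code. Here is a minimal sketch, assuming a semicolon-separated file. The file contents and column names are made up for illustration, and the readr package must be installed.

```r
library(readr)

# Hypothetical semicolon-separated content, written to a temporary file
# so that the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("year;value", "2004;70000", "2005;75000"), tmp)

# read_delim() lets us state the delimiter explicitly.
data_semicolon <- read_delim(tmp, delim = ";", show_col_types = FALSE)

print(data_semicolon)
```

For an ordinary comma-separated file, read_csv() as used in the next step is enough.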


Step 11
Let's remove the unnecessary rows and columns from the data.
library(readr)

data <- read_csv("185cae9e-1880-46c9-ba5b-db7b3428af26_Series - Metadata.csv")

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)

## Rows: 8 Columns: 24
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (19): Country Name, Country Code, Series Name, Series Code, 2004 [YR2004...
## dbl (5): 2012 [YR2012], 2013 [YR2013], 2014 [YR2014], 2015 [YR2015], 2016 [...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(data)

# Now we can see which rows are unnecessary.

data <- data[-c(2:9), ]

# Remember: the index before the comma selects rows, the index after it selects columns.
# A minus sign ("-") removes the listed rows or columns.

View(data)
Step 12

data <- data[ , -c(1:4)]

View(data)
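
The row and column removal in Steps 11 and 12 can be tried on a tiny example first. This is a sketch on a made-up data frame (the name toy and its columns are invented here), so you can see exactly what the minus sign does on each side of the comma.

```r
# Small made-up data frame standing in for the imported data.
toy <- data.frame(
  name  = c("a", "b", "c", "d"),
  code  = c("A", "B", "C", "D"),
  y2004 = c(1, 2, 3, 4),
  y2005 = c(5, 6, 7, 8)
)

# Before the comma: rows. Drop rows 2 to 4, keep all columns.
toy_rows <- toy[-c(2:4), ]

# After the comma: columns. Drop columns 1 and 2, keep all rows.
toy_cols <- toy_rows[, -c(1:2)]

print(toy_cols)
```

Only the first row and the two year columns remain, which is the same shape we are aiming for with the World Bank data.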
Step 13

# If you have already installed the "dplyr" package:

library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

# Otherwise, install it first with: install.packages("dplyr")


Step 14

Now let's convert all the columns to numeric, because we can
only do math or plot with numeric data.
data <- data %>% mutate_all(as.numeric)

## Warning: There were 7 warnings in `mutate()`.
## The first warning was:
## i In argument: `2017 [YR2017] = .Primitive("as.double")(`2017 [YR2017]`)`.
## Caused by warning:
## ! NAs introduced by coercion
## i Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.

View(data)

is.numeric(data$`2004 [YR2004]`)

## [1] TRUE
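
The "NAs introduced by coercion" warning in Step 14 appears whenever a text value cannot be read as a number. Here is a small sketch with made-up values. The ".." entry is only an assumption about how a missing value might be written in the file.

```r
# Made-up character values; ".." stands in for a missing entry (assumption).
raw_values <- c("70000", "75000.5", "..")

# as.numeric() converts what it can and turns the rest into NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning here.
numeric_values <- suppressWarnings(as.numeric(raw_values))

print(numeric_values)
print(is.na(numeric_values))
```

So the warning is not an error: it only tells us that some cells became NA, and we deal with those next.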
Step 15

Now we want to remove the columns that contain NA values from our dataset.


data_clean <- data[, colSums(is.na(data)) == 0]

# I know it looks a bit scary, but let's break it down.

# What does it actually say?

# 1- We assign the result to a new name, data_clean. That's easy.


# 2- We start from our main data set, which is already called data.
# 3- As we mentioned before: before the comma is for rows, after the comma is for columns.
# 4- We want to remove the columns with NA values, so we work after the comma.
# 5- colSums is new. It sums each column, so colSums(is.na(data)) counts the NAs per column.
# 6- We know how is.na works from last week.
# 7- == means "equals", so == 0 keeps only the columns that have zero NAs.

View(data_clean)
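
To see each piece of the Step 15 line at work, here is a sketch on a made-up data frame (toy and its columns are invented for illustration) with a single NA in it.

```r
# Made-up data frame with one NA in the second column.
toy <- data.frame(
  y2004 = c(1, 2),
  y2005 = c(NA, 4),
  y2006 = c(5, 6)
)

# is.na() gives TRUE/FALSE for every cell; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the columns whose NA count is zero.
toy_clean <- toy[, na_counts == 0]
print(names(toy_clean))
```

With a plain data.frame, adding drop = FALSE inside the brackets keeps the result a data frame even if only one column survives.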
Step 16

We are very close to the analysis. For the sake of IDEAL DATA, the DATA
GODS want more clarity. Our data needs one more touch.
Do you have any idea what we need to do here? Everything
seems very clear, so what tiny little touch do we still need?
Step 16 Cont.

transposed_data <- t(data_clean)

View(transposed_data)

# Give a name to our first column

colnames(transposed_data)[1] <- "Values"

View(transposed_data)
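
The tiny touch is transposing: turning one row with many year columns into one column with many year rows. A minimal sketch on a made-up one-row data frame (toy_clean and its values are invented here) shows what t() returns.

```r
# Made-up one-row data frame, shaped like data_clean: one column per year.
toy_clean <- data.frame(y2004 = 70000, y2005 = 75000, y2006 = 80000)

# t() turns the data frame into a matrix with one row per year.
toy_t <- t(toy_clean)
print(dim(toy_t))       # 3 rows, 1 column
print(rownames(toy_t))  # the old column names

# Name the single column.
colnames(toy_t)[1] <- "Values"
print(toy_t)
```

Note that the result is a matrix, not a data frame, and the old column names become the row names.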
Step 17

HAPPY ENDING
mean(transposed_data)

## [1] 78332.76
Step 18

plot(transposed_data)

[Figure: plot of transposed_data against Index; x-axis ticks from 2 to 12, y-axis ticks from 70000 to 100000]
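
The default plot uses the row index on the x-axis. If you would rather see the years there, one possible sketch is below. It uses a made-up matrix (toy_t and its values are invented) and assumes the row names start with the four-digit year, as in "2004 [YR2004]".

```r
# Made-up one-column matrix, shaped like transposed_data.
toy_t <- matrix(
  c(70000, 75000, 80000),
  ncol = 1,
  dimnames = list(c("2004 [YR2004]", "2005 [YR2005]", "2006 [YR2006]"), "Values")
)

# Take the first four characters of each row name as the year.
years <- as.numeric(substr(rownames(toy_t), 1, 4))

# Plot the values against the years instead of against the row index.
plot(years, toy_t[, "Values"], type = "b", xlab = "Year", ylab = "Values")

# The mean works the same way as before.
print(mean(toy_t))
```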
End
