R-Tutorial-2
Çağatay Ünal
2024-10-09
How to Handle Metadata
Step 1
Figure 1: Open Google :)
Step 2
Figure 2: Go to World Bank Data
Step 3
Figure 3: Go to World Development Indicators
Step 4
Figure 4: Choose one or more countries (we use one here)
Step 5
Figure 5: Choose one or more indicators (we use one indicator here)
Step 6
Figure 6: Choose the years you are interested in (we choose the last 20 years here)
Step 7
Figure 7: Apply
Step 8
Figure 8: Download as Excel Document or CSV (We prefer CSV here)
Step 8 Addition
At first, working with the CSV format can seem complicated, and you
can always fix trivial issues by hand in Excel or Google Sheets. But
once you are comfortable with CSV, every other document format will
be easier for you.
Step 9 (Going R)
Open a new file. In the upper-right pane of RStudio there is an
“Import Dataset” button. Click it and choose the readr option for CSV files.
Figure 9: Import
Step 10
Take a look at the import options. Sometimes you need to change the
delimiter, for example for TUIK data. We are also going to
change the name of the dataset. Let's call it “data”.
Figure 10: Import Options
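The delimiter can also be set in code instead of in the import dialog. The sketch below is a minimal illustration in base R; the file contents are made up, and the semicolon is only an assumption about what a differently delimited file might look like. With readr, `read_delim()` accepts the separator through its `delim` argument.

```r
# Hypothetical semicolon-delimited file written to a temporary path
path <- tempfile(fileext = ".csv")
writeLines(c("year;value", "2004;70100", "2005;72350"), path)

# sep = ";" tells read.csv to split fields on semicolons instead of commas
data <- read.csv(path, sep = ";")
print(data)
```

If the columns all end up squeezed into one, the delimiter is usually the first thing to check.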
Step 11
Let's remove the unnecessary rows and columns from the data.
library(readr)
data <- read_csv("185cae9e-1880-46c9-ba5b-db7b3428af26_Series - Metadata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 8 Columns: 24
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (19): Country Name, Country Code, Series Name, Series Code, 2004 [YR2004...
## dbl (5): 2012 [YR2012], 2013 [YR2013], 2014 [YR2014], 2015 [YR2015], 2016 [...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(data)
# Now we can see which rows are unnecessary.
data <- data[-c(2:9), ]
# Remember: the index before the comma selects rows, the one after it selects columns.
# A minus sign ("-") removes the selected rows or columns.
View(data)
Step 12
data <- data[ , -c(1:4)]
View(data)
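The row and column removal in Steps 11 and 12 can be tried on a small data frame first. This is only a sketch with made-up values; the names `toy`, `first_row` and `no_id` are hypothetical and not part of the tutorial's dataset.

```r
# Toy data frame (made-up values) with 4 rows and 3 columns
toy <- data.frame(id = 1:4, x = c(10, 20, 30, 40), y = c("a", "b", "c", "d"))

# Drop rows 2 to 4: the index before the comma refers to rows
first_row <- toy[-c(2:4), ]

# Drop column 1: the index after the comma refers to columns
no_id <- toy[, -1]

print(first_row)
print(no_id)
```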
Step 13
# If you have already installed the "dplyr" package:
library(dplyr)
##
## Attaching package: ‘dplyr’
## The following objects are masked from ‘package:stats’:
##
## filter, lag
## The following objects are masked from ‘package:base’:
##
## intersect, setdiff, setequal, union
# Otherwise, install it first with: install.packages("dplyr")
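One way to avoid reinstalling a package every time is to check for it first. The helper below, `ensure_package`, is a hypothetical name introduced for illustration; it relies only on `requireNamespace()` and `install.packages()` from base R.

```r
# Hypothetical helper: install a package only when it is not available yet
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers an installation
ok <- ensure_package("stats")
print(ok)
```

For this tutorial you would call `ensure_package("dplyr")` before `library(dplyr)`.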
Step 14
Now let's convert all the columns to numeric, because we can
only do math or plot with numeric data.
data <- data %>% mutate_all(as.numeric)
## Warning: There were 7 warnings in `mutate()`.
## The first warning was:
## i In argument: `2017 [YR2017] = .Primitive("as.double")(`2017 [YR2017]`)`.
## Caused by warning:
## ! NAs introduced by coercion
## i Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.
View(data)
is.numeric(data$`2004 [YR2004]`)
## [1] TRUE
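The "NAs introduced by coercion" warning appears when a cell holds text that is not a number, which `as.numeric()` turns into NA. The sketch below reproduces this in base R with made-up values; the `..` placeholder is an assumption about how missing values are marked in the downloaded file. In current dplyr, `mutate(across(everything(), as.numeric))` is the recommended replacement for `mutate_all()`.

```r
# Made-up character columns; ".." stands in for a missing value
raw <- data.frame(
  `2004 [YR2004]` = c("70100.5"),
  `2017 [YR2017]` = c(".."),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert every column to numeric; text that is not a number becomes NA
converted <- raw
converted[] <- lapply(raw, function(col) suppressWarnings(as.numeric(col)))

str(converted)
```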
Step 15
Now we want to remove the columns that contain NA values from our dataset.
data_clean <- data[, colSums(is.na(data)) == 0]
# I know it looks a bit scary, so let's break it down.
# What does it actually say?
# 1- We assign the result to a new name, data_clean. That's easy.
# 2- We start from our main dataset, which is already called data.
# 3- As mentioned before: rows go before the comma, columns after it.
# 4- We want to remove columns with NA values, so the condition goes after the comma.
# 5- colSums is new. It sums each column, so colSums(is.na(data)) counts the NAs per column.
# 6- We know how is.na works from last week.
# 7- == tests for equality, so "== 0" keeps only the columns with zero NAs.
View(data_clean)
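To see what each piece of that expression does, it helps to run it on a tiny data frame. The values and the names `toy`, `na_per_column` and `toy_clean` are made up for this sketch.

```r
# Made-up data: column b contains one NA
toy <- data.frame(a = c(1, 2), b = c(NA, 3), c = c(4, 5))

# is.na(toy) is a TRUE/FALSE matrix; colSums counts the TRUEs per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)  # a = 0, b = 1, c = 0

# Keep only the columns whose NA count is zero
toy_clean <- toy[, na_per_column == 0]
print(names(toy_clean))  # "a" "c"
```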
Step 16
We are very close to the analysis. For the sake of IDEAL DATA, the DATA
GODS want more clarity. Our data needs one more touch.
Do you have any idea what we need to do here? Everything
seems very clear, so what tiny little touch is still missing?
Step 16 Cont.
transposed_data <- t(data_clean)
View(transposed_data)
# Give a name to our first column
colnames(transposed_data)[1] <- "Values"
View(transposed_data)
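Note that `t()` returns a matrix rather than a data frame, and the old column names become row names. A possible next step, sketched here with made-up values, is to turn the result into a two-column data frame; the names `wide`, `tidy`, `year` and `value` are hypothetical, and taking the first four characters as the year assumes column names of the form `2004 [YR2004]`.

```r
# Made-up one-row data frame shaped like data_clean
wide <- data.frame(
  `2004 [YR2004]` = 70100,
  `2005 [YR2005]` = 72350,
  `2006 [YR2006]` = 75800,
  check.names = FALSE
)

# t() returns a matrix; the old column names become its row names
m <- t(wide)

# Build a two-column data frame: the year (first 4 characters) and the value
tidy <- data.frame(
  year = as.integer(substr(rownames(m), 1, 4)),
  value = as.numeric(m[, 1])
)
print(tidy)
```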
Step 17
HAPPY ENDING
mean(transposed_data)
## [1] 78332.76
Step 18
plot(transposed_data)
[Plot of transposed_data (y-axis) against Index (x-axis)]
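Plotting against the row index hides which year each point belongs to. The sketch below shows one way to put the years on the x-axis and mark the mean; the values are made up and only stand in for the cleaned indicator series.

```r
# Made-up values standing in for the cleaned indicator series
years  <- 2004:2009
values <- c(70100, 72350, 75800, 74900, 78200, 81050)

# Average over the period
avg <- mean(values)

# Line-and-point plot with the year on the x-axis instead of the row index
plot(years, values, type = "b", xlab = "Year", ylab = "Indicator value")
abline(h = avg, lty = 2)  # dashed horizontal line at the mean
```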
End