R-Programming–Basics
R Programming
Ground Up!
Syed Awase Khirni
Syed Awase earned his PhD from University of Zurich in GIS, supported by EU V Framework Scholarship from SPIRIT
Project (www.geo-spirit.org). He currently provides consulting services through his startup www.territorialprescience.com
and www.sycliq.com
Copyright 2008-2016 Syed Awase Khirni TPRI
R Project
• R – Free software environment for statistical computing and graphics.
• https://www.r-project.org
• https://cran.r-project.org/mirrors.html
S
• S language – developed by John Chambers et al. at Bell Labs
• 1976 -> internal statistical analysis environment, originally implemented as Fortran libraries
• 1988 -> rewritten in C; statistical models in S by Chambers and Hastie
• 1998 -> S version 4
• 1991 -> R created in New Zealand by Ross Ihaka and Robert Gentleman
• 1993 -> public release of R
• 1995 -> Martin Mächler convinced Ross and Robert to use the GNU GPL (General Public License)
• 1996, 1997 -> R Core Group formed (with members of the S-PLUS core group)
• 2000 -> R version 1.0 released
• 2015 -> R version 3.1.3 released on March 9, 2015
Design of the R System
• R – statistical programming language based on the S language developed at Bell Labs.
• Divided into 2 conceptual parts
– Base
– Add-on packages
• Base R system contains
– The base package, which is required to run R and contains the most fundamental functions.
– Other packages contained in the base system include utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4
• Add-on packages are packages published by either the R Core group or third parties
• Syntax similar to S, making it easy for S-PLUS users to switch over
• Semantics are superficially similar to S, but in reality are quite different
• Runs on almost any standard computing platform/OS
R?
• R is an integrated suite of software facilities for data manipulation, calculation and graphical display
• R has
– An effective data handling and storage facility
– A suite of operators for calculations on arrays and matrices
– A large, coherent, integrated collection of tools for data analysis
– Graphical facilities for data analysis and display
– A well-developed, simple and effective programming language
R- Drawbacks
• Little built-in support for dynamic or 3-D graphics
• Functionality is based on consumer demand and user contributions
• Web support is provided through third-party software.
DATA TYPES AND BASIC
OPERATIONS IN R
Data Types
• Objects
• Numbers
• Attributes
• Entering Input and Printing
• Vectors, Lists
• Factors
• Missing Values
• Data Frames
• Names
Objects in R
• R has five basic or atomic classes of objects
– Character
– Numeric (real number)
– Integer
– Complex
– Logical (TRUE/FALSE)
• The most basic object is a vector
– A vector can only contain objects of the same class
– The one exception is a list, which is represented as a
vector but can contain objects of different classes
– Empty vectors can be created with the vector() function
RStudio
install.packages()
• To install additional third-party packages into your R installation, we use
• install.packages("XLConnect")
– To install the XLConnect package
– To activate an already installed package, we use
• library("packagename")
Check whether a package is already installed (an exact name match avoids the false positives of a substring search):
"<name of your package>" %in% rownames(installed.packages())
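The install-and-load workflow on this slide can be sketched roughly as follows. The helper name `is_installed` is hypothetical, and `stats` is used only because it ships with R, so nothing is downloaded when the sketch runs.

```r
# Hypothetical helper: TRUE if a package with exactly this name is installed.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

pkg <- "stats"                        # ships with R, so no download is needed
if (!is_installed(pkg)) {
  install.packages(pkg)               # downloads from CRAN; needs a network connection
}
library(pkg, character.only = TRUE)   # attach the package given its name as a string
```

`character.only = TRUE` is needed because `library()` otherwise treats its argument as a literal package name rather than a variable.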
Numbers in R
• Treated as numeric objects (i.e. double-precision real numbers)
• Suffix L => integer
• Example: 1 => numeric object
– 1L => explicitly gives an integer
• 1/0 => Inf (infinity)
• NaN => "not a number", an undefined value (e.g. 0/0)
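A short sketch of how these number types behave at the console; the variable names are illustrative only.

```r
x <- 1          # numeric (double precision)
y <- 1L         # integer, because of the L suffix
class(x)        # "numeric"
class(y)        # "integer"

1 / 0           # Inf
-1 / 0          # -Inf
0 / 0           # NaN (undefined result)
is.nan(0 / 0)   # TRUE
```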
Attributes
• R objects can have attributes
– Names, dimnames
– Dimensions (e.g. matrices, arrays)
– Class
– Length
– Other user-defined attributes/metadata
• Attributes of an object can be accessed using the attributes() function.
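As a small illustration of inspecting and setting attributes; the attribute name "units" is made up for the example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)               # a list holding the dim attribute (2 3)

v <- c(a = 1, b = 2)
attributes(v)               # a list holding the names attribute

attr(v, "units") <- "cm"    # a user-defined attribute (hypothetical name)
attributes(v)               # now lists both names and units
```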
Assignment Operator (<-)
• Assignment in R is done using the <- assignment operator.
• The grammar of the language determines whether an expression is complete or not
• The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored
• In printed output, [1] indicates that x is a vector and 123781213412 is its first element
Entering an object's name at the prompt auto-prints it.
Press Ctrl+L to clear the console.
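A minimal sketch of assignment, comments and auto-printing; the values are arbitrary.

```r
x <- 5L          # assign the integer 5 to x; nothing is printed
x                # auto-printing: entering the name prints the value
print(x)         # explicit printing gives the same result

msg <- "hello"   # text after a # is a comment and is ignored
y <- 1:20        # a longer vector: when printed, the [n] prefix on each
y                # line is the index of the first element on that line
```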
Vectors in R
• The c() function can be used to create vectors of objects.
Vectors in R
• Using the vector() function
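The two ways of creating vectors mentioned on these slides, c() and vector(), might look like this; the variable names are illustrative.

```r
num  <- c(0.5, 0.6)        # numeric
lgl  <- c(TRUE, FALSE)     # logical
chr  <- c("a", "b", "c")   # character
int  <- 9:12               # integer sequence
cplx <- c(1+0i, 2+4i)      # complex

# vector() creates an empty vector of a given mode and length
v <- vector("numeric", length = 5)
v                          # 0 0 0 0 0
```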
Mixing Objects
• When different objects are mixed in a vector, coercion
occurs so that every element in the vector is of the
same class.
Explicit Coercion
• Objects can be explicitly coerced from one class to another using the as.* functions.
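Both implicit coercion (mixing objects in a vector) and explicit coercion with the as.* functions can be sketched as follows; the values are arbitrary.

```r
# Implicit coercion: mixed elements are converted to one common class
class(c(1.7, "a"))     # "character"
class(c(TRUE, 2))      # "numeric" (TRUE becomes 1)
class(c("a", TRUE))    # "character"

# Explicit coercion with the as.* functions
x <- 0:3
as.numeric(x)          # 0 1 2 3
as.logical(x)          # FALSE TRUE TRUE TRUE
as.character(x)        # "0" "1" "2" "3"

# Coercion that makes no sense yields NA (normally with a warning)
bad <- suppressWarnings(as.numeric(c("a", "b")))
bad                    # NA NA
```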
Matrices
• Vectors with a dimension attribute are called matrices. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)
• Matrices are constructed column-wise, so entries can be thought of as starting from the upper left corner and running down the columns.
• Matrices can also be created directly from vectors by adding a dimension attribute.
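A brief sketch of column-wise construction and of turning a vector into a matrix by setting its dim attribute:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)                                 # 2 3

v <- 1:10
dim(v) <- c(2, 5)                      # adding a dim attribute makes v a matrix
```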
Cbind-ing
• Matrices can be created by column-binding with the cbind() function
Rbind-ing
• Matrices can be created by row-binding using the rbind() function.
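Column-binding and row-binding side by side, as a minimal sketch with two made-up vectors:

```r
x <- 1:3
y <- 10:12
by_col <- cbind(x, y)   # 3 x 2 matrix: x and y become columns
by_row <- rbind(x, y)   # 2 x 3 matrix: x and y become rows
dim(by_col)             # 3 2
dim(by_row)             # 2 3
```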
Lists in R
• Lists are a special type of vector that can contain elements of different classes.
• Lists are a very important data type in R
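For instance, a list can hold elements of several classes at once; the names and values below are illustrative.

```r
x <- list(1, "a", TRUE, 1+4i)   # four elements, four different classes
sapply(x, class)                # "numeric" "character" "logical" "complex"

person <- list(name = "Ada", scores = c(90, 85))   # named elements
person$scores                   # 90 85
```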
Factors
• Used to represent categorical data. Factors can be unordered or ordered.
• Factors are treated specially by modelling functions like lm() and glm()
• Using factors with labels is better than using integers because factors are self-describing: the variable carries meaningful labels rather than bare integer codes.
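A small sketch of unordered and ordered factors; the labels are invented for the example.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
levels(f)     # "no" "yes" (alphabetical by default)
table(f)      # counts per level: no = 2, yes = 3
unclass(f)    # the underlying integer codes: 2 2 1 2 1

# levels = fixes the order; ordered = TRUE makes the factor ordered
size <- factor(c("low", "high", "mid"),
               levels = c("low", "mid", "high"), ordered = TRUE)
size[1] < size[2]   # TRUE, because "low" precedes "high"
```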
Missing Values
• Many existing industrial and research datasets contain missing values.
• These can occur for various reasons, such as manual data entry procedures, equipment errors and incorrect measurements.
• Missing values can appear in the form of outliers or even wrong data (i.e. out of bounds)
• Missing values are denoted by NA, or by NaN for undefined mathematical operations
– is.na() is used to test objects if they are NA
– is.nan() is used to test for NaN
– NA values have a class also, so there are integer NA, character NA, etc.
– A NaN value is also NA but the converse is not true.
Missing Values
• Three types of problems are usually associated with missing values
– Loss of efficiency
– Complications in handling and analyzing the data
– Bias resulting from differences between missing and complete data.
Identifying NA values using is.na() and is.nan()
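A minimal sketch of how is.na() and is.nan() differ, using made-up vectors:

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)     # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)    # FALSE FALSE FALSE FALSE FALSE

y <- c(1, 2, NaN, NA, 4)
is.na(y)     # FALSE FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(y)    # FALSE FALSE  TRUE FALSE FALSE  (NA is not NaN)

class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```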
Data Frames
• Used to store tabular data (table of values)
– They are represented as a special type of list, where every element of the list has to have the same length.
– Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows
• Data frames can store different classes of objects in each column, while matrices must have every element of the same class
• Data frames also have a special attribute called row.names.
• Data frames are usually created by calling read.table() or read.csv()
• Can be converted to a matrix by calling the data.matrix() function
Data Frame in R
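A data frame can also be built directly with data.frame(). The sketch below uses invented column names and values.

```r
df <- data.frame(id = 1:4,
                 passed = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)             # 4
ncol(df)             # 2
sapply(df, class)    # id: "integer", passed: "logical"
row.names(df)        # "1" "2" "3" "4"
data.matrix(df)      # a matrix; logical values become 1/0
```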
Names in R
• R objects can also have names, which is very useful for writing readable code and self-describing objects
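Names on vectors and matrices, as a short sketch with arbitrary labels:

```r
x <- 1:3
names(x)                        # NULL: no names yet
names(x) <- c("a", "b", "c")
x["b"]                          # element selected by name

m <- matrix(1:4, nrow = 2)
dimnames(m) <- list(c("r1", "r2"), c("c1", "c2"))
m["r2", "c1"]                   # 2
```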
Subsetting
• Extracting subsets from an existing dataset is called subsetting
– [ ] always returns an object of the same class as the original
– [[ ]] is used to extract elements of a list or a data frame.
– $ is used to extract an element of a list or data frame by name; semantics are similar to those of [[ ]].
Subsetting Matrix
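Matrix subsetting uses [row, column] indices. A minimal sketch on a small example matrix:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]                 # 3: row 1, column 2
m[1, ]                  # 1 3 5: the whole first row, returned as a vector
m[, 2]                  # 3 4: the whole second column
m[1, 2, drop = FALSE]   # keeps the result as a 1 x 1 matrix
m[1, , drop = FALSE]    # keeps the result as a 1 x 3 matrix
```

By default a single row or column is simplified to a plain vector; `drop = FALSE` turns that off.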
Subsetting List
Subsetting Nested Elements
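Subsetting a list, including its nested elements, could look like the following sketch; the list contents are invented.

```r
x <- list(foo = 1:4, bar = 0.6, baz = list(a = 10, b = c(3.14, 2.81)))

x[1]          # a list containing foo ([ ] keeps the class)
x[["foo"]]    # the integer vector 1 2 3 4
x$bar         # 0.6
x[c(1, 2)]    # [ ] can extract several elements at once

# Nested elements
x[["baz"]][["b"]][2]   # 2.81
x[[c(3, 2)]]           # 3.14 2.81: recursive indexing, same as x[[3]][[2]]
x$baz$a                # 10
```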
Partial Matching
• Partial matching of names is allowed with $ and, when exact = FALSE is given, with [[ ]]
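A short sketch of partial name matching; the element name is arbitrary.

```r
x <- list(aardvark = 1:5)
x$a                       # 1 2 3 4 5: $ matches partially by default
x[["a"]]                  # NULL: [[ ]] requires an exact match by default
x[["a", exact = FALSE]]   # 1 2 3 4 5
```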
Remove NA values
• A common task is to remove missing values (NAs) prior to performing any analysis.
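Two common ways of dropping NAs, sketched on made-up data: a logical mask for vectors and complete.cases() for data frames.

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
x[!bad]                       # 1 2 4 5
mean(x, na.rm = TRUE)         # 3: many functions can skip NAs themselves

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
good <- complete.cases(df)    # TRUE only for rows with no missing values
df[good, ]                    # keeps row 1 only
```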
Vectorized Operations
• Many operations in R are vectorized, making code more efficient, concise and easier to read.
Vectorized Matrix Operations
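Vectorized arithmetic on vectors and matrices, as a minimal sketch with small example values:

```r
x <- 1:4
y <- 6:9
x + y      # 7 9 11 13
x * y      # 6 14 24 36
x > 2      # FALSE FALSE TRUE TRUE

a <- matrix(1:4, 2, 2)
b <- matrix(rep(10, 4), 2, 2)
a * b      # element-wise product
a %*% b    # true matrix multiplication
```

Note the difference between `*`, which multiplies element by element, and `%*%`, which performs matrix multiplication.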
Reading Data
• R provides some useful functions to read data
– read.table, read.csv: for reading tabular data
– readLines: for reading lines of a text file
– source: for reading in R code files (inverse of dump)
– dget: for reading in R code files (inverse of dput)
– load: for reading in saved workspaces
– unserialize: for reading single R objects in binary form.
Writing Data
• R provides a set of functions to write data into files
– write.table: to write data in table format
– writeLines: to write lines
– dump
– dput
– save
– serialize
Reading data files with read.table
• For small to moderately sized datasets, we can just call read.table without specifying any other arguments.
• data <- read.table("sampledata.txt")
R-DataSets
• https://vincentarelbundock.github.io/Rdatasets/datasets.html
• http://openflights.org/data.html
• http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html
• https://r-dir.com/reference/datasets.html
• http://fimi.ua.ac.be/data/
• https://datamarket.com/data/list/?q=provider:tsdl
• https://www.data.gov/
Set/get working directory
• Setting and getting the current working directory with setwd() and getwd()
> setwd("<path to your folder>")
Reading CSV files
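A minimal sketch of read.csv(). To keep it self-contained, the CSV content is supplied inline through the `text` argument instead of a file path; the column names and values are made up.

```r
csv_text <- "id,name,score
1,ann,90.5
2,bob,82.0"

scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(scores)       # compact summary: 2 observations of 3 variables
scores$score      # 90.5 82.0
```

With a real file the call would take the path instead, for example read.csv("<path to your file>").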
Airmile data
Mocking sample data with mockaroo
https://www.mockaroo.com/
Reading large datasets with read.table
write.csv()
• One of the easiest ways to save an R data frame is to write it to a CSV, TSV or plain text file.
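Writing a data frame out and reading it back might look like this sketch. Temporary files stand in for real paths, and the data is invented.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

path <- tempfile(fileext = ".csv")        # temporary file, for illustration only
write.csv(df, path, row.names = FALSE)    # comma-separated output
back <- read.csv(path, stringsAsFactors = FALSE)

tsv <- tempfile(fileext = ".tsv")
write.table(df, tsv, sep = "\t", row.names = FALSE, quote = FALSE)   # tab-separated
```

`row.names = FALSE` avoids writing the row names as an extra, unnamed first column.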
dput()
• Writes an ASCII text representation of an R
object to a file or connection, or uses one to
recreate the object
Head and Tail of DataSet
• head() and tail() return the first or the last part of an object, e.g. a vector, matrix, table, data frame or function.
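For example, using the built-in mtcars data frame and the letters vector (chosen here only for convenience):

```r
head(mtcars)        # first 6 rows by default
head(mtcars, 3)     # first 3 rows
tail(mtcars, 2)     # last 2 rows
head(letters, 4)    # "a" "b" "c" "d"
```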
Loading “foreign” data
• Sometimes, we would like to import data from other statistical packages like SAS, SPSS and Stata
• Reading Stata (.dta) files with the foreign package
• Writing data files from R into Stata format is also very straightforward.
Loading "foreign" data
• SPSS data
– Data files in SPSS format can be opened with the function read.spss from the "foreign" package.
– Set the "to.data.frame" option to TRUE to return a data frame.
Loading "foreign" data
• Excel data
– Sometimes, we have data in xls format that needs to be imported into R prior to its use.
– library(gdata)
Loading "foreign" data
• Using the XLConnect package
• install.packages("XLConnect")
Loading "foreign" data
• Minitab
– For importing Minitab portable worksheets into R, we can use the foreign package.
Computing Memory Requirements
• A numeric (double-precision) value takes 8 bytes.
• Imagine you have a data frame with 100,000 rows and 100 columns.
• 100,000 × 100 × 8 bytes/numeric = 80,000,000 bytes
– 2^20 bytes/MB
– So roughly 76 MB of memory is required.
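The same back-of-the-envelope estimate can be computed in R. This sketch assumes every column is numeric (8 bytes per value) and ignores any per-object overhead.

```r
rows <- 100000
cols <- 100
bytes_per_numeric <- 8

total_bytes <- rows * cols * bytes_per_numeric   # 80,000,000 bytes
total_mb <- total_bytes / 2^20                   # about 76.3 MB
total_mb
```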
Text Formats
• dump and dput are useful because the resulting textual format is editable and, in the case of corruption, potentially recoverable
• Unlike writing out a table or CSV file, dump and dput preserve the metadata (sacrificing some readability), so that another user doesn't have to specify it all over again.
• Textual formats work much better with version control programs like Git and SVN, which can then track changes meaningfully
• Text formats have a longer life and adhere to the "Unix philosophy"
• However, the format is not very space-efficient.
dump() function
• Creates a file in a format that can be read with the source() function or pasted in with the copy/paste edit functions of the windowing system.
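A minimal dump()/source() round trip, run at the top level of a session. The object names and the temporary file are illustrative.

```r
x <- "foo"
y <- data.frame(a = 1:2, b = c("a", "b"))

path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)   # writes R code that recreates x and y
rm(x, y)                         # remove the originals
source(path)                     # runs that code, restoring both objects
```

Unlike dput(), dump() takes the names of the objects and can store several of them in one file.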
dput() function
• The dput function saves data as an R expression, which means that the resulting file can actually be copied and pasted into the R console.
• Writes an ASCII text representation of the object to a file; dget() uses such a file to recreate the object.
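A dput()/dget() round trip can be sketched as follows, with a temporary file standing in for a real path:

```r
y <- data.frame(a = 1:2, b = c("a", "b"))
dput(y)                        # prints R code that rebuilds y

path <- tempfile(fileext = ".R")
dput(y, file = path)           # write that code to a file instead
new_y <- dget(path)            # read the object back
all.equal(y, new_y)            # checks that the copy matches the original
```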
Functions in R
• Functions are a fundamental building block of R
– Functions can be assigned to variables
– Functions can be stored in lists
– Functions can be passed as arguments to other functions
– Functions can have nested functions.
• Anonymous functions are functions that have no name.
• We use functions to package sets of instructions that we want to use repeatedly or that, because of their complexity, are better self-contained in a sub-program and called when needed.
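Each property listed above can be shown in a line or two. All function names in this sketch are made up.

```r
square <- function(x) x^2                        # assigned to a variable
ops <- list(sq = square, neg = function(x) -x)   # stored in a list
ops$sq(3)                                        # 9

sapply(1:3, square)                # passed as an argument: 1 4 9
sapply(1:3, function(x) x + 1)     # anonymous function: 2 3 4

make_power <- function(n) {        # a function defined inside a function
  function(x) x^n
}
cube <- make_power(3)
cube(2)                            # 8
```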
User Defined Functions in R
• User-defined functions (UDFs) are written to accomplish a particular task, typically when no dedicated function or library already exists or the author is not aware of one.
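As a hypothetical example of a user-defined function, here is a temperature converter with a default argument; the name `f_to_c` is invented for this sketch.

```r
# Convert Fahrenheit to Celsius, rounded to a given number of digits
f_to_c <- function(temp_f, digits = 1) {
  round((temp_f - 32) * 5 / 9, digits)
}

f_to_c(212)            # 100
f_to_c(c(32, 98.6))    # 0 37: works on whole vectors too
```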
Infix Operators in R
• Infix operators are functions that facilitate basic data expressions or transformations.
• "Infix" refers to the placement of the operator between its operands.
• The types of infix operators used in R include operators for data extraction, arithmetic, sequences, comparison, logical tests, variable assignment and custom data functions
Infix Operator in R
• Infix operators are used between operands; behind the scenes, each operator makes a function call.
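The function call behind an infix operator can be made explicit by quoting the operator name in backticks:

```r
2 + 3                # 5
`+`(2, 3)            # 5: the same call written in prefix form
5 %% 2               # 1: remainder
`%%`(5, 2)           # 1
5 %/% 2              # 2: integer division
3 %in% c(1, 2, 3)    # TRUE
```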
Predefined infix Operators in R
Operator   Rank   Description
%%         6      Remainder (modulo) operator
%/%               Integer division
%*%        6      Matrix multiplication
%o%        6      Outer product
%x%        6      Kronecker product
%in%       9      Matching operator
::         1      Extract a function from a package namespace
:::        1      Extract a hidden (non-exported) function from a namespace
$          2      Extract list subset, extract list data by name
@          2      Extract a slot of an S4 object
[[ ]]      3      Extract data by index
Predefined infix operators in R
Operator   Rank   Description
^          4      Arithmetic exponentiation operator
:          5      Generate a sequence of numbers
!          8      Not/negation operator
xor        10     Logical exclusive OR
&          10     Logical AND, element-wise
&&         10     Logical AND, for control flow (single values)
~          11     Tilde, used in formulas and model building
<<-        12     Superassignment (assigns in an enclosing environment)
<-         13     Left assignment
->         13     Right assignment
User Defined infix in R
User defined infix function in R
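A user-defined infix operator is an ordinary function whose name starts and ends with %. The operator `%+%` below is invented for illustration.

```r
# Define an infix operator that joins two strings with a space
`%+%` <- function(a, b) paste(a, b)

"hello" %+% "world"     # "hello world"
```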
CONTROL FLOW IN R
SYED AWASE KHIRNI
if and if...else
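A minimal sketch of branching with if, else if and else; the threshold values are arbitrary.

```r
x <- 7
if (x > 5) {
  size <- "big"
} else if (x > 2) {
  size <- "medium"
} else {
  size <- "small"
}
size                      # "big"

# if ... else is itself an expression, so it can return a value
label <- if (x %% 2 == 0) "even" else "odd"
label                     # "odd"
```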
ifelse()
• Vectors form the basic building block of R programming.
• Most functions in R take a vector as input and output a resultant vector
• Vectorized code is usually much faster than applying the same function to each element of the vector individually.
• ifelse() is a vector equivalent of the if...else statement
• test_expression must be a logical vector (or an object that can be coerced to logical)
• The return value is a vector with the same length as test_expression
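For example, ifelse() applies a test to every element at once; the input values are made up.

```r
x <- c(5, 12, 8, 20)
level <- ifelse(x > 10, "high", "low")
level                               # "low" "high" "low" "high"

halved <- ifelse(x %% 2 == 0, x / 2, x)
halved                              # 5 6 4 10: even values are halved
```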
for loop
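Two common forms of the for loop, sketched with arbitrary data: looping over indices and looping over values.

```r
x <- c("a", "b", "c")
for (i in seq_along(x)) {    # loop over the indices 1, 2, 3
  print(x[i])
}

total <- 0
for (value in 1:10) {        # loop over the values directly
  total <- total + value
}
total                        # 55
```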
while loop
break and next
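A short sketch of a while loop, followed by next (skip to the next iteration) and break (leave the loop). The numbers are arbitrary.

```r
count <- 0
while (count < 5) {           # the condition is checked before every iteration
  count <- count + 1
}
count                         # 5

evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next       # skip odd numbers
  if (i > 6) break            # leave the loop entirely
  evens <- c(evens, i)
}
evens                         # 2 4 6
```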
Repeat Loop
• A repeat loop is used to iterate over a block of code multiple times
• There is no condition check in a repeat loop to exit the loop
• We must put a condition explicitly inside the body of the loop and use the break statement to exit the loop
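A minimal repeat loop with its explicit exit condition:

```r
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) {
    break          # the only way out of a repeat loop
  }
}
i                  # 3
```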
OBJECTS AND CLASSES IN R
SYED AWASE KHIRNI
OOP in R
• An object is a data structure that has some attributes and methods which act on those attributes
• A class is a blueprint for the object.
• R has three (3) class systems
– S3 class system
– S4 class system
– Reference class system
S3 Class System
• Primitive in nature
• Lacks a formal definition; an object of this class can be created simply by adding a class attribute.
• Objects are created by setting the class attribute
• Attributes are accessed using $
• Methods belong to generic functions
• Follows copy-on-modify semantics
S4 Class System
• A formally defined structure, which helps make objects of the same class look more or less similar.
• Class components are properly defined using the setClass() function and objects are created using the new() function.
• Attributes are accessed using @
• Methods belong to generic functions
• Follows copy-on-modify semantics
Reference Class System
• Similar to the object-oriented programming we are used to in C# and Java.
• Basically an extension of the S4 class system with an environment added to it.
• Reference class system
– Classes are defined using setRefClass()
– Objects are created using generator functions
– Attributes are accessed using $
– Methods belong to the class
– Does not follow copy-on-modify semantics
S3 Class System
S3 Class
S3 Class Method
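An S3 object and a method for it might be sketched as follows. The class name "student" and its components are invented for the example.

```r
# Create an S3 object by attaching a class attribute to a list
s <- list(name = "Ann", age = 21)
class(s) <- "student"

# A method is a function named generic.class; print() is the generic here
print.student <- function(x, ...) {
  cat("Student:", x$name, "aged", x$age, "\n")
  invisible(x)
}

print(s)     # dispatches to print.student
s$name       # components are accessed with $
```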
S3 class with methods
Inheritance – S3 Class System
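S3 inheritance works through a class vector with more than one entry. In this sketch the generic `describe` and both classes are hypothetical.

```r
describe <- function(x, ...) UseMethod("describe")    # hypothetical generic

describe.person <- function(x, ...) paste("Person", x$name)
describe.student <- function(x, ...) {
  base <- NextMethod()                  # call the parent (person) method
  paste(base, "studying", x$course)
}

p  <- structure(list(name = "Ann"), class = "person")
st <- structure(list(name = "Bob", course = "R"),
                class = c("student", "person"))   # student inherits from person

describe(p)              # "Person Ann"
describe(st)             # "Person Bob studying R"
inherits(st, "person")   # TRUE
```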
S4 Class System in R
• S4 class is defined using the setClass() function
• Member variables are called slots
• When defining a class, we need to set the name and
the slots (along with class of the slot)
S4 Class System in R
Accessing Slots
• Slots of an object are
accessed using @
Modifying Slots
• A slot can be modified through reassignment
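Defining an S4 class, then reading and modifying its slots, could look like this sketch. The class "Student" and its slots are invented.

```r
# Define the class: a name plus the slots and the class of each slot
setClass("Student", slots = c(name = "character", gpa = "numeric"))

s <- new("Student", name = "Ann", gpa = 3.5)   # create an object
s@name             # "Ann": slots are accessed with @
s@gpa <- 3.9       # modify a slot by reassignment
slot(s, "gpa")     # 3.9: slot() is the functional form of @
```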
Inheritance in S4
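In S4, a class inherits from another through the `contains` argument of setClass(). The classes and the generic `describe` below are hypothetical.

```r
setClass("Person", slots = c(name = "character"))
setClass("Employee", contains = "Person", slots = c(company = "character"))

# A generic with a method defined only for the parent class
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Person", function(obj) paste("Person:", obj@name))

e <- new("Employee", name = "Ann", company = "Acme")
is(e, "Person")    # TRUE
describe(e)        # "Person: Ann": the method is inherited from Person
```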
R Reference Class System
• Reference classes in R are similar to the object-oriented programming we are used to seeing in C++, Java and Python.
• Unlike S3 and S4 classes, methods belong to the class rather than to generic functions.
• Reference classes are internally implemented as S4 classes with an environment added.
• setRefClass() returns a generator function which is used to create objects of that class
Reference Class in R
Accessing Fields in R
• Fields of the object can be
accessed using the $
operator
Modifying Fields in R
• Fields can be modified by
reassignment
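A minimal reference class with field access and modification; the class "Account" and its fields are made up for this sketch.

```r
# setRefClass() returns a generator object used to create instances
Account <- setRefClass("Account",
                       fields = list(owner = "character", balance = "numeric"))

a <- Account$new(owner = "Ann", balance = 100)
a$balance            # 100: fields are accessed with $
a$balance <- 150     # modify a field by reassignment
a$balance            # 150
```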
Reference Methods .copy()
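Because reference objects are not copied on assignment, copy() is needed to get an independent object. A sketch with a hypothetical "Account" class:

```r
Account <- setRefClass("Account", fields = list(balance = "numeric"))

a <- Account$new(balance = 100)
b <- a               # b refers to the SAME object as a
b$balance <- 50
a$balance            # 50: the change through b is visible through a

c2 <- a$copy()       # an independent copy
c2$balance <- 999
a$balance            # still 50
```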
Reference Methods
Inheritance in Reference Class
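Reference classes inherit through `contains`, and a method can call its parent's version with callSuper(). The classes, fields and method below are invented for illustration.

```r
Person <- setRefClass("Person",
  fields = list(name = "character"),
  methods = list(
    greet = function() paste("Hi,", name)
  ))

Student <- setRefClass("Student",
  contains = "Person",                   # Student inherits from Person
  fields = list(school = "character"),
  methods = list(
    greet = function() {
      base <- callSuper()                # run the Person version first
      paste(base, "from", school)
    }
  ))

s <- Student$new(name = "Ann", school = "MIT")
s$greet()     # "Hi, Ann from MIT"
```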
sak@sycliq.com
sak@territorialprescience.com
Contact Us
Thank You
We also provide Code Driven Open House Trainings
For code driven trainings
Reach out to us +91-9035433124
Current Offerings
• AngularJS 1.5.x
• Typescript
• AngularJS 2 (with NodeJS)
• KnockOutJS (with NodeJS)
• BackBoneJS (with NodeJS)
• Ember JS / Ext JS (with NodeJS)
• Raspberry Pi
• Responsive Web Design with Bootstrap, Google
Material Design and KendoUI
• C# ASP.NET MVC
• C# ASP.NET WEB API
• C# ASP.NET WCF, WPF
• JAVA , SPRING, HIBERNATE
• Python , Django
• R Statistical Programming
• Android Programming
• Python/Django
• Ruby on Rails
INDIA
HYDERABAD | BANGALORE | CHENNAI | PUNE
OVERSEAS
SINGAPORE | MALAYSIA | DUBAI

R programming groundup-basic-section-i

  • 1.
    R-Programming–Basics R Programming Ground Up! SyedAwase Khirni Syed Awase earned his PhD from University of Zurich in GIS, supported by EU V Framework Scholarship from SPIRIT Project (www.geo-spirit.org). He currently provides consulting services through his startup www.territorialprescience.com and www.sycliq.com 1Copyright 2008-2016 Syed Awase Khirni TPRI
  • 2.
    R-Programming–Basics R Project • R– Free Software environment for statistical computing and graphics. • https://www.r- project.org • https://cran.r- project.org/mirrors.html Copyright 2008-2016 Syed Awase Khirni TPRI 2
  • 3.
    R-Programming–Basics S • S Language– Developed by John Chambers et. al at Bell Labs • 1976 -> internal statistical analysis environment – originally implemented as Fortran Libraries • 1988-> Rewritten in C – statistical models in S by Chambers and Hastie • 1998-> S v.4.0 • 1991-> R created in New Zealand by Ross Ihaka and Robert Gentleman. • 1993 -> public release of R • 1995-> Martin Machler convinced Ross and Robert to use the GNU GPU License • 1996 , 1997 -> R Core Group Formed with (S Plus Core Group) • 2000- R Version 1.0 Released • 2015 R Version 3.1.3 -> March 9, 2015. Copyright 2008-2016 Syed Awase Khirni TPRI 3
  • 4.
    R-Programming–Basics Design of theR System • R –Statistical Programming language based on S language developed by Bell Labs. • Divided into 2 conceptual parts – Base – Add-on Packages • Base – R System contains – The base package which is required to run R and contains the most fundamental functions. – Other packages contained in the base system include utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4 • Add-on Packages are packages that are published by either R Core group or any third party vendors • Syntax similar to S, making it easy for S-PLUS users to switch over • Semantics are superficially similar to S, but in reality are quite different • Runs on almost any standard computing platform/OS Copyright 2008-2016 Syed Awase Khirni TPRI 4
  • 5.
    R-Programming–Basics R? • R isan integrated suite of software facilities for data manipulation, calculation and graphical display • R has – Effective data handling and storage facility – A suite of operators for calculations on arrays and matrices – A large, coherent, integrated collection of tools for data analysis – Graphical facilities for data analysis and display – A well developed, simple and effective programming language Copyright 2008-2016 Syed Awase Khirni TPRI 5
  • 6.
    R-Programming–Basics R- Drawbacks • Littlebuilt-in support for dynamic or 3-D graphics • Functionality is based on consumer demand and user contributions • Web support provided through third party software. Copyright 2008-2016 Syed Awase Khirni TPRI 6
  • 7.
    R-Programming–Basics DATA TYPES ANDBASIC OPERATIONS IN R Copyright 2008-2016 Syed Awase Khirni TPRI 7
  • 8.
    R-Programming–Basics Data Types • Objects •Numbers • Attributes • Entering Input and Printing • Vectors, Lists • Factors • Missing Values • Data Frames • Names Copyright 2008-2016 Syed Awase Khirni TPRI 8
  • 9.
    R-Programming–Basics Objects in R •R has five basic or atomic classes of objects – Character – Numeric (real number) – Integer – Complex – Logical (true/false) • The most basic object is a vector – A vector can only contain objects of the same class – The one exception is a list, which is represented as a vector but can contain objects of different classes – Empty vectors can be created with the vector() function Copyright 2008-2016 Syed Awase Khirni TPRI 9
  • 10.
  • 11.
    R-Programming–Basics Install.packages() • To installadditional third party packages into your R software. We use • Install.packages(“XLCon nect”) – To install XLConnect package – To activate an already installed package we use • Library(“packagename”) Copyright 2008-2016 Syed Awase Khirni TPRI 11 Check if the package is already installed or not. any(grepl("<name of your package>", installed.packages()))
  • 12.
    R-Programming–Basics Numbers in R •Treated as numeric objects (i.e. double precision real numbers) • Suffix L => integer • Example : 1 => numeric object – 1L => explicitly gives an integer • 1/0 => inf (infinity) • NaN => not a number or missing value Copyright 2008-2016 Syed Awase Khirni TPRI 12
  • 13.
    R-Programming–Basics Attributes • R objectscan have attributes – Names, dimnames – Dimensions (e.g. matrices, arrays) – Class – Length – Other user-defined attributes/metadata • Attributes of an object can be accessed using the attributes() function. Copyright 2008-2016 Syed Awase Khirni TPRI 13
  • 14.
    R-Programming–Basics Assignment Operator (<-) •Expressions in R are done using <- assignment operator. • The grammar of the language determines whether an expression is complete or not • The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored • [1] indicates that x is a vector and 123781213412 is the first element Copyright 2008-2016 Syed Awase Khirni TPRI 14 //auto printing Ctrl+L to clear console
  • 15.
    R-Programming–Basics Vectors in R •The c() function can be used to create vectors of objects. Copyright 2008-2016 Syed Awase Khirni TPRI 15
  • 16.
    R-Programming–Basics Vectors in R •Using the vector() function Copyright 2008-2016 Syed Awase Khirni TPRI 16
  • 17.
    R-Programming–Basics Mixing Objects • Whendifferent objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class. Copyright 2008-2016 Syed Awase Khirni TPRI 17
  • 18.
    R-Programming–Basics Explicit Coercion • Objectscan be explicitly coerced from one class to another using the as.* functions. Copyright 2008-2016 Syed Awase Khirni TPRI 18
  • 19.
    R-Programming–Basics Matrices • Vectors witha dimension attribute are called Matrices. The dimension attribute is itself an integer vector of length 2(nrow, ncol) • Matrices are constructed column-wise, so entries can be thought of starting from the upper left corner and running down the columns. • Matrices can also be created directly from vectors by adding a dimension attribute. Copyright 2008-2016 Syed Awase Khirni TPRI 19
  • 20.
    R-Programming–Basics Cbind-ing • Matrices canbe created by Column-binding with cbind() function Copyright 2008-2016 Syed Awase Khirni TPRI 20
  • 21.
    R-Programming–Basics Rbind-ing • Matrices canbe created by row-binding using rbind() function. Copyright 2008-2016 Syed Awase Khirni TPRI 21
  • 22.
    R-Programming–Basics Lists in R •Lists are a special type of vector that can contain elements of different classes. • Lists are a very important data type in R Copyright 2008-2016 Syed Awase Khirni TPRI 22
  • 23.
    R-Programming–Basics Factors • Used torepresent categorical data. Factors can be unordered or ordered. • Factors are treated specially by modelling functions like lm() and glm() • Using factors with labels is better than using integers because factors are self- describing, having a variable that has values. Copyright 2008-2016 Syed Awase Khirni TPRI 23
  • 24.
    R-Programming–Basics Missing Values • Manyexisting, industrial and research datasets contain Missing values. • These can occur due to various reasons such as manual data entry procedures, equipment errors and incorrect measurements. • Missing values can appear in the form of outliers or even wrong data (i.e out of boundaries) Copyright 2008-2016 Syed Awase Khirni TPRI 24 • Missing values are denoted by NA or NaN for undefined mathematical operations – Is.na() is used to test objects if they are NA – Is.nan() is used to test for NaN – NA values have a class also, so there are integerNA, characterNA etc. – A NaN value is also NA but the converse is not true.
  • 25.
    R-Programming–Basics Missing Values • Threetype of problems are usually associated with missing values – Loss of efficiency – Complications in handling and analyzing the data – Bias resulting from differences between missing and complete data. Copyright 2008-2016 Syed Awase Khirni TPRI 25 Identifying NA values using is.na() and is.nan()
  • 26.
    R-Programming–Basics Data Frames • Usedto store tabular data (table of values) – They are represented as a special type of list, where every element of the list has to have the same length. – Each element of the list can be thought of as a column and the length of each element of the list is the number of the rows • Data frames can store different classes of objects in each column, while matrices must have every element of the same class • Data frames also have a special attribute called row.names. • Data frames are usually created by calling read.table() or read.csv() • Can be converted to a matrix by calling data.matrix() method Copyright 2008-2016 Syed Awase Khirni TPRI 26
  • 27.
  • 28.
    Data Frame in R
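A small sketch of building a data frame by hand and converting it to a matrix. The column names and values are invented for illustration.

```r
df <- data.frame(id     = 1:3,
                 name   = c("a", "b", "c"),
                 passed = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

nrow(df)            # number of rows
ncol(df)            # number of columns
sapply(df, class)   # each column keeps its own class
row.names(df)       # the special row.names attribute

# data.matrix() forces every column to a number
# (character columns go through factor codes, logicals become 0/1)
m <- data.matrix(df)
m
```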
  • 29.
    Names in R
    • R objects can also have names, which is very useful for writing readable code and self-describing objects.
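As an illustration, names can be attached to vectors, lists and matrices. The object names below are arbitrary.

```r
x <- 1:3
names(x) <- c("first", "second", "third")
x["second"]                 # look up an element by name

lst <- list(a = 1, b = "text")
names(lst)

m <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
m["r2", "c1"]               # matrices use row and column names
```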
  • 30.
    Subsetting
    • Extracting subsets from an existing dataset is called subsetting
    – [ ] always returns an object of the same class as the original.
    – [[ ]] is used to extract elements of a list or a data frame.
    – $ is used to extract an element of a list or data frame by name; its semantics are similar to those of [[ ]].
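The difference between the three operators shows up clearly on a small list. This is a sketch with made-up data.

```r
lst <- list(numbers = 1:4, label = "demo")

sub_list <- lst["numbers"]     # [ ] keeps the class: still a list
sub_vec  <- lst[["numbers"]]   # [[ ]] extracts the element itself
lst$label                      # $ extracts by name

v <- c(10, 20, 30, 40)
v[2:3]                         # subset by position
big <- v[v > 15]               # subset with a logical vector
big
```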
  • 31.
  • 32.
  • 33.
  • 34.
    Partial Matching
    • Partial matching of names is allowed with $, and with [[ ]] when exact = FALSE is passed.
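A short sketch of partial matching on a list with one long element name (the name `aardvark` is just an example).

```r
x <- list(aardvark = 1:5)

by_dollar  <- x$a                       # $ matches the unique prefix "a"
by_exact   <- x[["a"]]                  # [[ ]] is exact by default: NULL
by_partial <- x[["a", exact = FALSE]]   # partial matching switched on

by_dollar
by_exact
by_partial
```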
  • 35.
    Remove NA Values
    • A common task is to remove missing values (NAs) prior to performing any analysis.
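Two common ways to do this are sketched below: a logical mask for vectors and complete.cases() for data frames. The data are invented.

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
clean_x <- x[!bad]            # keep only the non-missing values
clean_x

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
clean_df <- df[complete.cases(df), ]   # rows with no NA in any column
clean_df

# Many summary functions can skip NAs directly
avg <- mean(x, na.rm = TRUE)
avg
```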
  • 36.
    Vectorized Operations
    • Many operations in R are vectorized, making code more efficient, concise and easier to read.
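For instance, arithmetic and comparisons apply element by element without an explicit loop. The vectors here are arbitrary.

```r
x <- 1:4
y <- 6:9

total <- x + y      # element-wise addition
total
x * y               # element-wise multiplication
x > 2               # element-wise comparison

m <- matrix(1:4, 2, 2)
m * m               # element-wise product
mm <- m %*% m       # true matrix multiplication
mm
```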
  • 37.
  • 38.
    Reading Data
    • R provides some useful functions to read data
    – read.table, read.csv: for reading tabular data
    – readLines: for reading lines of a text file
    – source: for reading in R code files (inverse of dump)
    – dget: for reading in R code files (inverse of dput)
    – load: for reading in saved workspaces
    – unserialize: for reading single R objects in binary form
  • 39.
    Writing Data
    • R provides a set of functions to write data into files
    – write.table: to write data in table format
    – writeLines: to write lines
    – dump
    – dput
    – save
    – serialize
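A few of these reader/writer pairs can be sketched as round trips through temporary files. The file paths come from tempfile() and the object names are arbitrary.

```r
# writeLines / readLines
txt <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt)
lines_back <- readLines(txt)
lines_back

# save / load (load restores the object under its original name)
rda <- tempfile(fileext = ".RData")
a <- 1:3
save(a, file = rda)
rm(a)
load(rda)
a

# serialize / unserialize a single object to and from raw bytes
raw_bytes <- serialize(list(x = 1), connection = NULL)
obj <- unserialize(raw_bytes)
obj
```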
  • 40.
    Reading Data Files with read.table
    • For small to moderately sized datasets, we can just call read.table without specifying any other arguments.
    • Data <- read.table("sampledata.txt")
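A runnable sketch of the same call. Since "sampledata.txt" is not available here, the example first writes a small temporary file with invented contents. It also passes header = TRUE, which the slide's call does not, because this file has a header row.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id score",
             "1 2.5",
             "2 3.5"), path)

data <- read.table(path, header = TRUE)   # whitespace-separated by default
str(data)
```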
  • 41.
    R Datasets
    • https://vincentarelbundock.github.io/Rdatasets/datasets.html
    • http://openflights.org/data.html
    • http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html
    • https://r-dir.com/reference/datasets.html
    • http://fimi.ua.ac.be/data/
    • https://datamarket.com/data/list/?q=provider:tsdl
    • https://www.data.gov/
  • 42.
    Set/Get Working Directory
    • Setting and getting the current working directory
    > setwd("<path to your folder>")
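A small sketch of both calls. It switches to the session's temporary directory (standing in for your own folder) and then restores the original location.

```r
old <- getwd()        # remember the current working directory
setwd(tempdir())      # switch to another folder
getwd()               # confirm the change
setwd(old)            # switch back
```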
  • 43.
    Reading CSV Files
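A minimal sketch of reading a CSV file with read.csv(). The file is created on the fly with made-up contents so the example is self-contained.

```r
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age",
             "Ann,31",
             "Bob,27"), csv_path)

# read.csv() assumes a header row and comma separators
people <- read.csv(csv_path, stringsAsFactors = FALSE)
people
```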
  • 44.
  • 45.
    Mocking Sample Data with Mockaroo
    • https://www.mockaroo.com/
  • 46.
    Reading Large Datasets with read.table
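One common approach, sketched here on a generated file, is to read a few rows first, record the column classes, and pass them back through colClasses so read.table() does not have to guess them for the whole file. The file contents are synthetic.

```r
big_path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:1000, value = rnorm(1000)),
            big_path, row.names = FALSE)

# Read a small sample to discover the column classes
initial <- read.table(big_path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# Re-read the full file with the classes, row count and no comment scanning
full <- read.table(big_path, header = TRUE, colClasses = classes,
                   nrows = 1000, comment.char = "")
dim(full)
```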
  • 47.
    write.csv()
    • One of the easiest ways to save an R data frame is to write it to a CSV file, TSV file or text file.
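A sketch of writing the same data frame as CSV and as tab-separated text, then reading the CSV back. The data frame `scores` is invented.

```r
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

out_csv <- tempfile(fileext = ".csv")
write.csv(scores, out_csv, row.names = FALSE)

out_tsv <- tempfile(fileext = ".tsv")
write.table(scores, out_tsv, sep = "\t", row.names = FALSE, quote = FALSE)

back <- read.csv(out_csv)   # round trip to check the file
back
```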
  • 48.
    dput()
    • Writes an ASCII text representation of an R object to a file or connection, or uses one to recreate the object.
  • 49.
    Head and Tail of a Dataset
    • head() and tail() return the first or the last part of an object, i.e. a vector, matrix, table, data frame or function.
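For example, using the built-in mtcars dataset and the letters vector:

```r
first_rows <- head(mtcars, 3)   # first 3 rows of a data frame
last_rows  <- tail(mtcars, 2)   # last 2 rows
first_rows
last_rows

first_letters <- head(letters, 4)   # works on vectors too
first_letters
```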
  • 50.
    Loading "foreign" Data
    • Sometimes, we would like to import data from other statistical packages like SAS, SPSS and Stata.
    • Stata (.dta) files can be read with the foreign library.
    • Writing data files from R into Stata format is also very straightforward.
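A sketch of a Stata round trip, assuming the foreign package is installed (it normally ships with R as a recommended package). The data frame `survey` is made up.

```r
library(foreign)

survey <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))

dta_path <- tempfile(fileext = ".dta")
write.dta(survey, dta_path)          # write a Stata .dta file
survey_back <- read.dta(dta_path)    # read it back into a data frame
survey_back
```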
  • 51.
    Library "foreign" Data
    • SPSS data
    – Data files in SPSS format can be opened with the function read.spss from the "foreign" package.
    – Set the "to.data.frame" option to TRUE to return a data frame.
  • 52.
    Loading "foreign" Data
    • Excel data
    – Sometimes, we have data in xls format that needs to be imported into R prior to its use.
    – library(gdata)
  • 53.
    Loading "foreign" Data
    • Using the XLConnect package
    • install.packages("XLConnect")
  • 54.
    Loading "foreign" Data
    • Minitab
    – Minitab portable worksheets can be imported into R using the foreign library.
  • 55.
    Computing Memory Requirements
    • A numeric (double-precision) value takes 8 bytes.
    • Imagine you have a data frame with 100,000 rows and 100 columns, all numeric.
    • 100,000 × 100 × 8 bytes/numeric = 80,000,000 bytes
    – 80,000,000 bytes / 2^20 bytes/MB ≈ 76.3 MB
    – So roughly 76 MB of memory is required just to hold the data.
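The arithmetic can be checked directly in R. The row and column counts are the ones from the slide; the assumption is that every column is numeric (8 bytes per value).

```r
rows <- 100000
cols <- 100
bytes_per_numeric <- 8

bytes <- rows * cols * bytes_per_numeric   # total bytes for the raw values
mb <- bytes / 2^20                         # 2^20 bytes per MB
bytes
round(mb, 1)
```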
  • 56.
    Text Formats
    • dump and dput are useful because the resulting textual format is editable and, in the case of corruption, potentially recoverable.
    • Unlike writing out a table or CSV file, dump and dput preserve the metadata (sacrificing some readability), so that another user doesn't have to specify it all over again.
    • Textual formats work much better with version control programs like Git and SVN, which can then track changes meaningfully.
    • Text formats have a longer life and adhere to the "Unix philosophy".
    • However, the format is not very space-efficient.
  • 57.
    dump() Function
    • Creates a file in a format that can be read with the source() function or pasted in with the copy/paste edit functions of the windowing system.
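A sketch of dumping two objects and restoring them with source(). The object names `x` and `y` are arbitrary, and source() is called with local = TRUE so the objects are recreated in the calling environment.

```r
x <- "foo"
y <- data.frame(a = 1, b = "a")

dump_file <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_file)   # objects are passed by name

rm(x, y)                              # remove them...
source(dump_file, local = TRUE)       # ...and recreate them from the file
x
y
```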
  • 58.
    dput() Function
    • The dput function saves data as an R expression, which means that the resulting file can actually be copied and pasted into the R console.
    • Creates and uses an ASCII file representing the object.
    • Writes an ASCII version of the object onto the file.
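A minimal round trip with dput() and its inverse dget(), using an invented data frame.

```r
y <- data.frame(a = 1:2, b = c("a", "b"))

dput(y)                     # print the R expression that rebuilds y

f <- tempfile(fileext = ".R")
dput(y, file = f)           # write the expression to a file
new_y <- dget(f)            # read it back as an object
new_y
```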
  • 59.
    Functions in R
    • Functions are a fundamental building block of R
    – Functions can be assigned to variables
    – Functions can be stored in lists
    – Functions can be passed as arguments to other functions
    – Functions can have nested functions.
    • Anonymous functions are functions that have no name.
    • We use functions to package sets of instructions that we want to use repeatedly or that, because of their complexity, are better self-contained in a sub-program and called when needed.
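Each of these properties can be sketched in a few lines. All function names below (`square`, `ops`, `make_adder`) are made up for illustration.

```r
square <- function(x) x^2                 # assigned to a variable

ops <- list(sq = square,                  # stored in a list
            cube = function(x) x^3)
ops$cube(2)

squares <- sapply(1:3, square)            # passed as an argument
squares
sapply(1:3, function(x) x + 1)            # anonymous function

make_adder <- function(n) {               # nested function
  function(x) x + n
}
add5 <- make_adder(5)
add5(10)
```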
  • 60.
    User Defined Functions in R
    • User-defined functions (UDFs) are written to accomplish a particular task, typically when no dedicated function or library exists for it or when the author is not aware of one.
  • 61.
    User Defined Functions in R
  • 62.
    User Defined Functions in R
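As an illustration of a user-defined function, here is a hypothetical helper `coef_var()` that computes the coefficient of variation, with a default argument and a basic input check. It is a sketch, not code from the slides.

```r
# Coefficient of variation: standard deviation divided by the mean
coef_var <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv <- coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))
cv
coef_var(c(1, NA, 3), na.rm = TRUE)
```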
  • 63.
    Infix Operators in R
    • Infix operators are functions that support basic data expressions or transformations.
    • "Infix" refers to the placement of the operator between its operands, as in a + b.
    • The types of infix operators used in R include operators for data extraction, arithmetic, sequences, comparison, logical tests, variable assignment and custom (user-defined) operations.
  • 64.
    Infix Operator in R
    • Infix operators are used between operands; behind the scenes, each operator performs a function call.
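The function call behind an operator can be made explicit by quoting the operator with backticks, as this small sketch shows.

```r
5 + 3
plus_result <- `+`(5, 3)          # the same call in prefix form
plus_result

`%in%`(2, c(1, 2, 3))             # 2 %in% c(1, 2, 3)

x <- c(10, 20, 30)
second <- `[`(x, 2)               # x[2]
second
```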
  • 65.
    Predefined Infix Operators in R
    Operator   Rank   Description
    %%         6      Remainder (modulo) operator
    %/%        6      Integer division
    %*%        6      Matrix multiplication
    %o%        6      Outer product
    %x%        6      Kronecker product
    %in%       6      Matching operator
    ::         1      Extract a function from a package namespace
    :::        1      Extract a hidden (non-exported) function from a namespace
    $          2      Extract a list element by name
    @          2      Extract a slot from an S4 object
    [[ ]]      3      Extract data by index
  • 66.
    Predefined Infix Operators in R
    Operator   Rank   Description
    ^          4      Exponentiation operator
    :          5      Generate a sequence of numbers
    !          8      Not/negation operator
    xor        10     Exclusive OR (called as a function, xor(x, y))
    &          10     Logical AND, element-wise
    &&         10     Logical AND, for control flow (single values)
    ~          11     Formula operator, used in formulas and model building
    <<-        12     Super-assignment (assigns in an enclosing environment)
    <-         13     Left assignment
    ->         13     Right assignment
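A few of the predefined operators in action. This is only a sketch with arbitrary numbers.

```r
rem  <- 17 %% 5          # remainder
quot <- 17 %/% 5         # integer division
rem
quot

2 ^ 3                    # exponentiation
1:5                      # sequence

found <- c("a", "z") %in% letters[1:3]   # matching
found

outer_prod <- 1:3 %o% 1:2                # outer product, a 3 x 2 matrix
outer_prod

stats::median(c(1, 3, 10))               # :: picks a function from a namespace

neg_seq <- -2:2          # unary minus binds tighter than `:`
neg_seq
```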
  • 67.
    User Defined Infix in R
  • 68.
    User Defined Infix Function in R
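A user-defined infix operator is an ordinary function whose name starts and ends with %. The two operators below, `%+%` and `%between%`, are hypothetical examples.

```r
# String concatenation with a space
`%+%` <- function(a, b) paste(a, b)
greeting <- "Hello" %+% "R"
greeting

# Range check: is x inside the closed interval rng[1]..rng[2]?
`%between%` <- function(x, rng) x >= rng[1] & x <= rng[2]
inside <- c(1, 5, 10) %between% c(2, 8)
inside
```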
  • 69.
    CONTROL FLOW IN R
  • 70.
  • 71.
    ifelse()
    • Vectors form the basic building block of R programming.
    • Most functions in R take a vector as input and output a resultant vector.
    • Vectorized code will be much faster than applying the same function to each element of the vector individually.
    • ifelse() is the vector equivalent of the if...else statement.
    • test_expression must be a logical vector (or an object that can be coerced to logical).
    • The return value is a vector with the same length as test_expression.
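A short sketch of ifelse() applied to a whole vector at once. The threshold and labels are arbitrary.

```r
x <- c(5, 12, 7, 20)

labels <- ifelse(x > 10, "high", "low")   # one result per element of x
labels

parity <- ifelse(x %% 2 == 0, "even", "odd")
parity
```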
  • 72.
  • 73.
  • 74.
  • 75.
    Repeat Loop
    • A repeat loop is used to iterate over a block of code multiple times.
    • There is no condition check in a repeat loop to exit the loop.
    • We must put a condition explicitly inside the body of the loop and use the break statement to exit the loop.
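A minimal sketch: summing the integers 1 to 5 with repeat and an explicit break.

```r
i <- 0
total <- 0

repeat {
  i <- i + 1
  total <- total + i
  if (i >= 5) {
    break          # without this the loop would never end
  }
}

total
```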
  • 76.
    OBJECTS AND CLASSES IN R
  • 77.
    OOP in R
    • An object is a data structure having some attributes and methods which act on the attributes.
    • A class is a blueprint for the object.
    • R has three (3) class systems
    – S3 Class System
    – S4 Class System
    – Reference Class System
  • 78.
    S3 Class System
    • Primitive in nature
    • Lacks a formal definition; an object of this class can be created simply by adding a class attribute.
    • Objects are created by setting the class attribute
    • Attributes are accessed using $
    • Methods belong to generic functions
    • Follows copy-on-modify semantics
    S4 Class System
    • A formally defined structure which helps in making objects of the same class look more or less similar.
    • Class components are properly defined using the setClass() function and objects are created using the new() function.
    • Attributes are accessed using @
    • Methods belong to generic functions
    • Follows copy-on-modify semantics
  • 79.
    Reference Class System
    • Similar to the object-oriented programming we are used to in C# and Java.
    • Basically an extension of the S4 class system with an environment added to it.
    • Reference Class System
    – Classes are defined using setRefClass()
    – Objects are created using generator functions
    – Attributes are accessed using $
    – Methods belong to the class
    – Does not follow copy-on-modify semantics
  • 80.
    S3 Class System
  • 81.
  • 82.
    S3 Class Method
  • 83.
    S3 Class with Methods
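A sketch of an S3 class with a print method and a custom generic. The class name `student`, its fields and the generic `describe()` are invented for illustration.

```r
# Create an S3 object by setting the class attribute on a list
s <- list(name = "Asha", age = 21)
class(s) <- "student"

# A method for the existing generic print()
print.student <- function(x, ...) {
  cat("Student:", x$name, "\n")
  invisible(x)
}

# A new generic with a class-specific and a default method
describe <- function(obj, ...) UseMethod("describe")
describe.student <- function(obj, ...) paste(obj$name, "is", obj$age, "years old")
describe.default <- function(obj, ...) "no description available"

print(s)
describe(s)
describe(42)
```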
  • 84.
    Inheritance – S3 Class System
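In S3, inheritance is expressed by giving an object a class vector; NextMethod() then passes control to the parent class's method. The classes `person` and `student` and the generic `greet()` below are hypothetical.

```r
person  <- structure(list(name = "Ravi"), class = "person")
student <- structure(list(name = "Asha", school = "Example School"),
                     class = c("student", "person"))   # child first, parent second

greet <- function(x, ...) UseMethod("greet")
greet.person <- function(x, ...) paste("Hello,", x$name)
greet.student <- function(x, ...) {
  base <- NextMethod()                 # calls greet.person()
  paste0(base, " (student at ", x$school, ")")
}

greet(person)
greet(student)
inherits(student, "person")
```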
  • 85.
    S4 Class System in R
    • An S4 class is defined using the setClass() function.
    • Member variables are called slots.
    • When defining a class, we need to set the name and the slots (along with the class of each slot).
  • 86.
    S4 Class System in R
    Accessing Slots
    • Slots of an object are accessed using @
    Modifying Slots
    • A slot can be modified by reassignment.
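A sketch of defining an S4 class, creating an object, and reading and modifying its slots. The class name `Student` and its slots are made up.

```r
library(methods)

# Define the class: a name plus typed slots
setClass("Student", slots = c(name = "character", age = "numeric"))

# Create an object with new()
s <- new("Student", name = "Asha", age = 21)

s@name            # access a slot with @
s@age <- 22       # modify a slot by reassignment
slot(s, "age")    # slot() is the functional equivalent of @
isVirtualClass("Student")
```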
  • 87.
    Inheritance in S4
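S4 inheritance uses the contains argument of setClass(), and callNextMethod() reaches the parent's method. The classes `Person` and `Employee` and the generic `info()` are hypothetical names.

```r
library(methods)

setClass("Person", slots = c(name = "character"))
setClass("Employee", contains = "Person", slots = c(company = "character"))

setGeneric("info", function(obj) standardGeneric("info"))
setMethod("info", "Person", function(obj) paste("Person:", obj@name))
setMethod("info", "Employee", function(obj) {
  paste(callNextMethod(), "at", obj@company)   # reuse the Person method
})

e <- new("Employee", name = "Ravi", company = "Example Corp")
info(e)
is(e, "Person")
```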
  • 88.
    R Reference Class System
    • Reference classes in R are similar to the object-oriented programming we are used to seeing in C++, Java and Python.
    • Unlike S3 and S4 classes, methods belong to the class rather than to generic functions.
    • Reference classes are internally implemented as S4 classes with an environment added to them.
    • setRefClass() returns a generator function which is used to create objects of that class.
  • 89.
    Reference Class in R
    Accessing Fields in R
    • Fields of the object can be accessed using the $ operator
    Modifying Fields in R
    • Fields can be modified by reassignment
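A sketch of a reference class showing field access, field modification and the absence of copy-on-modify semantics. The class `Account`, its fields and its `deposit()` method are invented for illustration.

```r
library(methods)

Account <- setRefClass("Account",
  fields  = list(owner = "character", balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount   # <<- modifies the field in place
      invisible(.self)
    }
  )
)

a <- Account$new(owner = "Asha", balance = 100)
a$balance            # access a field with $
a$balance <- 150     # modify a field by reassignment

b <- a               # b refers to the same object, not a copy
b$deposit(50)
a$balance            # the change is visible through a as well

c2 <- a$copy()       # copy() makes an independent object
c2$deposit(1)
a$balance
```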
  • 90.
  • 91.
  • 92.
  • 93.
    Inheritance in Reference Class
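Reference classes inherit through the contains argument of setRefClass(), and callSuper() invokes the overridden parent method. The classes `PersonRC` and `StudentRC` are hypothetical.

```r
library(methods)

PersonRC <- setRefClass("PersonRC",
  fields  = list(name = "character"),
  methods = list(
    describe = function() paste("Person:", name)
  )
)

StudentRC <- setRefClass("StudentRC",
  contains = "PersonRC",
  fields   = list(school = "character"),
  methods  = list(
    describe = function() paste(callSuper(), "at", school)  # extend the parent method
  )
)

s <- StudentRC$new(name = "Asha", school = "Example School")
s$describe()
is(s, "PersonRC")
```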
  • 94.