Lecture 1 - R Introduction1

The document provides an introduction to R, a programming language and software environment for data analytics and statistical computing. It outlines the reasons for learning R, including its open-source nature, data mining capabilities, and strong statistical functions, as well as various programming interfaces like Rstudio and Google Colab. Additionally, it covers R's data types, packages, and basic coding practices, emphasizing the importance of hands-on coding for skill development.

MH6211 Analytics software I

Lecture 1: R Introduction

Li Xiaoli
Nanyang Technological University
Outline
1. What is R?
2. Why do we learn R?
3. Programming interface and simple coding
4. R simple data types
5. R complex data types

1. What is R?
• R is a programming language and software environment for data
analytics, statistical computing and graphics supported by the R
Foundation. The R language is widely used among data miners and
statisticians for developing statistical software and data analysis.
• R is freely available under the GNU General Public License (GPL), and
pre-compiled binary versions are provided for various operating
systems. While R has a command-line interface, there are several GUIs
available (e.g. RStudio, Jupyter Notebook, Colab), and we should use a GUI.
• The GPL is a free software license that guarantees end users (individuals,
organizations, companies) the freedom to run, study, share (copy), and
modify the software.
- Adapted from Wikipedia
What are R’s main functions?
R is an integrated suite of software for data manipulation,
calculation, and graphical display
• Effective data handling (read, write, and manipulate data in various formats).
• Rich data types (vectors, matrices, data frames, and lists), making it
versatile for different types of data analysis.
• Well-developed language including conditionals, loops, functions
and I/O capabilities.
• Various operators for calculations on arrays/matrices.
• Rich data analytics (machine learning, optimization) packages.
• Graphical facilities for data analysis (R graphics system, ggplot2, heatmap, …).
What are R’s packages? https://www.r-project.org/
• Open source, most widely used for statistical
analysis, data analytics and graphics
• Extensible via dynamically loadable add-on
packages
• Large number (~20k) of packages

Machine Learning, optimization & Statistical Learning

• What are the available packages?
• Can you find packages that can be used for
typical data analytics tasks, including association
rule mining, classification, clustering, regression,
and outlier/anomaly detection?
• Comprehensive R Archive Network (CRAN) Task Views:
https://cran.r-project.org/web/views/

Relatively easy to use and powerful

v = rnorm(256)   # draw 256 random values from a standard normal distribution
v                # print the values
summary(v)       # quartiles, mean, min and max
plot(v)          # plot the values against their index
2. Why do we learn R?
• Open source: it’s free! Fintech companies that have typically used SAS,
which is quite expensive, are likely to turn to R or Python.
• Data Mining – R is widely used by data scientists.
• Statistical functions – R is designed for statistical computing.
• Econometrics
• Genetics
• …..
• Good graphing engine – plot nice graphs with built-in functions and
external packages like ggplot2 and heatmap.
• Easy to Use and powerful – do more with less, so you can spend more
time thinking about the problem you are trying to solve, instead of
focusing too much on implementations.
Why do we learn R?
[Slide comparing several tools; the product logos are not recoverable from the text]
• Statistics & Data Mining
• Commercial

• Technical computing
• Matrix and vector formulations
• Commercial

• Data visualization and analysis platform
• Image processing, vector computing
• Not well suited to general programming

R: statistical computing and graphics (http://www.r-project.org)
• Expanded by community as open source
• Statistically rich
• Rich in data analytics packages
The Programmer’s Dilemma
What programming
language to use & why?

Scripting
(R, Python, MATLAB, IDL)

Object Oriented
(C++, Java)

Procedural languages
(C, Fortran)

Assembly
Why R?

https://www.kdnuggets.com/2020/06/data-science-tools-popularity-animated.html
3. Programming Interfaces and Installations
• The following GUIs (programming interfaces) are widely used. Both can run R and Python.
• 1. RStudio: install both R and RStudio
• 2. Google Colab: just use a browser

We will demo how to use RStudio and Google Colab.

You can choose either of them (RStudio IDE or Google Colab) based on your own preference.

Other interfaces can also run R, including Jupyter Notebook/Lab, VS Code, and Anaconda.
GUI 1: Install R and RStudio

RStudio IDE software at
https://posit.co/download/rstudio-desktop/

RStudio IDE is a powerful and productive user interface for R.

Video: Simple Introduction of RStudio IDE
https://www.youtube.com/watch?v=FIrsOBy5k58

CRAN Task Views (useful for finding packages in a certain domain/vertical)
https://cran.r-project.org/web/views/
GUI 2: Install Google Colab
• No explicit installation.
• Use the following URL to start Google Colab:
• https://colab.research.google.com/notebook#create=true&language=r
• Could be slightly slow
Below, we will demo how to use each of the GUIs/software in turn:
A. RStudio
B. Google Colab
A. Running programs/commands using RStudio
• 1. Command line/Console:
– After R or RStudio is started, there is a console waiting for input.
– At the prompt (>), you can enter numbers and operators to perform calculations:
– > 1+2
[1] 3
– > 5*8
[1] 40
– Comments: all text on the same line after the pound sign "#" is ignored by the system
> 1+1 # this is a comment
[1] 2

• 2. Program interface mode (consisting of many lines of code)
– Write a new program: File -> New File -> R Script
– Run current line: Ctrl + Enter (very convenient)
– How to run a few lines?
RStudio: Getting Started
[Figure: the RStudio GUI, showing the command line/console interface]

RStudio: How to go to the R program interface
You can then choose a folder to save this R source file. The default file extension is .R
B. Running programs/commands using Google Colab
First, we suggest using the following URL to start Google Colab:
https://colab.research.google.com/notebook#create=true&language=r

Select Code and Text

You can format text using headings, bold, italics, hyperlinks, images,
indentation, numbered lists, bulleted lists, horizontal rules, etc.
R’s own datasets
R comes with a number of sample datasets that you can experiment with. Type
> data()
to see the available datasets. The results will depend on which packages you
have installed and loaded. You can type the name of a dataset to see its contents.
> CO2
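Printing a whole dataset can flood the console. A minimal sketch of how one might inspect CO2 more compactly (the variable names first_rows and dims are only illustrative):

```r
data(CO2)                    # load the built-in CO2 dataset (datasets package)
first_rows <- head(CO2, 3)   # only the first three rows instead of the full table
dims <- dim(CO2)             # number of rows and number of columns
str(CO2)                     # compact overview of every column and its type
```

head() and str() work the same way on any other dataset listed by data().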

Simple Coding
• To facilitate your learning, I provide all the code that we cover
in the lectures.
• You can run it so that you understand the grammar and
semantics behind it.
• However, this is DEFINITELY NOT enough. You should write
your own code; only then is the knowledge really gained, becoming
part of a skill set that you can use repeatedly in
your future career.
3.1. Arithmetic in R
• You can use R as a calculator
• Typed expressions will be evaluated and printed out
• Main operations: +, -, *, /, ^
• Obeys order of operations
• Use parentheses to group expressions
• More complex operations appear as functions
• sqrt(2)
• sin(pi/4), cos(pi/4), tan(pi/4), asin(1), acos(1), atan(1)
• exp(1), log(2), log10(10), log2(2)
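As a small illustration of the order of operations mentioned above (a sketch only; the variable names are illustrative):

```r
p1 <- 1 + 2 * 3          # 7: * is evaluated before +
p2 <- (1 + 2) * 3        # 9: parentheses override the default order
p3 <- 2 ^ 3 ^ 2          # 512: ^ is right-associative, i.e. 2 ^ (3 ^ 2)
p4 <- -2 ^ 2             # -4: ^ is evaluated before the unary minus

l1 <- log(exp(1))        # 1: log() is the natural logarithm
l2 <- log(8, base = 2)   # 3: other bases via the base argument
s1 <- sin(pi / 6)        # 0.5, up to floating-point rounding
```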

3.2. Some basic functions
# list installed packages
> library()
# install a package: sometimes we need additional functionality
# beyond that offered by the core R library. To install an extension
# package, invoke the install.packages function at the prompt and
# follow the instructions.
# For example, the list at
# https://cran.r-project.org/web/packages/available_packages_by_date.html
# includes the package caret: Classification and Regression Training
> install.packages("caret")
# load a library
> library(caret)
https://data-flair.training/blogs/r-packages-for-data-science/
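Re-running install.packages every time a script starts is slow. One common pattern, sketched here with a hypothetical helper use_package (not part of base R), installs a package only when it is missing and then attaches it:

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

use_package("stats")        # ships with R, so no download is triggered
installed <- rownames(installed.packages())   # names of all installed packages
"stats" %in% installed
```

Replacing "stats" with "caret" would trigger a download on a machine where caret is not yet installed.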
R Packages
• One of the strengths of R is that the system can easily be extended. You
can write new functions and bundle those functions in
a so-called "R package" (or "R library").
• The R package may also contain other R objects, for example data sets or
documentation. There is a lively R user community and many R packages
have been written and made available on CRAN for other users.
• Instructions for Creating Your Own R Package:
http://web.mit.edu/insong/www/pdf/rpackage_instructions.pdf
• To give just a few examples, there are existing packages for portfolio
optimization, drawing maps, exporting objects to HTML, time series
analysis and spatial statistics, and the list goes on.
R Packages
• When you install R, a number of packages are installed
as well.
• To use a function in an R package, that package has to be attached to
the system.
• When you start R, not all of the installed packages are loaded or
attached; only some important packages are attached to the system by
default. You can use the function search() to see a list of packages that
are currently attached to the system. This list is also called the search
path.
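To see the search path change, one can compare search() before and after attaching a package. This sketch assumes the tools package, which ships with R but is normally not attached at startup:

```r
before <- search()          # packages attached right now
library(tools)              # attach a package that ships with R
after <- search()
setdiff(after, before)      # what library() added (empty if tools was already attached)
```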

Getting help & find useful information
• From the R console (i.e., the command line interface), you can type the
following commands to learn more about different
functions:
> help(function_name)
> help(prcomp) # Principal Components Analysis
> ?function_name
> ?prcomp
> help.search("topic")
> ??topic
Q: What is the difference between ? and ??
Getting help & find useful information
# help for a function: single ? (if you know which function
# you are looking for)
?mean
x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

If trim is non-zero, a symmetrically trimmed mean is computed with a
fraction of trim observations deleted from each end before the mean is
computed.

What is the purpose of trim?
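One way to explore this question is to compute the trimmed mean by hand. A minimal sketch using the same x as above (the names plain, trimmed and by_hand are illustrative):

```r
x <- c(0:10, 50)                  # 12 values; 50 is an outlier
plain   <- mean(x)                # 8.75, pulled upward by the outlier
trimmed <- mean(x, trim = 0.10)   # drops floor(12 * 0.10) = 1 value from each end: 5.5
by_hand <- mean(sort(x)[2:11])    # the same trimmed mean computed manually
c(plain, trimmed, by_hand)
```

Trimming makes the mean less sensitive to extreme values at either end of the data.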

Getting help & find useful information
# search help files: double ?? (free-text search across all
# installed help pages)
> ??mean
# you can search for objects by partial name with the apropos
# function.
> apropos("nova")
# apropos: Find Objects by Partial Name
• apropos() returns a character vector giving the names of objects that match the
search query (e.g. "nova").
> V = apropos("GLM") # several results
> V[1]
> V[5]
> apropos("GLM", ignore.case = FALSE) # no result returned
> apropos("lq")
Calling a built-in R function
An R function is invoked by its name, followed by parentheses
containing zero or more arguments. The following applies the function c to
combine three numeric values into a vector.
> c(1, 2, 3)
[1] 1 2 3
> factorial(5)
[1] 120
3.3. Variables and assignment
• Use variables to store values
• Just typing the variable by itself at the prompt will print out the value.
• Three ways to assign variables
• a = 6
• a <- 6
• 6 -> a
• Update variables by using the current value in an assignment
• x = 0
• x = x + 1
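The three assignment forms give the same result at the prompt. The sketch below also shows one place where = behaves differently from <-, namely inside a function call (the variable m is illustrative):

```r
a = 6        # works at the top level
a <- 6       # the form most R style guides prefer
6 -> a       # rightward assignment, rarely used

x <- 0
x <- x + 1   # the right-hand side is evaluated first, then stored back in x

# Inside a function call, = names an argument instead of assigning a variable:
m <- mean(x = c(1, 2, 3))   # passes c(1, 2, 3) as the argument x; the variable x is still 1
```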
4. R simple data types
• Five basic R data types occur frequently in
routine R calculations:
1. Numeric
2. Integer
3. Complex
4. Logical
5. Character
• We can better understand them by direct experimentation
with some R code.
R simple data type: Numeric
• Decimal values are called numerics. Numeric is the default
computational data type. If we assign a decimal value to a
variable x, x will be of numeric type.
• > x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
• > class(x) # print the class name of x
[1] "numeric"
• Furthermore, even if we assign an integer to a variable k, it is
still saved as a numeric value.
• > k = 1
> k # print the value of k
[1] 1
• > class(k) # print the class name of k
[1] "numeric"
• The fact that k is not an integer can also be confirmed
with the is.integer function.
• > is.integer(k) # is k an integer?
[1] FALSE
R simple data type: Integer
• To create an integer variable in R, we invoke the
as.integer function. We can confirm that y is indeed an
integer by applying the is.integer function.
• > y = as.integer(3)
> y # print the value of y
[1] 3
• > class(y) # print the class (type) name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
• We can coerce a numeric value into an integer with the same
as.integer function.
• > as.integer(3.14) # coerce a numeric value
[1] 3
• We can parse a decimal string in the same way.
• > as.integer("5.27") # coerce a decimal string
[1] 5
• In other words, we can convert a numeric value or a decimal string
to an integer; the fractional part is discarded.
• > class(as.integer("5.27"))
[1] "integer"
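Two related details, sketched below, are worth trying out: R also has an integer literal written with an L suffix, and as.integer truncates toward zero rather than rounding (variable names are illustrative):

```r
y1 <- 3L                   # the L suffix creates an integer literal directly
y2 <- as.integer(3)        # same result via coercion
same <- identical(y1, y2)  # TRUE

t1 <- as.integer(3.99)     # 3: truncation, not rounding
t2 <- as.integer(-3.99)    # -3: truncation is toward zero
r1 <- round(3.99)          # 4, but the result is still of class "numeric"
```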

R simple data type: Integer (Cont.)
• On the other hand, trying to parse a non-decimal string yields NA with a warning.
• > as.integer("Joe") # coerce a non-decimal string
[1] NA
Warning message:
NAs introduced by coercion
• Often, it is useful to perform arithmetic on logical values. TRUE has the
value 1, while FALSE has value 0.
• > as.integer(TRUE) # the numeric value of TRUE
[1] 1
> as.integer(FALSE) # the numeric value of FALSE
[1] 0
• Then, how about as.integer(3<5)?
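A short sketch of why arithmetic on logical values is handy: counting and averaging TRUE values. The scores vector is made-up illustration data:

```r
one <- as.integer(3 < 5)          # 3 < 5 is TRUE, so the result is 1

scores <- c(55, 72, 38, 90, 64)   # hypothetical exam scores
passed <- scores >= 60            # FALSE TRUE FALSE TRUE TRUE
n_passed <- sum(passed)           # each TRUE counts as 1, giving 3
share_passed <- mean(passed)      # proportion of TRUE values, 0.6
```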

R simple data type: Complex
• A complex value in R is defined via the pure imaginary value i.
• > z = 1 + 2i # create a complex number
> z # print the value of z
[1] 1+2i
> class(z) # print the class name of z
[1] "complex"
• The following returns NaN with a warning, because -1 is not a complex value.
• > sqrt(-1) # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
• Instead, we have to use the complex value -1 + 0i.
• > sqrt(-1+0i) # square root of -1+0i
[1] 0+1i
• An alternative is to coerce -1 into a complex value.
• > sqrt(as.complex(-1))
[1] 0+1i
R simple data type: Logical
• A logical value is often created via comparison between variables.
• > x = 1; y = 2 # sample values
> z = x > y # is x larger than y?
> z # print the logical value
[1] FALSE
> class(z) # print the class name of z
[1] "logical"
• Standard logical operations are "&" (and), "|" (or), and "!" (negation).
• > u = TRUE; v = FALSE
> u & v # u AND v
[1] FALSE
> u | v # u OR v
[1] TRUE
> !u # negation of u
[1] FALSE
• Further details can be found in the R documentation.
> help("&")
R simple data type: Character
• A character object is used to represent string values in R. We convert objects
into character values with the as.character() function:
• > x = as.character(3.14)
> x # print the character string
[1] "3.14"
> class(x) # print the class name of x
[1] "character"
• Two character values can be concatenated with the paste function.
• > fname = "Joe"; lname = "Biden"
> paste(fname, lname)
[1] "Joe Biden"
• However, it is often more convenient to create a readable string with the
sprintf function, which uses C-style format specifiers.
> sprintf("%s has %d dollars", "Sam", 100)
[1] "Sam has 100 dollars"
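A few more formatting variants, as a sketch (the values and variable names are illustrative): %.2f fixes the number of decimals, a width such as %5d pads the output, and paste accepts a custom separator.

```r
name <- "Sam"; amount <- 100.5                      # hypothetical values
s1 <- sprintf("%s has %.2f dollars", name, amount)  # "Sam has 100.50 dollars"
s2 <- sprintf("%5d|", 42L)                          # "   42|": right-aligned in width 5

p1 <- paste("Joe", "Biden")                # the default separator is a space
p2 <- paste("Joe", "Biden", sep = "_")     # "Joe_Biden"
p3 <- paste0("id", 1:3)                    # no separator, vectorised: "id1" "id2" "id3"
```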
R simple data type: Character (Cont.)
• To extract a substring, we apply the substr function. Here is an example
showing how to extract the substring between the third and twelfth
positions in a string.
• > substr("Mary has a little lamb.", start=3, stop=12)
[1] "ry has a l"
• And to replace/substitute the first occurrence of the word "little" by another
word "big" in the string, we apply the sub function.
• > sub("little", "big", "Mary has a little lamb.")
[1] "Mary has a big lamb."
• More functions for string manipulation can be found in the R
documentation.
• > help("sub")
5. R: complex data types
• Vectors: numerical vector, character vector, logical vector
• Matrices: all columns in a matrix must have the same mode (numeric,
character, ...) and the same length.
• Arrays: similar to matrices, but can have more than two dimensions.
• Lists: an ordered collection of objects (components). A list allows you to gather
a variety of (possibly unrelated) objects under one name.
• Data frames: more general than a matrix, in that different columns can have
different modes (numeric, character, ...). Similar to a database table.
• Factors: tell R that a variable is nominal by making it a factor. The factor stores
the nominal values as an integer vector in the range [1 ... k] (where k is the
number of unique values in the nominal variable), together with an internal vector of
character strings (the original values) mapped to these integers.
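The integer coding behind a factor can be inspected directly. A minimal sketch with a made-up nominal variable size:

```r
size <- c("low", "high", "medium", "low")   # hypothetical nominal variable
f <- factor(size)

lv <- levels(f)        # "high" "low" "medium": sorted alphabetically by default
codes <- as.integer(f) # 2 1 3 2: position of each value in levels(f)
k <- nlevels(f)        # 3, the k in the range [1 ... k]
counts <- table(f)     # number of observations per level
```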
5.1 R complex data type: Vector
• A vector is a sequence of data elements of the same basic type.
Members in a vector are officially called components.
Nevertheless, we can just call them members.
• We use the c() function to combine values into a vector. We can
construct different types of vectors.
• Here is a vector containing three numeric values 2, 3 and 5.
> c(2, 3, 5)
[1] 2 3 5

R complex data type: Vector (cont.)
• And here is a vector of logical values.
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1] TRUE FALSE TRUE FALSE FALSE
• A vector can contain character strings.
> c("aa", "bb", "cc", "dd", "ee")
[1] "aa" "bb" "cc" "dd" "ee"
• The number of members in a vector is given by the length function.
> length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5

Combining Vectors
• Vectors can be combined via the function c. For example, the following
two vectors n and s are combined into a new vector w containing
elements from both vectors.
• > n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> w = c(n, s)
> w
[1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
• Value Coercion
In the code snippet above, notice how the numeric values are coerced into
character strings when the two vectors are combined. This is necessary to
maintain the same primitive data type for all members of the vector.
Vectors and vector operations
To create a vector:
# c() command to create vector x
x=c(12,32,54,33,21,65)
# c() to add elements to vector x
x=c(x,100,101)
# seq() command to create a sequence of numbers conveniently
years=seq(1990,2003)
# sequence from 3 to 5 in steps of .5
a=seq(3,5,.5)
# can use : to step by 1
years=1990:2003
# rep() command to create data that follow a regular pattern;
# rep() repeats the given values a specified number of times
b=rep(1,5)
c=rep(1:2,4)

To access vector elements:
# 2nd element of x
x[2]
# first five elements of x
x[1:5]
# all but the 3rd element of x
x[-3]
# values of x that are < 40: selects all elements with values smaller
# than 40 (might be hard to understand at first)
x[x<40]

To perform operations:
# mathematical operations on vectors
y=c(3,2,4,3,7,6,1,1)
x+y; 2*y; x*y; x/y; y^2
Example: Vector Arithmetic
• Arithmetic operations on vectors are performed member by member (memberwise). For example, suppose
we have two vectors a and b.
• > a = c(1, 3, 5, 7)
> b = c(1, 2, 4, 8)
• If we multiply a by 5, we get a vector with each of its members multiplied by 5.
• > 5 * a
[1] 5 15 25 35
• And if we add a and b together, the sum is a vector whose members are the sums of the
corresponding members of a and b.
• > a + b
[1] 2 5 9 15
• Similarly for subtraction, multiplication and division, we get new vectors via memberwise operations.
• > a - b
[1] 0 1 1 -1
> a * b
[1] 1 6 20 56
> a / b
[1] 1.000 1.500 1.250 0.875
Access vector elements: Vector Index
• We retrieve values in a vector by declaring an index inside a single square
bracket "[]" operator.
• For example, the following shows how to retrieve a vector member. Since
the vector index is 1-based, we use the index position 3 for retrieving the
third member.
• > s = c("aa", "bb", "cc", "dd", "ee")
> s[3]
[1] "cc"
• Unlike other programming languages, the square bracket operator could
return more than just individual members. In fact, the result of the square
bracket operator is another vector, and s[3] is a vector slice (not element)
containing a single member "cc".
Vector Index (Cont.)
• Negative Index
• If the index is negative, it strips the member whose position equals
the absolute value of the index. For example, the
following creates a vector slice with the third member removed.
• > s = c("aa", "bb", "cc", "dd", "ee")
> s[-3]
[1] "aa" "bb" "dd" "ee"
• Out-of-Range Index
• If an index is out of range, a missing value is reported via the
symbol NA.
• > s[10]
[1] NA
Numeric Index Vector
• A new vector can be sliced from a given vector with a numeric index vector,
which consists of member positions of the original vector to be retrieved.
• Here it shows how to retrieve a vector slice containing the second and third
members of a given vector s.
• > s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3)]
[1] "bb" "cc"
• Note: s[2] is valid, but s[2, 3] is an error, because a vector has only one dimension.
• Duplicate Indexes
• The index vector allows duplicate values. Hence the following retrieves a
member twice in one operation.
• > s[c(2, 3, 3)]
[1] "bb" "cc" "cc"
Numeric Index Vector (Cont.)
• Out-of-Order Indexes
• The index vector can even be out-of-order. Here is a vector slice with the order
of first and second members reversed.
> s[c(2, 1, 3)]
[1] "bb" "aa" "cc"
• Range Index
• To produce a vector slice between two indexes, we can use the colon operator
":". This can be convenient for situations involving large vectors.
> s[2:4]
[1] "bb" "cc" "dd"
• More information for the colon operator is available in the R documentation.
> help(":")
Logical Index Vector
• A new vector can be sliced from a given vector with a logical index vector, which has
the same length as the original vector. Its members are TRUE if the corresponding
members in the original vector are to be included in the slice, and FALSE if otherwise.
• For example, consider the following vector s of length 5.
> s = c("aa", "bb", "cc", "dd", "ee")
• To retrieve the second and fourth members of s, we define a logical vector L of the
same length, and have its second and fourth members set as TRUE.
• > L = c(FALSE, TRUE, FALSE, TRUE, FALSE)
> s[L]
[1] "bb" "dd"
• The code can be abbreviated into a single line.
> s[c(FALSE, TRUE, FALSE, TRUE, FALSE)]
[1] "bb" "dd"
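In practice the logical index vector is rarely typed by hand; it usually comes from a comparison. A minimal sketch with made-up values (v, keep and the other names are illustrative):

```r
v <- c(12, 32, 54, 33, 21, 65)   # hypothetical values

keep <- v < 40                   # TRUE TRUE FALSE TRUE TRUE FALSE
small <- v[keep]                 # 12 32 33 21
positions <- which(keep)         # 1 2 4 5: positions of the TRUE entries
middle <- v[v >= 20 & v < 40]    # conditions combine with & and |: 32 33 21
```

Writing v[v < 40] is simply the same two steps (build the logical vector, then slice) collapsed into one expression.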
Named Vector Members
• We can assign names to vector members. For example, the following variable v is a
character string vector with two members.
> v = c("Mary", "Sue")
> v
[1] "Mary" "Sue"
• We now name the first member as First, and the second as Last.
> names(v) = c("First", "Last")
> v
 First   Last
"Mary"  "Sue"
• Then we can retrieve the first member by its name.
> v["First"]
 First
"Mary"
• Furthermore, we can reverse the order with a character string index vector.
> v[c("Last", "First")]
  Last  First
 "Sue" "Mary"
Recycling Rule
• If two vectors are of unequal length, the shorter one will be recycled in order
to match the longer vector. For example, the following vectors u and v have
different lengths, and their sum is computed by recycling values of the
shorter vector u.
• > u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> u + v
[1] 11 22 33 14 25 36 17 28 39
Explanation: u is recycled to match the length of v
u = c(10, 20, 30, 10, 20, 30, 10, 20, 30)
v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
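Two further cases of the recycling rule, as a sketch: a single number is a vector of length 1, and R warns when the longer length is not a multiple of the shorter one (r1 and r2 are illustrative names):

```r
u <- c(10, 20, 30)
r1 <- u + 1    # the length-1 vector is recycled three times: 11 21 31

# Length 2 does not divide length 3: R still recycles, but warns that the
# longer object length is not a multiple of the shorter object length.
r2 <- suppressWarnings(u + c(1, 2))   # 10+1, 20+2, 30+1 gives 11 22 31
```

The warning is silenced here only to keep the example quiet; in real analysis it often signals a bug.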
5.2 R complex data type: Matrix
• A matrix is a collection of data elements arranged in a 2-D rectangular layout. The
example below is a matrix with 2 rows and 3 columns.
• We create the matrix in memory with the matrix function. The
data elements must be of the same basic type.
• > A = matrix(
+ c(2, 4, 3, 1, 5, 7), # the data elements
+ nrow=2, # number of rows
+ ncol=3, # number of columns
+ byrow = TRUE) # fill matrix by rows
> A # print the matrix
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
R complex data type: Matrix
• An element at the mth row, nth column of A can be accessed by the
expression A[m, n].
• > A[2, 3] # element at 2nd row, 3rd column
[1] 7
• The entire mth row of A can be extracted as A[m, ].
• > A[2, ] # the 2nd row
[1] 1 5 7
• The entire nth column of A can be extracted as A[ ,n].
• > A[ ,3] # the 3rd column
[1] 3 7
R complex data type: Matrix
• We can also extract more than one row or column at a time.
• > A[ ,c(1,3)] # the 1st and 3rd columns
     [,1] [,2]
[1,]    2    3
[2,]    1    7
• If we assign names to the rows and columns of the matrix, then we can access the elements by name.
• > dimnames(A) = list(
+ c("row1", "row2"), # row names
+ c("col1", "col2", "col3")) # column names
> A # print A
     col1 col2 col3
row1    2    4    3
row2    1    5    7
> A["row2", "col3"] # element at 2nd row, 3rd column
[1] 7
Matrix Construction
• There are various ways to construct a matrix. When we construct a matrix
directly with data elements, the matrix content is filled along the column
orientation by default. For example, in the following code snippet, the
content of B is filled along the columns consecutively.
• > B = matrix(
+ c(2, 4, 3, 1, 5, 7),
+ nrow=3,
+ ncol=2)
> B # B has 3 rows and 2 columns
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
Transpose & Deconstruction
• Transpose: We construct the transpose of a matrix by interchanging
its columns and rows with the function t.
• > B # B has 3 rows and 2 columns
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> t(B) # transpose of B
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
• Deconstruction: We can deconstruct a matrix by applying the c
function, which combines all column vectors into one.
• > c(B)
[1] 2 4 3 1 5 7
Combining Matrices by columns
• The columns of two matrices having the same number of rows can be combined into
a larger matrix. For example, suppose we have another matrix C, also with 3 rows.
• > C = matrix(
+ c(7, 4, 2),
+ nrow=3,
+ ncol=1)
> C # C has 3 rows
     [,1]
[1,]    7
[2,]    4
[3,]    2
> B # B has 3 rows and 2 columns
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
• Then we can combine the columns of B and C with cbind.
• > cbind(B, C) # column bind
     [,1] [,2] [,3]
[1,]    2    1    7
[2,]    4    5    4
[3,]    3    7    2
Combining Matrices by rows
• Similarly, we can combine the rows of two matrices if they have the same number of columns
with the rbind function.
• > D = matrix(
+ c(6, 2),
+ nrow=1,
+ ncol=2)
> D # D has 2 columns
     [,1] [,2]
[1,]    6    2
> B # B has 3 rows and 2 columns
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> rbind(B, D) # row bind
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
[4,]    6    2
Matrices & matrix operations
To create a matrix:
# matrix() command to create matrix A with rows and cols
A=matrix(c(54,49,49,41,26,43,49,50,58,71),nrow=5,ncol=2)
B=matrix(1,nrow=4,ncol=4)

To access matrix elements:
# matrix_name[row_no, col_no]
A[2,1] # 2nd row, 1st column element
A[3,] # 3rd row
A[,2] # 2nd column of the matrix
A[2:4,c(2,1)] # submatrix of the 2nd-4th rows, taking the 2nd and 1st columns
A["KC",] # access row by name, "KC" (works only after row names including "KC" have been assigned)

Statistical operations:
rowSums(A)
colSums(A)
rowMeans(A)
colMeans(A)
# max of each column
apply(A,2,max) # 2 indicates columns
# min of each row
apply(A,1,min) # 1 indicates rows

Element by element ops:
2*A+3; A+B; A*B; A/B # A and B must have the same dimensions (not the case for the 5 x 2 A and 4 x 4 B defined above)

Matrix/vector multiplication:
A %*% B # the number of columns of A must equal the number of rows of B
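The dimension requirements matter in practice. The sketch below reuses A from above, assigns hypothetical row names so that access by name works, and builds a B of matching shape (these row names and this B are assumptions made for illustration, not the lecture's values):

```r
A <- matrix(c(54, 49, 49, 41, 26, 43, 49, 50, 58, 71), nrow = 5, ncol = 2)
rownames(A) <- c("KC", "NY", "LA", "SF", "DC")   # hypothetical row names
kc <- A["KC", ]                # the row named "KC": 54 43
col_max <- apply(A, 2, max)    # max of each column: 54 71
row_min <- apply(A, 1, min)    # min of each row: 43 49 49 41 26

B <- matrix(1, nrow = 5, ncol = 2)   # same shape as A, so element-wise ops work
elementwise <- A * B                 # 5 x 2: each entry is A[i, j] * B[i, j]
product <- t(A) %*% B                # (2 x 5) %*% (5 x 2) gives a 2 x 2 matrix product
```

Note the difference: * multiplies entry by entry, while %*% is the linear-algebra product.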

More useful functions for vectors and matrices
• Find the number of elements or the dimensions
• length(v), length(A), dim(A)
• Transpose
• t(v), t(A)
• Matrix inverse
• solve(A)
• solve(A, b): returns the vector x in the equation b = Ax (i.e., x = A^(-1) b)
• Sort vector values
• sort(v)
• Statistics
• min(), max(), mean(), median(), sum(), sd(), quantile()
• These treat a matrix as a single vector (the same holds for sort())
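As a worked example of solve, consider the small system 2x + y = 3, x + 3y = 5 (a made-up system chosen for illustration):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled by column: rows are (2, 1) and (1, 3)
b <- c(3, 5)

x <- solve(A, b)              # solves A %*% x = b directly: 0.8 1.4
A_inv <- solve(A)             # the explicit inverse of A
x_via_inverse <- A_inv %*% b  # same solution, as a 2 x 1 matrix
```

solve(A, b) is generally preferable to forming the inverse first, since it avoids an extra matrix computation.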
5.1, 5.2 Summary of Vector and Matrix
• Vector (members have same type)
– N = c(12,32,54,33,21,65) # numerical vector
– C = c(TRUE, FALSE, TRUE, FALSE, FALSE) # logical vector
– L = c("aa", "bb", "cc", "dd", "ee") # character vector
• Matrix (same type, same length)
– A = matrix(c(2, 4, 3, 1, 5, 7), nrow=2, ncol=3, byrow = TRUE)
– A = matrix(c('1','2','3','4'), nrow=2, ncol=2)
– Y = matrix(1:20, nrow=5, ncol=4)

Arrays
• Arrays are similar to matrices but can have more than two
dimensions. A 3-dimensional array can be viewed as a set of stacked matrices of
identical dimensions. For example, we create a 3-D array from 4
matrices (each a 2 x 3 matrix).
• a <- matrix(6, 2, 3) # 2 x 3 matrix
• b <- matrix(7, 2, 3) # 2 x 3 matrix
• c <- matrix(8, 2, 3) # 2 x 3 matrix
• d <- matrix(9, 2, 3) # 2 x 3 matrix
• myarray = array(c(a, b, c, d), c(2, 3, 4))
# Creates a 2 x 3 x 4 array
[Figure: four 2 x 3 matrices stacked into a 2 x 3 x 4 array]
Array example
• myarray1 <- array(1:24, dim=c(3,4,2))
• myarray1
• Here the data are given as the first argument and a vector with the sizes of the
dimensions as the second argument. Our array has 3 rows, 4 columns,
and 2 "tables".
Access array elements. Format: array[row, col, matrix]
# 3rd row of the second matrix:
myarray1[3,,2]
# 1st row and 3rd column of the 1st matrix:
myarray1[1,3,1]

# 2nd matrix
myarray1[,,2]

# apply() below calculates the sum of the elements in each row of the array,
# across all the matrices.
# In the second argument, 1 indicates rows, 2 indicates columns, and c(1, 2)
# indicates rows and columns.
result <- apply(myarray1, c(1), sum)
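To check one's understanding of array indexing, the results of these expressions can be worked out by hand. A sketch (the names on the left are illustrative):

```r
myarray1 <- array(1:24, dim = c(3, 4, 2))   # filled column by column, matrix by matrix

row3_m2 <- myarray1[3, , 2]                 # 3rd row of the 2nd matrix: 15 18 21 24
elem    <- myarray1[1, 3, 1]                # row 1, column 3 of the 1st matrix: 7
row_sums  <- apply(myarray1, 1, sum)        # one sum per row, over both matrices: 92 100 108
cell_sums <- apply(myarray1, c(1, 2), sum)  # 3 x 4 matrix: each cell summed across the matrices
```

For instance, cell_sums[1, 1] is 1 + 13 = 14, the top-left entries of the two matrices added together.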

5.3 List
• A list is a generic vector containing different objects (of different types and
different lengths). For example, the following variable x is a list containing copies of
3 vectors n, s, b, and a numeric value 3.
• > n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3)
> n
[1] 2 3 5
• > x
[[1]]
[1] 2 3 5

[[2]]
[1] "aa" "bb" "cc" "dd" "ee"

[[3]]
[1] TRUE FALSE TRUE FALSE FALSE

[[4]]
[1] 3
• The objects in a list are shown with the double square bracket [[ ]].
List Slicing: Still get a list
• We retrieve a list slice with the single square bracket "[]" operator. [] extracts
a list.
• > x[2]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"
• > class(x[2]) # "list": x[2] is a list, not an actual member
• With an index vector, we can retrieve a slice with multiple objects. Here is a
slice containing the second and fourth objects of x.
• > x[c(2, 4)]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"

[[2]]
[1] 3
Member Reference: access members
• To reference a list member directly, we have to use the double square bracket
"[[]]" operator. The following object x[[2]] is the second member of x. [[]] extracts
the vector within a list.
• > x[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
• > class(x[[2]]) # x[[2]] is NOT a list; it is a character vector
[1] "character"
• We can show or modify a member's content directly.
• > x[[2]][1] = "ta"
> x[[2]]
[1] "ta" "bb" "cc" "dd" "ee"
> s
[1] "aa" "bb" "cc" "dd" "ee" # s is unaffected; we only changed x
• Access the elements of a list object:
> x[[2]][3]
[1] "cc"
> class(x[[2]][1])
[1] "character"
> x[[3]][3]
[1] TRUE
> class(x[[3]][3])
[1] "logical"
Named List Members
• We can assign names to list members, and reference them by
names instead of numeric indexes.
• For example, in the following, v is a list of two objects/members,
named "Bob" and "Mary".
• > v = list(Bob=c(2, 3, 5), Mary=c("aa", "bb"))
> v
$Bob
[1] 2 3 5

$Mary
[1] "aa" "bb"
List Slicing by Name: Still get a list
• We retrieve a list slice with the single square bracket "[]" operator. Here is a list slice
containing a member of v named "Bob".
• > v["Bob"]
$Bob
[1] 2 3 5
> class(v["Bob"]) # list
• With an index vector, we can retrieve a slice with multiple members. Here is a list
slice with both objects of v. Notice how they are reversed from their original
positions in v.
• > v[c("Mary","Bob")]
$Mary
[1] "aa" "bb"

$Bob
[1] 2 3 5
Member Reference: access members
• To reference a list member directly, we have to use the
double square bracket "[[]]" operator. The following references a
member of v by name.
• > v[["Bob"]]
[1] 2 3 5
• class(v["Bob"]) #list
• class(v[["Bob"]]) #numeric
• A named list member can also be referenced directly with the "$"
operator in lieu of the double square bracket operator.
• > v$Bob
[1] 2 3 5
• class(v$Bob) #numeric
• For a list, what is the difference between [], [[]], [[]][], and $?
[]: list; [[]] and $: vector; [[]][]: member
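The four access forms can be compared side by side. A minimal sketch reusing the list v; the variable key is hypothetical and illustrates one practical difference between [[ ]] and $:

```r
v <- list(Bob = c(2, 3, 5), Mary = c("aa", "bb"))

c1 <- class(v["Bob"])                  # "list": single brackets keep the list wrapper
c2 <- class(v[["Bob"]])                # "numeric": double brackets return the member itself
same <- identical(v[["Bob"]], v$Bob)   # TRUE: $ is shorthand for [[ ]] with a name
second <- v[["Bob"]][2]                # 3: index into the member after extracting it

key <- "Mary"                          # a member name stored in a variable
by_key <- v[[key]]                     # [[ ]] evaluates key: "aa" "bb"
missing_member <- is.null(v$key)       # TRUE: $ looks for a member literally named "key"
```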
Homework
• 1. Test out the two programming interfaces.
• 2. Read about the machine learning packages in the CRAN package distribution
to get a rough understanding of their functions (note that you may
not understand the meaning of all the topics).
• 3. Run and understand the Basic R program (BasicR_RStudio.zip)
that maps to today’s lecture.
• Read through the article https://data-flair.training/blogs/machine-learning-for-r-programming/#google_vignette
Practice makes perfect!
Useful R links
• R Home: http://www.r-project.org/
• R’s CRAN package distribution: https://cran.r-project.org/ (click Packages
and CRAN Task Views to study them)
• More comprehensive than our lectures: An Introduction to R. Notes on R:
A Programming Environment for Data Analysis and Graphics, Version 4.2.1
(2022-06-23): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
• Writing R extensions: https://cran.r-project.org/doc/manuals/R-exts.html
• Other R documentation:
• http://www.r-tutor.com/r-introduction
• http://www.tutorialspoint.com/r/
Contact: xlli@ntu.edu.sg, xlli@i2r.a-star.edu.sg if you have questions
