R Lang-Unit-01

The document provides an introduction to the R programming language, highlighting its features, advantages, and applications in statistical computing and data analysis. It covers basic concepts such as data types, variable creation, and mathematical functions, along with examples of coding in R. Additionally, it discusses the significance of R in the data science job market and its integration capabilities with other programming languages.

Uploaded by

km587522

Nrupathunga University

Department of Computer Science


V Sem BCA (NEP)
Statistical Computing and R Programming Language
Unit -01
Introduction of the language, Numbers, Arithmetic, assignment, Vectors, Matrices and Arrays,
Non-numeric Values, Lists and Data Frames, Special Values, Classes, and Coercion, Basic
Plotting.

1. Introduction of the Language:


R is an open-source programming language that is widely used for statistical computing
and data analysis. R comes with a command-line interface and is available across widely
used platforms such as Windows, Linux, and macOS.
It was designed by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand, and is currently developed by the R Core Team. R is an implementation of
the S programming language, combined with lexical scoping semantics inspired by Scheme.
The project was conceived in 1992, with an initial version released in 1995 and a stable
version (1.0.0) in 2000.

1.1 Why R Programming Language?

• R is a leading tool for machine learning, statistics, and data analysis. Objects,
functions, and packages can easily be created in R.

• It is a platform-independent language, which means it runs on all major operating
systems.

• It is a free, open-source language, so anyone can install it in any organization
without purchasing a license.

• R is not only a statistics package; it also integrates with other languages (C, C++).
Thus, you can easily interact with many data sources and statistical packages.

• R has a vast community of users, and it is growing day by day.

• R is currently one of the most requested programming languages in the Data Science
job market.

1.2 Features of R Programming Language

a. Statistical Features of R:

• Basic Statistics: The most common basic statistics are the mean, mode, and median,
collectively known as "Measures of Central Tendency". R makes it easy to measure
central tendency.

• Static graphics: R is rich in facilities for creating static graphics. It contains
functionality for many plot types, including graphic maps, mosaic plots, biplots, and
more.

• Probability distributions: Probability distributions play a vital role in statistics,
and R can easily handle various types of probability distribution such as the Binomial
Distribution, Normal Distribution, Chi-squared Distribution, and many more.

• Data analysis: R provides a large, coherent, and integrated collection of tools for
data analysis.

b. Programming Features of R:

• R Packages: One of the major features of R is the wide availability of libraries.
R has CRAN (Comprehensive R Archive Network), a repository holding more than
10,000 packages.

• Distributed Computing: Distributed computing is a model in which components of
a software system are shared among multiple computers to improve efficiency and
performance. Two packages for distributed programming in R, ddR and multidplyr,
were released in November 2015.

1.3 Programming in R:

Since R is syntactically similar to other widely used languages, it is easy to learn
and code in. Programs can be written in any of the widely used R environments, such as
the R console, RStudio, Rattle, and Tinn-R.

1.4 Advantages of R:

• R is one of the most comprehensive statistical analysis packages; new techniques and
concepts often appear first in R.

• R is open source, so you can run it anywhere and at any time.

• R is cross-platform: it runs on GNU/Linux, Windows, and macOS.

• In R, everyone is welcome to provide new packages, bug fixes, and code
enhancements.

1.5 Disadvantages of R:

• The quality of some R packages is less than perfect.

• R commands give little thought to memory management, so R may consume all
available memory.

• As a community project, R has no vendor to complain to if something doesn't work.

• R can be much slower than other programming languages such as Python and MATLAB.

1.6 Applications of R:

• We use R for Data Science. It gives us a broad variety of libraries related to statistics
and provides an environment for statistical computing and design.

• R is used by many quantitative analysts as a programming tool, for example for
data importing and cleaning.

• R is widely used by data analysts and research programmers, and it serves as a
fundamental tool in finance.

• Tech companies such as Google, Facebook, Bing, Twitter, Accenture, and Wipro
use R.

2. Comments

Comments can be used to explain R code and to make it more readable. They can
also be used to prevent execution when testing alternative code. A comment starts
with a #. When executing code, R ignores everything from the # to the end of the
line. This example uses a comment before a line of code:

Example
# This is a comment
"Hello World!"

Multiline Comments

Unlike other programming languages, such as Java, R has no syntax for
multiline comments. However, we can insert a # at the start of each line to create
multiline comments:

Example
# This is a comment
# written in
# more than just one line
"Hello World!"

3. Creating Variables in R
Variables are containers for storing data values.

R does not have a command for declaring a variable. A variable is created the
moment you first assign a value to it. To assign a value to a variable, use the <- sign. To
output (or print) the variable value, just type the variable name:

Example
name <- "John"
age <- 40
name # output "John"
age # output 40

3.1 Concatenate Elements

You can also concatenate, or join, two or more elements, by using the
paste() function.

To combine both text and a variable, separate them with a comma (,) inside paste():

Example 1

text <- "awesome"
paste("R is", text)

Example 2

text1 <- "R is"
text2 <- "awesome"
paste(text1, text2)

Example 3

For numbers, the + character works as a mathematical operator:

num1 <- 5
num2 <- 10
num1 + num2

Example 4

If you try to combine a string (text) and a number with +, R will give you an error:

num <- 5
text <- "Some text"
num + text

Error in num + text : non-numeric argument to binary operator
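
One way around this error is to let paste() do the joining, since it converts numbers to text first. A minimal sketch in the spirit of Example 4 (the name combined is only illustrative):

```r
num <- 5
text <- "Some text"

# paste() converts the number to a string before joining the two values
combined <- paste(text, num)
combined

# as.character() makes the same conversion explicit
num_as_text <- as.character(num)
num_as_text
```

Converting explicitly with as.character() can be useful when the text is built up in several steps.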

3.2 Multiple Variables

R allows you to assign the same value to multiple variables in one line:

Example
# Assign the same value to multiple variables in one line

var1 <- var2 <- var3 <- "Orange"

# Print variable values
var1
var2
var3

3.3 Variable Names


A variable can have a short name (like x and y) or a more descriptive name
(age, carname, total_volume). Rules for R variables are:

• A variable name must start with a letter or a period (.) and can be a combination
of letters, digits, period (.) and underscore (_). If it starts with a period (.), it
cannot be followed by a digit.
• A variable name cannot start with a number or underscore (_)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
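
These rules can also be checked in code. The sketch below relies on the base R function make.names(), which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. The helper is_valid_name is a hypothetical name introduced here for illustration only:

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) x == make.names(x)

candidates <- c("myvar2", ".myvar", "2myvar", "my-var", "_my_var", "TRUE")

# TRUE for valid names, FALSE for names that break one of the rules
validity <- is_valid_name(candidates)
validity
```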

Example

# Legal variable names

myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"

# Illegal variable names

2myvar <- "John"
my-var <- "John"
my^var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"

4. Data Types

In programming, data type is an important concept.

Variables can store data of different types, and different types can do different
things.

In R, variables do not need to be declared with any particular type, and can
even change type after they have been set:

Example
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (aka string)

Basic data types in R can be divided into the following types:

• numeric - (10.5, 55, 787)
• integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
• complex - (9 + 3i, where "i" is the imaginary part)
• character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
• logical (a.k.a. boolean) - (TRUE or FALSE)

Example

# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical/boolean
x <- TRUE
class(x)

5. Numbers

There are three number types in R:

a) Numeric
b) Integer
c) Complex

Variables of number types are created when you assign a value to them:

Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex

a. Numeric

A numeric data type is the most common type in R, and contains any number
with or without a decimal, like: 10.5, 55, 787:

Example
x <- 10.5
y <- 55

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)

b. Integer

Integers are numeric data without decimals. This is used when you are certain
that you will never create a variable that should contain decimals. To create an integer
variable, you must use the letter L after the integer value:

Example
x <- 1000L
y <- 55L

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)

c. Complex

A complex number is written with an "i" as the imaginary part:

Example
x <- 3+5i
y <- 5i

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)

** Type Conversion

You can convert from one type to another with the following functions:

• as.numeric()
• as.integer()
• as.complex()

Example
x <- 1L # integer
y <- 2 # numeric

# convert from integer to numeric:
a <- as.numeric(x)

# convert from numeric to integer:
b <- as.integer(y)

# print values of x and y
x
y

# print the class name of a and b
class(a)
class(b)
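
It can help to see what the conversion functions do to values that do not fit the target type exactly. A minimal sketch, with illustrative variable names that are not part of the example above:

```r
# as.integer() drops the fractional part; it does not round
truncated <- as.integer(2.9)
truncated

# a string that looks like a number can be converted to numeric
parsed <- as.numeric("11.5")
parsed

# a real number becomes a complex number with a zero imaginary part
cplx <- as.complex(2)
cplx
class(cplx)
```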

6. Math Functions

In R, you can use operators to perform common mathematical operations on
numbers.

The + operator is used to add together two values:

10 + 5

The - operator is used to subtract one value from another:

10 - 5

6.1 Built-in Math Functions
R also has many built-in math functions that allow you to perform
mathematical tasks on numbers.

1. For example, the min() and max() functions can be used to find the lowest or
highest number in a set:

max(5, 10, 15)

min(5, 10, 15)

2. The sqrt() function returns the square root of a number:

sqrt(16)

3. The abs() function returns the absolute (positive) value of a number:

abs(-4.7)

4. The ceiling() function rounds a number upwards to its nearest integer, and the
floor() function rounds a number downwards to its nearest integer, and
returns the result:

ceiling(1.4)

floor(1.4)

7. String Functions

Strings are used for storing text.

A string is surrounded by either single quotation marks or double quotation marks:

"hello" is the same as 'hello'.

• Assigning a string to a variable is done with the variable followed by the <-
operator and the string:

str <- "Hello"
str # print the value of str

• You can assign a multiline string to a variable like this:

str <- "Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut
labore et dolore magna aliqua."

str # print the value of str

• However, note that R will add a "\n" at the end of each line break. This is
called an escape character, and the n character indicates a new line.

If you want the line breaks to be inserted at the same position as in the
code, use the cat() function:

str <- "Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut
labore et dolore magna aliqua."

cat(str)

8. Escape Characters

To insert characters that are illegal in a string, you must use an escape
character.

An escape character is a backslash \ followed by the character you want to insert.

An example of an illegal character is a double quote inside a string that is
surrounded by double quotes:

Example

str <- "We are the so-called "Vikings", from the north."
str

Error: unexpected symbol in "str <- "We are the so-called "Vikings"

To fix this problem, use the escape character \":

The escape character allows you to use double quotes when you normally would not
be allowed:

Example

str <- "We are the so-called \"Vikings\", from the north."

str
cat(str)

Note that auto-printing the str variable will print the backslash in the output. You
can use the cat() function to print it without backslash.

Other escape characters in R:

Code    Result
\\      Backslash
\n      New Line
\r      Carriage Return
\t      Tab
\b      Backspace

9. Booleans/Logical Values

In programming, you often need to know if an expression is true or false.

You can evaluate any expression in R, and get one of two answers,
TRUE or FALSE.

When you compare two values, the expression is evaluated and R returns the logical
answer:

Example

10 > 9 # TRUE because 10 is greater than 9
10 == 9 # FALSE because 10 is not equal to 9
10 < 9 # FALSE because 10 is greater than 9

You can also compare two variables:

Example

a <- 10
b <- 9
a > b

You can also run a condition in an if statement:

Example

a <- 200
b <- 33
if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}

10. Operators

Operators are used to perform operations on variables and values.

In the example below, we use the + operator to add together two values:

10 + 5

R divides the operators in the following groups:

1. Arithmetic operators
2. Assignment operators
3. Comparison operators
4. Logical operators
5. Miscellaneous operators

a. Arithmetic operators

Arithmetic operators are used with numeric values to perform common
mathematical operations:
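
A minimal sketch of the standard base R arithmetic operators, using two illustrative variables x and y:

```r
x <- 7
y <- 2

x + y    # addition
x - y    # subtraction
x * y    # multiplication
x / y    # division
x ^ y    # exponentiation
x %% y   # modulus (remainder of the division)
x %/% y  # integer division
```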

b. Assignment operators

Assignment operators are used to assign values to variables:

Example

my_var <- 3
my_var <<- 3
3 -> my_var
3 ->> my_var

my_var # print my_var

Note: <<- is a global assigner. You will learn more about this in the Global Variable
chapter. It is also possible to turn the direction of the assignment operator: x <- 3 is
equal to 3 -> x.

c. Comparison operators

Comparison operators are used to compare two values:
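
A minimal sketch of the standard base R comparison operators; each expression returns TRUE or FALSE, and the variables a and b are only illustrative:

```r
a <- 10
b <- 9

a == b   # equal
a != b   # not equal
a > b    # greater than
a < b    # less than
a >= b   # greater than or equal to
a <= b   # less than or equal to
```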

d. Logical operators

Logical operators are used to combine conditional statements:
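
A minimal sketch of the standard base R logical operators. The vectors p and q are illustrative; & and | work element by element, while && and || compare single values:

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

both <- p & q     # element-wise AND
either <- p | q   # element-wise OR
negated <- !p     # NOT

both
either
negated

TRUE && FALSE     # AND for single values
TRUE || FALSE     # OR for single values
```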

e. Miscellaneous operators

Miscellaneous operators are used to manipulate data:
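
A minimal sketch of three such operators in base R: the : operator creates a sequence, %in% tests membership, and %*% performs matrix multiplication. The names s, found, m, and product are illustrative:

```r
# : creates a series of numbers in sequence
s <- 1:5
s

# %in% checks whether a value belongs to a vector
found <- 3 %in% s
found

# %*% multiplies two matrices
m <- matrix(1:4, nrow = 2)
product <- m %*% m
product
```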

11. Vectors

A vector is simply a list of items that are of the same type.

To combine the list of items into a vector, use the c() function and separate the items by a
comma.

11.1 Vectors of String

In the example below, we create a vector variable called fruits that combines strings:

Example

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Print fruits
fruits

[1] "banana" "apple" "orange"


11.2 Vectors of Numerical Values

In this example, we create a vector that combines numerical values:

Example

# Vector of numerical values


numbers <- c(1, 2, 3)

# Print numbers

numbers

[1] 1 2 3

11.3 Vectors of Numerical Values in Sequence

To create a vector with numerical values in a sequence, use the : operator:

Example

# Vector with numerical values in a sequence


numbers <- 1:10
numbers

[1] 1 2 3 4 5 6 7 8 9 10

11.4 Vectors of Decimal values in sequence

You can also create numerical values with decimals in a sequence, but note that if
the last element does not belong to the sequence, it is not used:

Example

# Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1

[1] 1.5 2.5 3.5 4.5 5.5 6.5

# Vector with numerical decimals in a sequence where the last element is not used
numbers2 <- 1.5:6.3
numbers2

[1] 1.5 2.5 3.5 4.5 5.5

11.5 Vectors of Logical Values

In the example below, we create a vector of logical values:


Example

# Vector of logical values


log_values <- c(TRUE, FALSE, TRUE, FALSE)
log_values

[1] TRUE FALSE TRUE FALSE

11.6 Vector Length

To find out how many items a vector has, use the length() function:

Example

fruits <- c("banana", "apple", "orange")


length(fruits)

[1] 3

11.7 Sort Vector

To sort items in a vector alphabetically or numerically, use the sort() function:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")
numbers <- c(13, 3, 5, 7, 20, 2)

sort(fruits) # Sort a string
sort(numbers) # Sort numbers

[1] "apple" "banana" "lemon" "mango" "orange"
[1] 2 3 5 7 13 20

11.8 Access Vector

You can access the vector items by referring to its index number inside brackets [].
The first item has index 1, the second item has index 2, and so on:

Example

fruits <- c("banana", "apple", "orange")

# Access the first item (banana)

fruits[1]

[1] "banana"

You can also access multiple elements by referring to different index positions with
the c() function:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access the first and third item (banana and orange)

fruits[c(1, 3)]

[1] "banana" "orange"

You can also use negative index numbers to access all items except the ones
specified:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access all items except for the first item

fruits[c(-1)]

[1] "apple" "orange" "mango" "lemon"

11.9 Change an item

To change the value of a specific item, refer to the index number:

fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Change "banana" to "pear"
fruits[1] <- "pear"

# Print fruits
fruits

[1] "pear" "apple" "orange" "mango" "lemon"

11.10 Repeat Vectors

a. To repeat vectors, use the rep() function. The each argument repeats each value:

Example

repeat_each <- rep(c(1,2,3), each = 3)
repeat_each

[1] 1 1 1 2 2 2 3 3 3

b. Repeat the sequence of the vector:

Example

repeat_times <- rep(c(1,2,3), times = 3)
repeat_times

[1] 1 2 3 1 2 3 1 2 3

c. Repeat each value independently:

Example

repeat_independent <- rep(c(1,2,3), times = c(5,2,1))
repeat_independent

[1] 1 1 1 1 1 2 2 3

11.11 Generating Sequenced vectors

An earlier example showed how to create a vector with
numerical values in a sequence with the : operator:

Example

numbers <- 1:10


numbers

To make bigger or smaller steps in a sequence, use the seq() function:

Example

numbers <- seq(from = 0, to = 100, by = 20)
numbers

[1] 0 20 40 60 80 100

The seq() function has three parameters: from is where the sequence starts, to is
where the sequence stops, and by is the interval of the sequence.
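
Two further variations may be useful. The sketch below uses the length.out argument of seq(), which fixes the number of items instead of the step size, and a negative by to count downwards. The variable names are illustrative:

```r
# Five evenly spaced values between 0 and 1
evenly_spaced <- seq(from = 0, to = 1, length.out = 5)
evenly_spaced

# A decreasing sequence needs a negative step
countdown <- seq(from = 10, to = 1, by = -3)
countdown
```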

12. Lists
A list in R can contain many different data types inside it.
A list is a collection of data which is ordered and changeable.
To create a list, use the list() function.

Example
1. List containing data with the same data type.

# List of strings
thislist <- list("apple", "banana", "cherry")

# Print the list
thislist

[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"

2. List containing data with different data types.

thislist1 <- list(10.1, "a", 100, "#")

# Print the list
thislist1

[[1]]
[1] 10.1

[[2]]
[1] "a"

[[3]]
[1] 100

[[4]]
[1] "#"

12.1 Access Lists

You can access the list items by referring to its index number, inside
brackets. The first item has index 1, the second item has index 2, and so
on:

Example
thislist <- list ("apple", "banana", "cherry")

thislist [1]

[[1]]

[1] "apple"
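
Note that single brackets return a list containing the item, which is why the output above starts with [[1]]. To get the item itself, double brackets can be used. A minimal sketch, where sub and item are illustrative names:

```r
thislist <- list("apple", "banana", "cherry")

# Single brackets: the result is still a list (of length 1)
sub <- thislist[1]
class(sub)

# Double brackets: the result is the stored value itself
item <- thislist[[1]]
class(item)
item
```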

12.2 Change Item Value


To change the value of a specific item, refer to the index number:

Example
thislist <- list("apple" ,"banana", "cherry" )

thislist[1] <- "blackcurrant"

# Print the updated list

thislist

[[1]]
[1] "blackcurrant"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"

12.3 List Length


To find out how many items a list has, use the length() function:

Example
thislist <- list("apple", "banana", "cherry")

length(thislist)

[1] 3

12.4 Check if Item Exists


To find out if a specified item is present in a list, use the %in% operator:

Example

Check if "apple" is present in the list:

thislist <- list("apple" ,"banana", "cherry" )

"apple" %in% thislist

[1] TRUE

12.5 Add List Items

To add an item to the end of the list, use the append() function:

12.5.1 Example

Add "orange" to the list:

thislist <- list("apple", "banana", "cherry")

append(thislist, "orange")

[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"

[[4]]
[1] "orange"
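
Keep in mind that append() returns a new list and leaves the original unchanged. To keep the added item, the result has to be assigned back. A minimal sketch, with length_before as an illustrative name:

```r
thislist <- list("apple", "banana", "cherry")

# The result is printed but not stored; thislist still has 3 items
append(thislist, "orange")
length_before <- length(thislist)
length_before

# Assign the result back to keep the new item
thislist <- append(thislist, "orange")
length(thislist)
```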

12.5.2 Example

To add an item to the right of a specified index, add "after=index number"
in the append() function:

Add "orange" to the list after "banana" (index 2):

thislist <- list("apple", "banana", "cherry")

append(thislist, "orange", after = 2)

[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "orange"

[[4]]
[1] "cherry"

12.6 Remove List Items
You can also remove list items. The following example creates a new,
updated list without an "apple" item:

Example
Remove "apple" from the list:

thislist <- list("apple", "banana", "cherry" )

newlist <- thislist[-1]

# Print the new list

newlist

[[1]]
[1] "banana"

[[2]]
[1] "cherry"

12.7 Range of Indexes


You can specify a range of indexes by specifying where to start and where
to end the range, by using the : operator:

Example

Return the second, third, fourth and fifth item:

thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

thislist[2:5]

[[1]]
[1] "banana"

[[2]]
[1] "cherry"

[[3]]
[1] "orange"

[[4]]
[1] "kiwi"

Note: The search will start at index 2 (included) and end at index 5 (included).

Remember that the first item has index 1.

12.8 Loop Through a List


You can loop through the list items by using a for loop:

Example

Print all items in the list, one by one:

thislist <- list("apple", "banana", "cherry")

for (x in thislist){

print(x)
}

[1] "apple"
[1] "banana"
[1] "cherry"

12.9 Join Two Lists


There are several ways to join, or concatenate, two or more lists in R.

The most common way is to use the c () function, which combines two
elements together:

Example
# Join two lists

list1 <- list("a", "b", "c")

list2 <- list(1, 2, 3)

list3 <- c(list1, list2)

list3

[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "c"

[[4]]
[1] 1

[[5]]
[1] 2

[[6]]
[1] 3

13. R Matrices
A matrix is a two dimensional data set with columns and rows.

A column is a vertical representation of data, while a row is a horizontal
representation of data.

A matrix can be created with the matrix() function. Specify the nrow and
ncol parameters to set the number of rows and columns:

Example
# Create a matrix

thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)

# Print the matrix

thismatrix

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Note: Remember the c() function is used to concatenate items together.
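
As the output shows, matrix() fills the cells column by column by default. A minimal sketch of the byrow argument of matrix(), which fills row by row instead (by_col and by_row are illustrative names):

```r
values <- c(1, 2, 3, 4, 5, 6)

# Default: values run down the first column, then the second
by_col <- matrix(values, nrow = 3, ncol = 2)
by_col

# byrow = TRUE: values run along the first row, then the next
by_row <- matrix(values, nrow = 3, ncol = 2, byrow = TRUE)
by_row
```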

You can also create a matrix with strings:

Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix

     [,1]     [,2]
[1,] "apple"  "cherry"
[2,] "banana" "orange"

13.1 Access Matrix Items


You can access the items by using [ ] brackets. The first number "1" in the
bracket specifies the row-position, while the second number "2" specifies the
column-position:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[1,2]

[1] "cherry"

The whole row can be accessed if you specify a comma after the number in the bracket:

Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[2,]

[1] "banana" "orange"

The whole column can be accessed if you specify a comma before the number in the bracket:

Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[,2]

[1] "cherry" "orange"

13.2 Access More Than One Row


More than one row can be accessed if you use the c() function:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple",
"pear", "melon", "fig"), nrow = 3, ncol = 3)

thismatrix[c(1,2),]

     [,1]     [,2]     [,3]
[1,] "apple"  "orange" "pear"
[2,] "banana" "grape"  "melon"

13.3 Access More Than One Column


More than one column can be accessed if you use the c() function:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple",
"pear", "melon", "fig"), nrow = 3, ncol = 3)

thismatrix[, c(1,2)]

     [,1]     [,2]
[1,] "apple"  "orange"
[2,] "banana" "grape"
[3,] "cherry" "pineapple"

13.4 Add Rows and Columns


Use the cbind() function to add additional columns in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple",
"pear", "melon", "fig"), nrow = 3, ncol = 3)

newmatrix <- cbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix

newmatrix

     [,1]     [,2]        [,3]    [,4]
[1,] "apple"  "orange"    "pear"  "strawberry"
[2,] "banana" "grape"     "melon" "blueberry"
[3,] "cherry" "pineapple" "fig"   "raspberry"

Note: The new column must have as many cells as the existing matrix has rows.

Use the rbind() function to add additional rows in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple",
"pear", "melon", "fig"), nrow = 3, ncol = 3)

newmatrix <- rbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix

newmatrix

     [,1]         [,2]        [,3]
[1,] "apple"      "orange"    "pear"
[2,] "banana"     "grape"     "melon"
[3,] "cherry"     "pineapple" "fig"
[4,] "strawberry" "blueberry" "raspberry"

Note: The new row must have as many cells as the existing matrix has columns.

13.5 Remove Rows and Columns


Use negative indexes, together with the c() function, to remove rows and columns in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"),
nrow = 3, ncol = 2)

# Remove the first row and the first column

thismatrix <- thismatrix[-c(1), -c(1)]

thismatrix

[1] "mango" "pineapple"
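
The result above is printed as a plain vector because only one column is left, and R drops the matrix structure in that case. A minimal sketch of the drop argument of the [ operator, which keeps the result as a matrix (as_vector and as_matrix are illustrative names):

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"),
                     nrow = 3, ncol = 2)

# Default: a single remaining column is simplified to a vector
as_vector <- thismatrix[-c(1), -c(1)]
dim(as_vector)

# drop = FALSE keeps the two-dimensional structure
as_matrix <- thismatrix[-c(1), -c(1), drop = FALSE]
dim(as_matrix)
```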

13.6 Check if an Item Exists


To find out if a specified item is present in a matrix, use the %in% operator:

Example

Check if "apple" is present in the matrix:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

"apple" %in% thismatrix

[1] TRUE

13.6 Number of Rows and Columns


Use the dim() function to find the number of rows and columns in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

dim(thismatrix)

[1] 2 2

13.7 Matrix Length


Use the length() function to find the total number of cells in a Matrix:

Example
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

length(thismatrix)

[1] 4

The total number of cells in the matrix is the number of rows multiplied by the number of columns.

In the example above: Length = 2 * 2 = 4.

13.8 Loop Through a Matrix


You can loop through a Matrix using a for loop. The loop will start at the first row,
moving right:

Example

Loop through the matrix items and print them:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

for (rows in 1:nrow(thismatrix)) {
  for (columns in 1:ncol(thismatrix)) {
    print(thismatrix[rows, columns])
  }
}

[1] "apple"
[1] "cherry"
[1] "banana"
[1] "orange"

13.9 Combine two Matrices


Again, you can use the rbind() or cbind() function to combine two or more matrices
together:

Example

# Combine matrices

Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)

Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)

# Adding it as rows

Matrix_Combined <- rbind(Matrix1, Matrix2)

Matrix_Combined

# Adding it as columns

Matrix_Combined <- cbind(Matrix1, Matrix2)

Matrix_Combined

     [,1]     [,2]
[1,] "apple"  "cherry"
[2,] "banana" "grape"
[3,] "orange" "pineapple"
[4,] "mango"  "watermelon"
     [,1]     [,2]     [,3]     [,4]
[1,] "apple"  "cherry" "orange" "pineapple"
[2,] "banana" "grape"  "mango"  "watermelon"

14. Arrays
Compared to matrices, arrays can have more than two
dimensions.

We can use the array() function to create an array, and the dim
parameter to specify the dimensions:

Example
# An array with one dimension with values ranging from 1 to 24

thisarray <- c(1:24)

thisarray

# An array with more than one dimension

multiarray <- array(thisarray, dim = c(4, 3, 2))

multiarray

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24

Example Explained
In the example above we create an array with the values 1 to 24.

How does dim=c(4,3,2) work?

The first and second numbers in the bracket specify the number of rows
and columns.
The last number in the bracket specifies how many matrices of that size
(matrix levels) we want.

Note: Arrays can only have one data type.

14.1 Access Array Items


You can access the array elements by referring to the index position. You
can use the [] brackets to access the desired elements from an array:

Example
thisarray <- c(1:24)

multiarray <- array(thisarray, dim = c(4, 3, 2))

multiarray[2,3,2]

[1] 22

The syntax is as follows: array[row position, column position, matrix level]

You can also access the whole row or column from a matrix in an array, by using the c()
function:

Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))

# Access all the items from the first row from matrix one
multiarray[c(1),,1]

# Access all the items from the first column from matrix one
multiarray[,c(1),1]

[1] 1 5 9

[1] 1 2 3 4

A comma (,) before c() means that we want to access the column.

A comma (,) after c() means that we want to access the row.
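
In the same way, leaving both the row and the column position empty returns a whole matrix level. A minimal sketch, where level2 is an illustrative name:

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

# Empty row and column positions select every row and column of level 2
level2 <- multiarray[, , 2]

dim(level2)    # the result is an ordinary 4 x 3 matrix
level2[1, 1]   # first cell of the second level
```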

14.2 Check if an Item Exists


To find out if a specified item is present in an array, use the %in% operator:

Example

Check if the value "2" is present in the array:


thisarray <- c(1:24)

multiarray <- array(thisarray, dim = c(4, 3, 2))

2 %in% multiarray

[1] TRUE

14.3 Amount of Rows and Columns


Use the dim() function to find the number of rows, columns, and matrix levels in an array:

Example
thisarray <- c(1:24)

multiarray <- array(thisarray, dim = c(4, 3, 2))

dim(multiarray)

[1] 4 3 2

14.4 Array Length


Use the length() function to find the total number of items in an array:

Example
thisarray <- c(1:24)

multiarray <- array(thisarray, dim = c(4, 3, 2))

length(multiarray)

[1] 24

14.5 Loop Through an Array


You can loop through the array items by using a for loop:

Example
thisarray <- c(1:24)

multiarray <- array(thisarray, dim = c(4, 3, 2))

for (x in multiarray) {
  print(x)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
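
The loop above visits every item in storage order and ignores the array's shape. If you want to work one matrix level at a time, a possible sketch is to loop over the third dimension by index, or to let apply() do the same work. The names level, level_sum and level_sums are assumptions made for this illustration.

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

# Loop over the matrix levels (third dimension) by index
for (level in seq_len(dim(multiarray)[3])) {
  level_sum <- sum(multiarray[, , level])
  print(paste("Level", level, "sum:", level_sum))
}

# apply() over margin 3 computes the same sums without an explicit loop
level_sums <- apply(multiarray, 3, sum)
print(level_sums)
```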

15. R Data Frames
Data Frames store data in a table format.

A Data Frame can hold different types of data. While the
first column can be character, the second and third can be
numeric or logical. However, all values within a single column
must have the same type.

Use the data.frame() function to create a data frame:

Example
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Print the data frame

Data_Frame

Output:-
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
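
To see the "one type per column" rule in practice, you can inspect the class of each column. This is a sketch only: stringsAsFactors = FALSE is added here (it is not in the original example) so that the Training column is character in every R version, and column_types is an illustrative name.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE  # assumption: keep Training as character
)

# Class of every column
column_types <- sapply(Data_Frame, class)
print(column_types)

# str() gives a compact overview of the structure
str(Data_Frame)
```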

15.1 Summarize the Data


Use the summary() function to summarize the data from a
Data Frame:

Example
Data_Frame <- data.frame (

Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame
summary(Data_Frame)

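Output:-
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
     Training     Pulse          Duration
 Other   :1   Min.   :100.0   Min.   :30.0
 Stamina :1   1st Qu.:110.0   1st Qu.:37.5
 Strength:1   Median :120.0   Median :45.0
              Mean   :123.3   Mean   :45.0
              3rd Qu.:135.0   3rd Qu.:52.5
              Max.   :150.0   Max.   :60.0

Note: The Training column is summarized as counts per category because R
versions before 4.0.0 convert strings to factors by default in data.frame().
In R 4.0.0 and later, the column stays character and summary() reports its
length, class and mode instead.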

You will learn more about the summary() function in the statistical part of the R
tutorial.
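
The numbers reported by summary() for a numeric column can also be computed one at a time. The sketch below does this for the Pulse column; the pulse_* names are illustrative, and quantile() is used with its default settings.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Summary of a single column
print(summary(Data_Frame$Pulse))

# The same values computed individually
pulse_min <- min(Data_Frame$Pulse)
pulse_q1 <- unname(quantile(Data_Frame$Pulse, 0.25))
pulse_median <- median(Data_Frame$Pulse)
pulse_mean <- mean(Data_Frame$Pulse)
pulse_max <- max(Data_Frame$Pulse)

print(c(pulse_min, pulse_q1, pulse_median, pulse_mean, pulse_max))
```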

15.2 Access Items


We can use single brackets [ ], double brackets [[ ]] or $
to access columns from a data frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame[1]

Data_Frame[["Training"]]

Data_Frame$Training

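Output:-
  Training
1 Strength
2  Stamina
3    Other
[1] Strength Stamina Other
Levels: Other Stamina Strength
[1] Strength Stamina Other
Levels: Other Stamina Strength

Note: The "Levels" lines appear because R versions before 4.0.0 convert
strings to factors by default in data.frame(). In R 4.0.0 and later, the
column is a character vector and prints as [1] "Strength" "Stamina" "Other".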
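
The three access styles do not return the same kind of object: single brackets keep the result as a data frame, while double brackets and $ return the column itself. A short sketch (as_frame, as_vector and second_pulse are illustrative names, and stringsAsFactors = FALSE is an added assumption):

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

as_frame <- Data_Frame[1]               # still a data frame with one column
as_vector <- Data_Frame[["Training"]]   # the column as a plain vector

print(is.data.frame(as_frame))
print(is.data.frame(as_vector))

# Row and column together: the Pulse value in the second row
second_pulse <- Data_Frame[2, "Pulse"]
print(second_pulse)
```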

15.3 Add Rows


Use the rbind() function to add new rows in a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Add a new row


New_row_DF <- rbind(Data_Frame, c("Strength", 110, 110))

# Print the new data frame


New_row_DF

Output:-
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
4 Strength   110      110
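
One thing to watch for in the example above: c("Strength", 110, 110) is a character vector, so the numbers in the new row are stored as text and the Pulse and Duration columns become character. A sketch of one way to keep the numeric types is to add the row as a one-row data frame instead. The names coerced, new_row and typed are hypothetical, and stringsAsFactors = FALSE is an added assumption.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

# Adding a character vector turns the numeric columns into character
coerced <- rbind(Data_Frame, c("Strength", 110, 110))
print(class(coerced$Pulse))

# Adding a one-row data frame keeps each column's type
new_row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110,
                      stringsAsFactors = FALSE)
typed <- rbind(Data_Frame, new_row)
print(class(typed$Pulse))
```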

15.4 Add Columns


Use the cbind() function to add new columns in a Data
Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Add a new column


New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))

# Print the new data frame

New_col_DF

Output:-
  Training Pulse Duration Steps
1 Strength   100       60  1000
2  Stamina   150       30  6000
3    Other   120       45  2000
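
As an alternative sketch, a new column can also be added by assigning to a name with $. Unlike cbind(), this changes the existing data frame instead of creating a new one.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Assigning to a new column name adds the column in place
Data_Frame$Steps <- c(1000, 6000, 2000)

print(names(Data_Frame))
print(ncol(Data_Frame))
```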

15.5 Remove Rows and Columns
Use negative indexes with the c() function to remove rows and columns from a Data
Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Remove the first row and column


Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame


Data_Frame_New

Output:-
  Pulse Duration
2   150       30
3   120       45
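
Negative indexes remove items by position. When positions are not known in advance, a possible sketch is to drop a column by its name or to keep only the rows that meet a condition. The names without_pulse and high_pulse are illustrative, and stringsAsFactors = FALSE is an added assumption.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

# Remove a column by name: keep every column except "Pulse"
without_pulse <- Data_Frame[, names(Data_Frame) != "Pulse"]
print(without_pulse)

# Remove rows by condition: keep only rows where Pulse is above 100
high_pulse <- Data_Frame[Data_Frame$Pulse > 100, ]
print(high_pulse)
```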

15.6 Number of Rows and Columns


Use the dim() function to find the number of rows and
columns in a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)

)
dim(Data_Frame)

Output:-
[1] 3 3

You can also use the ncol() function to find the number of
columns and nrow() to find the number of rows:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
ncol(Data_Frame)
nrow(Data_Frame)

Output:-
[1] 3
[1] 3

15.7 Data Frame Length


Use the length() function to find the number of columns in
a Data Frame (similar to ncol()):

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),

Duration = c(60, 30, 45)
)
length(Data_Frame)

Output:-
[1] 3
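
Because this example has three rows and three columns, it does not show that length() counts columns rather than rows. The sketch below makes the difference visible by stacking the data frame on itself; the name bigger is hypothetical.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Stack the data frame on itself: six rows, still three columns
bigger <- rbind(Data_Frame, Data_Frame)

print(dim(bigger))      # rows, then columns
print(nrow(bigger))
print(length(bigger))   # same as ncol(bigger)
```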

15.8 Combining Data Frames


Use the rbind() function to combine two or more data
frames in R vertically:

Example
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame2 <- data.frame (


Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame

Output:-
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
4  Stamina   140       30
5  Stamina   150       30
6 Strength   160       20

And use the cbind() function to combine two or more data
frames in R horizontally:

Example
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame4 <- data.frame (


Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)
New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
New_Data_Frame1

Output:-
  Training Pulse Duration Steps Calories
1 Strength   100       60  3000      300
2  Stamina   150       30  6000      400
3    Other   120       45  2000      300
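
Two details are worth knowing when combining data frames: rbind() matches columns by name rather than by position, and cbind() expects the data frames to have the same number of rows. The following sketch uses smaller, made-up data frames (First_DF, Second_DF, Steps_DF) to illustrate both points; stringsAsFactors = FALSE is an added assumption.

```r
First_DF <- data.frame(
  Training = c("Strength", "Stamina"),
  Pulse = c(100, 150),
  stringsAsFactors = FALSE
)

# Same columns, but listed in a different order
Second_DF <- data.frame(
  Pulse = c(140),
  Training = c("Other"),
  stringsAsFactors = FALSE
)

# rbind() lines the columns up by name
stacked <- rbind(First_DF, Second_DF)
print(stacked)

# cbind() needs the same number of rows on both sides (three here)
Steps_DF <- data.frame(Steps = c(3000, 6000, 2000))
side_by_side <- cbind(stacked, Steps_DF)
print(dim(side_by_side))
```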

16. R Factors
Factors are used to categorize data. Examples of factors are:

• Demography: Male/Female

• Music: Rock, Pop, Classic, Jazz

• Training: Strength, Stamina

To create a factor, use the factor() function and add a vector as
argument:

Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic",
"Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Print the factor

music_genre
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock

You can see from the example above that the factor has four levels
(categories): Classic, Jazz, Pop and Rock.
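
Since factors exist to categorize data, a natural next step is to count how many items fall into each level. A minimal sketch using table() and nlevels(), where genre_counts is an illustrative name:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
                        "Rock", "Jazz"))

# Count the items in each level
genre_counts <- table(music_genre)
print(genre_counts)

# Number of levels
print(nlevels(music_genre))
```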

To only print the levels, use the levels() function:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"))
levels(music_genre)

[1] "Classic" "Jazz" "Pop" "Rock"

You can also set the levels, by adding the levels argument inside the factor()
function:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",

"Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

levels(music_genre)

[1] "Classic" "Jazz" "Pop" "Rock" "Other"
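
A level set this way exists even when no item uses it. The sketch below shows that "Other" is counted as zero by table(), and that droplevels() removes unused levels again. The names genre_counts and trimmed are assumptions made for this illustration.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
                        "Rock", "Jazz"),
                      levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

# The unused level "Other" appears with a count of zero
genre_counts <- table(music_genre)
print(genre_counts)

# droplevels() removes levels that no item uses
trimmed <- droplevels(music_genre)
print(levels(trimmed))
```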

16.1 Factor Length


Use the length() function to find out how many items there are in the
factor:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
"Rock", "Jazz"))

length(music_genre)

[1] 8

16.2 Access Factors


To access the items in a factor, refer to the index number, using []
brackets:

Example
Access the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",


"Rock", "Jazz"))

music_genre[3]

[1] Classic
Levels: Classic Jazz Pop Rock
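
The accessed item is still a factor, which is why the levels are printed with it. As a sketch, as.character() gives the label as plain text and as.integer() gives the position of that label among the levels. The name third_item is illustrative.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
                        "Rock", "Jazz"))

third_item <- music_genre[3]

# The label as a plain character string
print(as.character(third_item))

# The position of that label in levels(music_genre)
print(as.integer(third_item))
```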

16.3 Change Item Value


To change the value of a specific item, refer to the index number:

Example
Change the value of the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",


"Rock", "Jazz"))

music_genre[3] <- "Pop"

music_genre[3]

[1] Pop
Levels: Classic Jazz Pop Rock

Note that you cannot change the value of a specific item if it is not already
specified in the factor. The following example will produce a warning, and the
item will be set to NA:

Example
Trying to change the value of the third item ("Classic") to an item that does not
exist/not predefined ("Opera"):

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",


"Rock", "Jazz"))

music_genre[3] <- "Opera"

music_genre[3]

Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
  invalid factor level, NA generated
[1] <NA>
Levels: Classic Jazz Pop Rock

However, if you have already specified it inside the levels argument, it will
work:

Example
Change the value of the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",

"Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))

music_genre[3] <- "Opera"

music_genre[3]
[1] Opera
Levels: Classic Jazz Pop Rock Opera
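
If the factor already exists and you do not want to recreate it, one possible approach (sketched here, not taken from the original notes) is to extend its levels first and then assign the new value:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
                        "Rock", "Jazz"))

# Add "Opera" to the existing levels
levels(music_genre) <- c(levels(music_genre), "Opera")

# The assignment now works without a warning
music_genre[3] <- "Opera"

print(music_genre[3])
print(levels(music_genre))
```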
