Presentation 3 - Data Structures

The document provides an introduction to data analysis using R, focusing on data types, objects, and structures within R programming. Key topics include numeric, character, and logical data types, as well as vectors, factors, lists, and data frames. The presentation outlines how to create, manipulate, and analyze these data structures effectively.

Uploaded by

Nicky Ntongani

EASTERN AFRICA STATISTICAL TRAINING CENTRE

INTRODUCTION TO DATA ANALYSIS WITH R


BACHELOR DEGREE IN
DATA SCIENCE,
OFFICIAL STATISTICS,
BUSINESS STATISTICS AND ECONOMICS

Edwin Tito Magoti


Consultant: Training, Research and Data Analytics
Eastern Africa Statistical Training Centre
P.O. Box 35103, Dar Es Salaam
(+255) 766151460
(+255) 737825292
edwin.magoti@eastc.ac.tz
edwintitomagoti@gmail.com
Data Structures in R and R Studio
Getting Started with R: Presentation Outline

1 Data types
§ Numeric
§ Character/Strings
§ Logical
§ Complex
2 Objects in R
§ Vectors
§ Matrices
§ Factors
§ Lists
§ Data frames
1. Data Types

o There are at least five (05) data types that can be assigned to variables in R.
o They are:
• Numeric data type
• Character data type
• Logical data type
• Complex data type
• Raw data type
o For this presentation, we consider the first three types.
1.1. Data Types: Numeric data

• Numeric data are real numbers.
• The function is.numeric() is used to determine whether data are numeric (real numbers) or not.
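A minimal sketch of checking for numeric data. The object names and values are hypothetical and chosen only for illustration.

```r
# Hypothetical values, for illustration only
height <- 1.72     # a real number
count  <- 45       # whole numbers are also stored as numeric by default
label  <- "45"     # the quotes make this text, not a number

is.numeric(height)   # TRUE
is.numeric(count)    # TRUE
is.numeric(label)    # FALSE
class(height)        # "numeric"
```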
1.2. Data Types: Character Data

• Character data are strings (text values).
• They are written by enclosing the text in double quotes " " (single quotes ' ' are also accepted).
• The command is.character() is used to determine whether data is a character or not.
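A short sketch of character data, using made-up object names and values. Note that digits placed inside quotes are stored as text.

```r
# Hypothetical values, for illustration only
region <- "Dodoma"          # double quotes
course <- 'Data Science'    # single quotes work as well
year   <- "2024"            # digits inside quotes are still text

is.character(region)   # TRUE
is.character(year)     # TRUE
is.character(2024)     # FALSE
nchar(region)          # 6 (number of characters)
```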
1.3. Data Types: Logical Data

o In the R programming language, logical data are values that take one of the Boolean values TRUE or FALSE.
o The constant NA, which stands for "Not Available" (a missing value), is also of logical type by default.
o The command is.logical() is used to determine whether data is logical or not.
o It should be noted that R is case-sensitive; thus, the logical values must be written in upper case (TRUE, FALSE, NA).
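The points above can be sketched as follows. The object name passed is hypothetical.

```r
passed <- TRUE            # hypothetical logical value

is.logical(passed)        # TRUE
is.logical("TRUE")        # FALSE: the quotes make it a character string
5 > 3                     # a comparison returns a logical value: TRUE

class(NA)                 # "logical"
is.na(NA)                 # TRUE: NA marks a missing value

# R is case-sensitive: True is not defined, only TRUE is
exists("True")            # FALSE in a fresh session
```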
1.4. Data Types: Logical Data

❖ R is case-sensitive: the word True is not a reserved word for a logical value; only the upper-case forms TRUE and FALSE are.

1.5. Data Types: Data Coercion

• Data coercion is the process of changing the data type of an object.
• R provides room for changing data types.
o If such a need arises, we use a function of the form as.datatype(), for example as.numeric(), as.character(), or as.logical().
• Note that, if you need to store a new data type in R while keeping the previous one, a new object must be created.
o That is, a new assignment must be done.
o Changing the data type by using the same object name will replace the existing information.
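A minimal sketch of coercion, assuming a hypothetical vector of marks that was read in as text. It shows both options described above: storing the result in a new object, and overwriting the original.

```r
# Hypothetical marks stored as text
marks_chr <- c("45", "60", "72")
is.character(marks_chr)              # TRUE

# New object: marks_chr keeps its original type
marks_num <- as.numeric(marks_chr)
is.numeric(marks_num)                # TRUE
mean(marks_num)                      # 59

as.character(marks_num)              # "45" "60" "72"
as.logical(c(1, 0))                  # TRUE FALSE

# Text that is not a number becomes NA (R also issues a warning)
suppressWarnings(as.numeric("abc"))  # NA

# Same object name: the character version is replaced
marks_chr <- as.numeric(marks_chr)
class(marks_chr)                     # "numeric"
```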
1.5. Data Types: Data Coercion

❖ When data are imported into R, each variable takes a default data type.
❖ A function of the form is.datatype() (for example is.numeric()) helps in checking the data type of a particular variable/object.
❖ Coercion is done on predefined (existing) objects, using a function of the form as.datatype().

2. R Objects

§ R objects are the data structures that can be stored in R.
§ R can store various data structures (objects), including but not limited to:
o Vectors
o Factors
o Matrices
o Lists
o Data frames
✓ For this presentation, we take a look at Vectors, Factors, and Data Frames.
2.1. VECTORS

o A vector is a one-dimensional ordered collection of data of the same type.
o The data may be numeric, character, logical, complex, or raw.
o To create a vector with more than one element, the function c() is used; it combines (concatenates) the elements into a vector.
o The function is.vector(<object>) is used to find out whether an object is a vector or not.
2.1. VECTORS: Creating Vectors

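A brief sketch of creating vectors with c(). All object names and values are hypothetical. The last example shows what happens when types are mixed: R converts every element to a single common type.

```r
# Hypothetical vectors, for illustration only
ages     <- c(21, 25, 30, 28)             # numeric vector
students <- c("Asha", "Juma", "Neema")    # character vector
passed   <- c(TRUE, FALSE, TRUE)          # logical vector

is.vector(ages)       # TRUE
is.vector(students)   # TRUE

# Mixed types are converted to one type (here: character)
mixed <- c(1, "a", TRUE)
mixed                 # "1" "a" "TRUE"
class(mixed)          # "character"
```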
2.1. VECTORS: Length, Class and Structure

o The number of elements in a vector is called its length.
• This can be found by using the function length(<vector_object>).
o You can find the class (data type) of a vector by using the class function: class(<vector_object>).
o The structure of the vector can also be examined using the function str(<object>).
§ This is useful in displaying:
• Data type,
• Dimension, and
• Contents of a vector.
2.1. VECTORS: Length, Class and Structure

❖ The length, data type, and structure of a vector can be examined with the commands length(), class(), and str().
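A minimal sketch of these three commands applied to a hypothetical numeric vector.

```r
ages <- c(21, 25, 30, 28)   # hypothetical values

length(ages)   # 4
class(ages)    # "numeric"
str(ages)      # prints: num [1:4] 21 25 30 28
```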
2.1. VECTORS: Creating Vectors Using Colon, :

• Alternatively, numeric vectors of consecutive integers can be created by placing a colon (:) between two integers.
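A short sketch of the colon operator. The sequence runs downwards when the first integer is larger than the second.

```r
1:10          # 1 2 3 4 5 6 7 8 9 10
5:1           # 5 4 3 2 1
class(1:10)   # "integer"
length(3:8)   # 6
```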
2.1. VECTORS: Creating Vectors Using seq() function

§ We can also create a vector as a sequence of numbers using the function seq(from, to, by) or seq(from, by, length.out),
§ where:
- from is the value the sequence starts at.
- to is the value it finishes at.
- by is an optional argument that gives the step by which the sequence increases; its default is ±1 unless the length option is used.
- length.out (which may be abbreviated to length) is an optional argument that gives the required length of the sequence.
2.1. VECTORS: Creating Vectors Using seq() function

• Illustration of creating vectors (sequences) in R
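The original illustration is not available here, so the following is a sketch of typical seq() calls with arbitrary example values.

```r
seq(from = 1, to = 10, by = 2)          # 1 3 5 7 9
seq(from = 10, to = 1, by = -3)         # 10 7 4 1
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(from = 2, by = 5, length.out = 4)   # 2 7 12 17
seq(1, 5)                               # by defaults to 1: 1 2 3 4 5
```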
2.1. VECTORS: Creating Vectors Using rep() function

o On the other hand, the function rep() is used when there is a need to repeat an entry several times.
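A minimal sketch of rep() with arbitrary example values, showing the times and each arguments.

```r
rep(5, times = 3)                      # 5 5 5
rep(c("A", "B"), times = 2)            # "A" "B" "A" "B"
rep(c("A", "B"), each = 2)             # "A" "A" "B" "B"
rep(c("A", "B"), times = c(3, 1))      # "A" "A" "A" "B"
```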
2.1. VECTORS: Naming Vectors

o R provides room for naming the elements in a vector.
o Suppose we create a vector (say, Production) that contains the production of maize from five maize-producing regions in Tanzania.
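A sketch of naming the elements of such a vector with names(). The choice of regions and all production figures below are illustrative assumptions, not data from the presentation.

```r
# Made-up production figures, for illustration only
Production <- c(120, 95, 150, 80, 110)

# Attach a name to each element (illustrative choice of regions)
names(Production) <- c("Iringa", "Mbeya", "Ruvuma", "Rukwa", "Njombe")
Production
names(Production)

# The names can also be given when the vector is created
Production2 <- c(Iringa = 120, Mbeya = 95, Ruvuma = 150, Rukwa = 80, Njombe = 110)
identical(Production, Production2)   # TRUE
```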
2.1. VECTORS: Indexing/extracting Vectors

o When working with vectors, we may be interested in extracting some of the elements in a vector.
✓ Indexing provides a convenient way to achieve that.
o We use square brackets, [ ], to tell R that it is an index; the number inside the square brackets gives the position of the element.
2.1. VECTORS: Indexing Vectors
o Illustration
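The original illustration is not available here; the sketch below uses a hypothetical vector x. Besides single positions, R also accepts several positions, negative positions (to drop elements), and logical conditions inside the brackets.

```r
x <- c(10, 20, 30, 40, 50)   # hypothetical values

x[2]           # second element: 20
x[c(1, 3)]     # first and third elements: 10 30
x[2:4]         # elements 2 to 4: 20 30 40
x[-1]          # everything except the first: 20 30 40 50
x[x > 25]      # elements greater than 25: 30 40 50
```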
2.2. FACTORS
§ Factors are R objects that are created from a vector.
§ A factor is a special vector used for storing categorical data such as marital status, education level, occupation, etc.
o A factor stores the vector along with the distinct values of the elements in the vector as labels.
§ Factors are created using the function factor(<vector object>).
§ The different categories are called levels, and they are stored internally as the integer codes 1, 2, 3, …, n.
§ The function nlevels() gives the count of levels.
o The levels are (by default) sorted into alphabetical order.
2.2. FACTORS
o Using the str() function with a factor variable (object) gives information about:
§ Data type
§ Number of levels
§ Labels (level names)
§ Values (integer codes)
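A minimal sketch of creating and inspecting a factor, assuming a hypothetical vector of sex codes "M" and "F".

```r
sex <- c("M", "F", "F", "M", "F")   # hypothetical data

sex_f <- factor(sex)
is.factor(sex_f)     # TRUE
levels(sex_f)        # "F" "M" (alphabetical by default)
nlevels(sex_f)       # 2
as.integer(sex_f)    # internal codes: 2 1 1 2 1
str(sex_f)           # prints: Factor w/ 2 levels "F","M": 2 1 1 2 1
```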
2.2. FACTORS

❖ In the previous example, we can see that the categories F and M have been assigned the values (levels) 1 and 2.
❖ We can change the order of the levels using the function:
factor(<vector object>, levels=<levels vector>)
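A sketch of reordering the levels so that M comes first, again using a hypothetical sex vector.

```r
sex <- c("M", "F", "F", "M", "F")   # hypothetical data

sex_f <- factor(sex, levels = c("M", "F"))
levels(sex_f)        # "M" "F"
as.integer(sex_f)    # M is now coded 1 and F is coded 2: 1 2 2 1 2
```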
2.2. FACTORS

❖ We can attach value labels to the categories of a categorical variable using the function:
factor(<vector object>, levels=<levels vector>, labels=<labels vector>)
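A minimal sketch of attaching value labels, assuming the same hypothetical sex vector. Each label is matched to the level in the same position, and the argument name labels must be written in lower case.

```r
sex <- c("M", "F", "F", "M", "F")   # hypothetical data

sex_f <- factor(sex,
                levels = c("M", "F"),
                labels = c("Male", "Female"))
levels(sex_f)          # "Male" "Female"
as.character(sex_f)    # "Male" "Female" "Female" "Male" "Female"
table(sex_f)           # frequency of each category
```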
2.2. FACTORS
• R gives room for specifying the order of the categories in the case of ordered categorical variables.
• To achieve this, we set the ordered option to TRUE:
factor(<vector object>, levels=<levels vector>, labels=<labels vector>, ordered=TRUE)
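A sketch of an ordered factor, assuming hypothetical education codes 1, 2, and 3.

```r
edu <- c(2, 1, 3, 2, 1)   # hypothetical education codes

edu_f <- factor(edu,
                levels  = c(1, 2, 3),
                labels  = c("Primary", "Secondary", "Tertiary"),
                ordered = TRUE)
edu_f               # printed with Levels: Primary < Secondary < Tertiary
is.ordered(edu_f)   # TRUE
```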
2.2. FACTORS
❑ Differences between nominal and ordered categorical variables

❖ A nominal variable has categories with no natural order, whereas the categories of an ordered (ordinal) variable can be ranked, as in Likert scales.
❖ R gives room for specifying the order of the categories where necessary; to achieve this, we set the ordered option to TRUE:
factor(<vector object>, levels=<levels vector>, labels=<labels vector>, ordered=TRUE)
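One practical difference can be sketched with hypothetical Likert-type responses: comparisons such as > are meaningful only for ordered factors.

```r
resp <- c("Agree", "Disagree", "Neutral", "Agree")   # hypothetical responses
lv   <- c("Disagree", "Neutral", "Agree")

nominal <- factor(resp, levels = lv)
ordinal <- factor(resp, levels = lv, ordered = TRUE)

is.ordered(nominal)        # FALSE
is.ordered(ordinal)        # TRUE

ordinal[1] > ordinal[2]    # TRUE: Agree ranks above Disagree
max(ordinal)               # Agree

# For a nominal factor the comparison is not meaningful:
# R returns NA together with a warning
suppressWarnings(nominal[1] > nominal[2])   # NA
```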
2.3. MATRICES

❑ Details

2.4. LISTS
❑ Details

2.5. DATA FRAME

o A data frame is a list of vectors and/or factors of the same (equal) length.
o Objects (variables) in the data frame can be of different data types, but each column should contain elements of the same data type.
- For instance:
✓ The first column can be numeric,
✓ The second column can be character, and
✓ The third column can be logical.
o Each row of a data frame represents an observation.
2.5. DATA FRAME: Creating data frame

❖ To create a data frame in R, the function data.frame() is used:
data.frame(<vector1>, <vector2>, …, <vectorn>)
o The command is.data.frame(<object>) is used to determine whether an object is a data frame.
o The command dim(<object>) is used to explore the dimensions (the number of rows and columns).
o The number of rows can be found using the command nrow(<dataframe>).
o The number of columns can be found using the command ncol(<dataframe>).
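A minimal sketch of building a data frame from vectors of equal length. All names and values are hypothetical.

```r
# Hypothetical vectors of equal length
name     <- c("Asha", "Juma", "Neema", "Baraka")
age      <- c(21, 25, 30, 28)
sex      <- factor(c("F", "M", "F", "M"))
employed <- c(FALSE, TRUE, TRUE, FALSE)

students <- data.frame(name, age, sex, employed)
students

is.data.frame(students)   # TRUE
dim(students)             # 4 4 (rows, columns)
nrow(students)            # 4
ncol(students)            # 4
```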
2.5. DATA FRAME: Creating a data frame
• After creating the data frame, commands such as str(), names(), head(), and summary() can be used to explore it.
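The commands shown on the original slide are not available here. The sketch below assumes a small hypothetical data frame and uses standard base R functions that are commonly used for this purpose.

```r
# Hypothetical data frame
students <- data.frame(name     = c("Asha", "Juma", "Neema", "Baraka"),
                       age      = c(21, 25, 30, 28),
                       employed = c(FALSE, TRUE, TRUE, FALSE))

names(students)         # column (variable) names
str(students)           # type and first values of every column
head(students, 2)       # first two rows
summary(students$age)   # minimum, quartiles, mean, and maximum of age
```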
2.5. DATA FRAME: Combining Data Frames
• Data frames can be combined by binding them together.
• We can bind data frames:
❖ Column-wise (column bind)
- If the aim is to add more variables, using the function cbind().
❖ Row-wise (row bind)
- If the aim is to add observations, using the function rbind().
§ Binding vectors forms a matrix.
§ Binding a data frame with another object (such as a factor or vector) creates a data frame.
§ Binding data frames creates a data frame.
2.5. DATA FRAME: Combining Data Frames - Column

[Figure: added variables/objects after a column bind]
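A sketch of a column bind, assuming a hypothetical data frame and a new vector score of matching length. It also shows that binding two plain vectors gives a matrix rather than a data frame.

```r
# Hypothetical data
students <- data.frame(name = c("Asha", "Juma", "Neema"),
                       age  = c(21, 25, 30))
score <- c(67, 54, 81)

# Data frame + vector gives a data frame with one more variable
students2 <- cbind(students, score)
is.data.frame(students2)   # TRUE
names(students2)           # "name" "age" "score"

# Vector + vector gives a matrix
m <- cbind(age = c(21, 25, 30), score)
is.matrix(m)               # TRUE
```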
2.5. DATA FRAME: Combining Data Frames - Row

• Row bind essentially adds more observations to the existing data frame.
• It is needed when we have data on the same variables in different data frames (conceptually, different files).
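A minimal sketch of a row bind, assuming two hypothetical data frames that hold the same variables. For rbind() to work on data frames, the column names must match.

```r
# Two hypothetical batches with the same variables
batch1 <- data.frame(name = c("Asha", "Juma"),    age = c(21, 25))
batch2 <- data.frame(name = c("Neema", "Baraka"), age = c(30, 28))

all_students <- rbind(batch1, batch2)
all_students
nrow(all_students)   # 4 observations
ncol(all_students)   # still 2 variables
```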
2.5. DATA FRAME: Indexing/sub-setting Data frame

• As noted earlier, indexing or sub-setting is the process of selecting some of the elements of an object.
• Depending on the purpose, this can be achieved by:
✓ Giving the position of the element in square brackets after the name of the object
§ both the row and the column must be specified
✓ Using the dollar ($) sign.
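A sketch of bracket indexing on a hypothetical data frame. The row goes before the comma and the column after it; leaving one side empty selects all rows or all columns.

```r
# Hypothetical data frame
students <- data.frame(name     = c("Asha", "Juma", "Neema", "Baraka"),
                       age      = c(21, 25, 30, 28),
                       employed = c(FALSE, TRUE, TRUE, FALSE))

students[2, 3]                       # row 2, column 3: TRUE
students[1, ]                        # first row, all columns
students[, 2]                        # all rows of column 2: 21 25 30 28
students[1:2, c("name", "age")]      # rows 1 and 2, two named columns
students[students[, "age"] > 24, ]   # rows where age is above 24
```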
2.5. DATA FRAME: Indexing/sub-setting Data frame

✓ The $ sign is useful in indexing/extracting variables (columns) by name.
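A minimal sketch of the $ sign on a hypothetical data frame: it extracts a column as a vector and can also be used to add a new variable. The column age_months is a made-up example.

```r
# Hypothetical data frame
students <- data.frame(name = c("Asha", "Juma", "Neema", "Baraka"),
                       age  = c(21, 25, 30, 28))

students$age          # 21 25 30 28
mean(students$age)    # 26

# Assigning to a new name after $ adds a column
students$age_months <- students$age * 12
names(students)       # "name" "age" "age_months"
```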
End of Presentation II

Next Lesson:
Importing and Exporting data,
Combining Dataset