DATA TYPES
In any programming language, you use variables to store information.
A variable is a name for a reserved memory location that holds a value.
This means that when you create a variable, you reserve some space in
memory.
The information you store may be of various data types, such as
character, wide character, integer, floating point, double floating
point or Boolean. Based on the data type of a variable, the operating
system allocates memory and decides what can be stored in it.
In contrast to programming languages like C and Java, variables in R
are not declared with a data type. Instead, a variable is assigned an
R-object, and the data type of the R-object becomes the data type of
the variable. There are many types of R-objects. The frequently used
ones are −
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
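To illustrate the point that a variable has no declared type in R, here is a minimal sketch. The variable name x and the values are only illustrative. The same name takes on the class of whichever object was assigned to it last.

```r
# A variable is not declared with a type; it takes the class of its value.
x <- 42L             # an integer object
print(class(x))      # "integer"

x <- "forty-two"     # the same name now refers to a character object
print(class(x))      # "character"

x <- c(TRUE, FALSE)  # and now to a logical vector
print(class(x))      # "logical"
```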
The simplest of these objects is the vector. There are six data types
of atomic vectors, also termed the six classes of vectors. The other
R-objects are built upon the atomic vectors.
Data Type  Example                              Verify
Logical    TRUE, FALSE                          v <- TRUE
                                                print(class(v))
                                                it produces the following result −
                                                [1] "logical"
Numeric    12.3, 5, 999                         v <- 23.5
                                                print(class(v))
                                                it produces the following result −
                                                [1] "numeric"
Integer    2L, 34L, 0L                          v <- 2L
                                                print(class(v))
                                                it produces the following result −
                                                [1] "integer"
Complex    3 + 2i                               v <- 2+5i
                                                print(class(v))
                                                it produces the following result −
                                                [1] "complex"
Character  'a', "good", "TRUE", '23.4'          v <- "TRUE"
                                                print(class(v))
                                                it produces the following result −
                                                [1] "character"
Raw        "Hello" is stored as 48 65 6c 6c 6f  v <- charToRaw("Hello")
                                                print(class(v))
                                                it produces the following result −
                                                [1] "raw"
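The six atomic types can also be checked together. The sketch below is illustrative only, and the variable names values, classes and types are ours. It compares class() with typeof(), which reports the internal storage type. The one place where the two differ is a numeric value, which is stored as a double.

```r
# One value of each of the six atomic types (illustrative values).
values <- list(TRUE, 23.5, 2L, 2+5i, "TRUE", charToRaw("Hello"))

# class() gives the class of the object, typeof() its internal storage type.
classes <- sapply(values, class)
types   <- sapply(values, typeof)

print(classes)  # "logical" "numeric" "integer" "complex" "character" "raw"
print(types)    # "logical" "double"  "integer" "complex" "character" "raw"
```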
Vectors
To create a vector with more than one element, use the c() function,
which combines its arguments into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))
When we execute the above code, it produces the following result −
[1] "red" "green" "yellow"
[1] "character"
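Because an atomic vector holds a single data type, c() converts mixed inputs to one common type. The following sketch (the names mixed_num and mixed_chr are illustrative) shows this conversion, along with access to elements by position.

```r
# Mixing types in c() coerces all elements to the most general type present.
mixed_num <- c(1L, 2.5, TRUE)   # integer and logical become numeric
mixed_chr <- c(1, "a", TRUE)    # everything becomes character
print(class(mixed_num))         # "numeric"
print(mixed_chr)                # "1" "a" "TRUE"

# Elements are accessed by position, counting from 1.
apple <- c('red', 'green', "yellow")
print(apple[2])                 # "green"
print(length(apple))            # 3
```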
Lists
A list is an R-object that can contain elements of many different types,
such as vectors, functions and even other lists.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
When we execute the above code, it produces the following result −
[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
function (x) .Primitive("sin")
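As a short sketch of how the elements of such a list can be used (the names first, sub and f are illustrative), double brackets extract the element itself, while single brackets return a shorter list.

```r
# Re-create the list from the example above.
list1 <- list(c(2, 5, 3), 21.3, sin)

first <- list1[[1]]   # the numeric vector 2 5 3
sub   <- list1[1]     # a list of length 1 that contains that vector
f     <- list1[[3]]   # the sin function stored in the list

print(class(first))           # "numeric"
print(class(sub))             # "list"
print(f(0))                   # 0
print(sapply(list1, class))   # "numeric" "numeric" "function"
```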
Matrices
A matrix is a two-dimensional rectangular data set. It can be created
using a vector input to the matrix function.
# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"
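The byrow argument decides how the input vector fills the matrix. The sketch below uses illustrative names (vals, by_row, by_col) to compare the row-wise fill used above with the default column-wise fill.

```r
vals <- c('a', 'a', 'b', 'c', 'b', 'a')

# Fill row by row, as in the example above.
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)
# Default behaviour (byrow = FALSE): fill column by column.
by_col <- matrix(vals, nrow = 2, ncol = 3)

print(dim(by_row))    # 2 3
print(by_row[2, 1])   # "c"
print(by_col[2, 1])   # "a"
```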
Arrays
While matrices are confined to two dimensions, arrays can have any
number of dimensions. The array() function takes a dim argument that
gives the size of each dimension. In the example below we create an
array made up of two 3x3 matrices; the two input values are recycled
to fill all 18 cells.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
When we execute the above code, it produces the following result −
, , 1
[,1] [,2] [,3]
[1,] "green" "yellow" "green"
[2,] "yellow" "green" "yellow"
[3,] "green" "yellow" "green"
, , 2
[,1] [,2] [,3]
[1,] "yellow" "green" "yellow"
[2,] "green" "yellow" "green"
[3,] "yellow" "green" "yellow"
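An array is indexed with one position per dimension. As a minimal sketch (the name second is illustrative), leaving an index empty selects everything along that dimension, so a[, , 2] is the second 3x3 matrix.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

print(dim(a))        # 3 3 2
print(length(a))     # 18, the two input values are recycled

second <- a[, , 2]   # the second 3x3 matrix
print(dim(second))   # 3 3
print(a[1, 1, 2])    # "yellow"
```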
Factors
Factors are R-objects created from a vector. A factor stores the
vector along with the distinct values of its elements as labels
(levels). The labels are always character, irrespective of whether the
input vector is numeric, character, Boolean etc. Factors are useful
in statistical modelling.
Factors are created using the factor() function. The nlevels() function
gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))
When we execute the above code, it produces the following result −
[1] green green yellow red red red green
Levels: green red yellow
[1] 3
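To see what a factor stores, the sketch below looks at its levels, the integer codes behind them and the count per level. The name factor_num is illustrative, and the numeric input is an assumed example showing that the labels are character even then.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

print(levels(factor_apple))       # "green" "red" "yellow"
print(as.integer(factor_apple))   # 1 1 3 2 2 2 1 (position of each value in the levels)
print(table(factor_apple))        # number of elements per level

# The labels are character even when the input vector is numeric.
factor_num <- factor(c(10, 20, 10))
print(class(levels(factor_num)))  # "character"
```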
Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column
of a data frame can contain a different mode of data: the first column
can be numeric, the second character and the third logical. A data
frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)
When we execute the above code, it produces the following result −
gender height weight Age
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
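The sketch below checks that the columns of this data frame really hold different modes and shows two common ways of working with it. Setting stringsAsFactors = FALSE is our addition, so that the gender column is character in every R version. The names col_classes and tall are illustrative.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age    = c(42, 38, 26),
  stringsAsFactors = FALSE   # keep gender as character, not factor
)

col_classes <- sapply(BMI, class)   # class of each column
print(col_classes)
print(dim(BMI))                     # 3 4

# A column is accessed with $, and rows can be filtered by a condition.
print(mean(BMI$height))             # 162.8333
tall <- BMI[BMI$height > 160, ]
print(nrow(tall))                   # 2
```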
Descriptive Analysis
In descriptive analysis, we describe our data with the help of various
representative methods such as charts, graphs, tables and Excel files,
and present it in a meaningful way so that it can be easily understood.
Most of the time it is performed on small data sets, and the findings
can help in anticipating future trends. Some measures that are used to
describe a data set are measures of central tendency and measures of
variability or dispersion.
Measure of central tendency
It represents the whole set of data by a single value and gives us the location of
the central point. There are three main measures of central tendency.
Mean
Median
Mode
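The three measures can be computed on a small vector as follows. The data in x is made up for illustration. Base R has no function for the statistical mode, so this sketch derives it from a frequency table; mfv() from the modeest package is another option.

```r
x <- c(2, 4, 4, 5, 7, 9, 4)   # illustrative data

mean_x   <- mean(x)     # arithmetic average
median_x <- median(x)   # middle value of the sorted data

# Mode: the value or values that occur most often.
counts <- table(x)
mode_x <- as.numeric(names(counts)[counts == max(counts)])

print(mean_x)     # 5
print(median_x)   # 4
print(mode_x)     # 4
```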
Measure of variability
A measure of variability describes the spread of the data, that is, how widely
the values are distributed. The most common variability measures are:
Range
Variance
Standard Deviation
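As a minimal sketch on made-up data, the measures of spread can be computed as below. The manual calculation is included to show that var() uses the sample formula with n - 1 in the denominator. The name manual_var is illustrative.

```r
x <- c(2, 4, 4, 5, 7, 9, 4)   # illustrative data

print(range(x))         # 2 9 (minimum and maximum)
print(diff(range(x)))   # 7   (width of the range)
print(var(x))           # 5.333333
print(sd(x))            # 2.309401

# var() is the sample variance: squared deviations divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
print(all.equal(var(x), manual_var))    # TRUE
print(all.equal(sd(x), sqrt(var(x))))   # TRUE
```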
Analysis                                 R Function
Mean                                     mean()
Median                                   median()
Mode                                     mfv() [modeest]
Range of values (minimum and maximum)    range()
Minimum                                  min()
Maximum                                  max()
Variance                                 var()
Standard Deviation                       sd()
Sample quantiles                         quantile()
Interquartile range                      IQR()
Generic function                         summary()
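A few of the functions in the table, applied to made-up data, might be used like this. The name q is illustrative, and quantile() is called with its default settings.

```r
x <- c(2, 4, 4, 5, 7, 9, 4)   # illustrative data

print(min(x))   # 2
print(max(x))   # 9

# Quartiles and the interquartile range (third quartile minus first).
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(q)        # 25%: 4, 50%: 4, 75%: 6
print(IQR(x))   # 2

# summary() reports minimum, quartiles, mean and maximum in one call.
print(summary(x))
```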
stat.desc() function
The function stat.desc() [in the pastecs package] provides other useful statistics, including:
the median
the mean
the standard error on the mean (SE.mean)
the confidence interval of the mean (CI.mean) at the p level (default is 0.95)
the variance (var)
the standard deviation (std.dev)
and the variation coefficient (coef.var) defined as the standard deviation divided by the
mean.
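To make these quantities concrete, here is a sketch that computes three of them by hand in base R, without pastecs. It assumes that CI.mean is the half-width of a t-based confidence interval for the mean, which is our reading of the pastecs documentation. The names se_mean, ci_half and coef_var are illustrative, and the data is made up.

```r
x <- c(2, 4, 4, 5, 7, 9, 4)   # illustrative data
n <- length(x)
p <- 0.95                     # confidence level, matching the default above

# Standard error of the mean.
se_mean <- sd(x) / sqrt(n)

# Half-width of the confidence interval of the mean
# (assumption: t distribution with n - 1 degrees of freedom).
ci_half <- qt(0.5 + p / 2, df = n - 1) * se_mean

# Variation coefficient: standard deviation divided by the mean.
coef_var <- sd(x) / mean(x)

print(se_mean)
print(c(mean(x) - ci_half, mean(x) + ci_half))   # interval around the mean
print(coef_var)
```

With pastecs installed, calling stat.desc(x) should report comparable values under the names SE.mean, CI.mean and coef.var.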
In the video lecture, these functions are discussed with the help of examples and their implementation.