Machine Learning
S. Sridhar and M. Vijayalakshmi
© Oxford University Press 2021. All rights reserved
Chapter 2
Understanding of Data
What is Data?
• DATA ARE FACTS
• FACTS CAN TAKE THE FORM OF NUMBERS, AUDIO, VIDEO, AND IMAGES
• DATA MUST BE ANALYZED TO SUPPORT DECISION-MAKING
Characteristics of Big Data
Characteristics of Data
Data Sources
A DATA SOURCE CAN BE ANYTHING –
• STRUCTURED DATA
• SEMI-STRUCTURED DATA
• UNSTRUCTURED DATA
Structured Data
STRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –
• RECORD DATA
• GRAPHICS DATA
• DATA MATRIX
• ORDERED DATA – SEQUENCE DATA, TIME SERIES DATA, TEMPORAL DATA
Unstructured Data
UNSTRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –
• VIDEO, IMAGE, PROGRAMS
• BLOG DATA
UNSTRUCTURED DATA ACCOUNTS FOR ABOUT 80% OF AN ORGANIZATION'S DATA
Semi-Structured Data
SEMI-STRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –
• XML/JSON OBJECTS
• RSS FEEDS
• HIERARCHICAL RECORDS
Data Storage
• DATABASE SYSTEMS
• TYPES ARE
1. TRANSACTIONAL DATABASE
2. TIME SERIES DATABASE
3. TEMPORAL DATABASE
Data Storage
• OTHER TYPES
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Data Analysis Framework
• FRAMEWORK
Types of Processing
• CLOUD COMPUTING
• GRID COMPUTING
• H-COMPUTING
Good Data Characteristics
• GOOD DATA SHOULD HAVE THESE CHARACTERISTICS
Open-Source Data
1. DIGITAL LIBRARIES
2. EXPERIMENTAL DATA LIKE GENOMIC AND BIOLOGICAL DATA
3. HEALTHCARE SYSTEMS LIKE PATIENT INSURANCE DATA
Social-Media Data
1. TWITTER DATA
2. FACEBOOK DATA
3. YOUTUBE VIDEOS
4. INSTAGRAM DATA
Multimodal Data
• IMAGE ARCHIVES WITH TEXT AND NUMERIC DATA
• WWW
Data Preprocessing
DATA THAT CAN CAUSE PROBLEMS
• INCOMPLETE DATA
• OUTLIER DATA
• INCONSISTENT DATA
• INACCURATE DATA
• MISSING VALUES
• DUPLICATE DATA
Missing Data
Noisy Data
BINNING TECHNIQUE
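One common binning variant for noisy data is smoothing by bin means over equal-frequency bins. The slide does not say which variant is intended, so the sketch below is only an illustration; the function name `smooth_by_bin_means`, the bin size, and the sample values are assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin (assumed variant of binning)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Illustrative sample, not taken from the slides.
noisy = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(noisy, bin_size=3)
print(smoothed)  # three bins with means 9.0, 22.0 and 29.0
```

Smoothing by bin medians or bin boundaries follows the same structure; only the replacement value per bin changes.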
Data Normalization
MIN-MAX PROCEDURE
TRANSFORMS DATA TO THE RANGE 0-1
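A minimal sketch of the min-max procedure, assuming the usual formula v' = (v - min) / (max - min). The function name `min_max_normalize` and the handling of a constant column (all zeros) are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Assumed convention: a constant column carries no spread, map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


marks = [10, 20, 30, 40, 50]  # illustrative sample
scaled = min_max_normalize(marks)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```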
Data Normalization
Z-SCORE
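A minimal sketch of z-score normalization, assuming the usual formula z = (v - mean) / standard deviation. Whether the population or the sample standard deviation is meant is not stated on the slide; the population form is assumed here, and `z_score_normalize` is a hypothetical name.

```python
import statistics


def z_score_normalize(values):
    """Center values on their mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative: mean 5, population SD 2
z_scores = z_score_normalize(sample)
print(z_scores)
```

After the transform the values have mean 0 and standard deviation 1, which makes attributes measured on different scales comparable.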
Types of Data
Nominal Data
Ordinal Data
Numerical Data
Types of Data
BASED ON VARIABLES
Data Visualization
Central Tendency
MEAN OF DATA
Central Tendency
MEDIAN OF DATA
Central Tendency
MODE OF DATA
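The three measures of central tendency (mean, median, mode) can be computed with Python's standard `statistics` module. The sample values below are illustrative and not taken from the slides.

```python
import statistics

marks = [2, 3, 3, 5, 7]  # illustrative sample

mean_value = statistics.mean(marks)      # arithmetic average
median_value = statistics.median(marks)  # middle value of the sorted data
mode_value = statistics.mode(marks)      # most frequent value

print(mean_value, median_value, mode_value)  # 4 3 3
```

The mean is pulled towards extreme values, while the median and mode are not, which is why all three are usually reported together.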
Dispersion
RANGE AND STANDARD DEVIATION
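A short sketch of the two dispersion measures named here. The slide does not say whether the population or the sample standard deviation is intended, so both are shown; the sample values are illustrative.

```python
import statistics

values = [10, 12, 14, 16, 18]  # illustrative sample

value_range = max(values) - min(values)     # largest minus smallest value
population_sd = statistics.pstdev(values)   # divides the squared deviations by N
sample_sd = statistics.stdev(values)        # divides the squared deviations by N - 1

print(value_range, population_sd, sample_sd)
```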
Dispersion
QUARTILES AND IQR
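A sketch of quartiles and the interquartile range using `statistics.quantiles`. Several quartile conventions exist and the slides may use a different one; the default "exclusive" method of the standard library is assumed here, so hand-computed textbook values can differ slightly.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative sample

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)  # 2.25 4.5 6.75 4.5
```

Together with the minimum and maximum, Q1, the median (Q2), and Q3 form the five-point summary.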
Five-point summary
5-POINT SUMMARY
Shape of Data
SKEWNESS AND KURTOSIS
Shape of Data
KURTOSIS
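Skewness and kurtosis can be sketched from central moments. The moment-based population definitions are assumed here (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2); sample-adjusted formulas and "excess kurtosis" (kurtosis minus 3) give different numbers. The function name `shape_moments` is hypothetical.

```python
def shape_moments(values):
    """Return (skewness, kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("shape measures are undefined for constant data")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


symmetric = [1, 2, 3, 4, 5]          # illustrative: no skew
right_tailed = [1, 1, 2, 2, 3, 10]   # illustrative: long right tail

print(shape_moments(symmetric))
print(shape_moments(right_tailed))
```

A skewness near 0 indicates a symmetric shape, a positive value a longer right tail, and a negative value a longer left tail.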
Shape of Data
MEAN ABSOLUTE DEVIATION AND COEFFICIENT OF VARIATION
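A sketch of the two measures named here, assuming the usual definitions: mean absolute deviation is the average distance from the mean, and the coefficient of variation is the standard deviation divided by the mean. The population standard deviation and both function names are assumptions.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(v - center) for v in values])


def coefficient_of_variation(values):
    """Population standard deviation relative to the mean (mean must be non-zero)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(mean_absolute_deviation(sample), coefficient_of_variation(sample))
```

Because the coefficient of variation is unit-free, it allows the spread of attributes with different units to be compared.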
Stem-Leaf Plot
Q-Q Plot
A Q-Q PLOT IS A GRAPHICAL CHECK FOR NORMALITY. IF THE POINTS LIE CLOSE TO A
STRAIGHT LINE, THE DATA ARE APPROXIMATELY NORMALLY DISTRIBUTED.
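The points of a normal Q-Q plot can be sketched without a plotting library: each sorted sample value is paired with the quantile a normal distribution would place at the same rank. The plotting position (i + 0.5) / n, the use of the sample's own mean and population standard deviation, and the name `normal_qq_points` are assumptions; other conventions exist.

```python
from statistics import NormalDist, fmean, pstdev


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    reference = NormalDist(fmean(ordered), pstdev(ordered))
    return [
        (reference.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]


heights = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 4.8]  # illustrative sample
qq_points = normal_qq_points(heights)
for theoretical, observed in qq_points:
    print(round(theoretical, 2), observed)
```

Plotting the pairs and comparing them with the line y = x gives the visual check described above.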
Bivariate Data
INVOLVES TWO VARIABLES
Bivariate Data Visualization
Bivariate Data – Covariance
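A minimal sketch of covariance between two variables: the average product of their deviations from the mean. Texts differ on dividing by N or by N - 1, so the divisor is a parameter here; the function name and the sample values are assumptions.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by N - 1, sample=False divides by N.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


hours = [1, 2, 3, 4, 5]   # illustrative
marks = [2, 4, 6, 8, 10]  # illustrative
print(covariance(hours, marks), covariance(hours, marks, sample=False))  # 5.0 4.0
```

A positive value means the two variables tend to increase together; the magnitude depends on the units, which is why correlation is usually reported alongside it.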
Bivariate Data – Correlation
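A sketch of the Pearson correlation coefficient, assuming that is the correlation measure intended here. It rescales the co-variation of two variables to the range -1 to 1, so the result does not depend on units. The function name `pearson_correlation` and the sample values are assumptions.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson r: co-variation divided by the product of the two spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


temperature = [20, 22, 25, 27, 30]    # illustrative
sales = [110, 118, 131, 140, 152]     # illustrative
print(pearson_correlation(temperature, sales))
```

Values near 1 or -1 indicate a strong linear relationship; a value near 0 indicates little linear relationship, though a non-linear one may still exist.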
Multivariate Data Visualization
Multivariate Essential Mathematics
1. GAUSSIAN ELIMINATION
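A minimal sketch of Gaussian elimination for solving a square linear system Ax = b: forward elimination to an upper-triangular form, then back substitution. Partial pivoting is added for numerical stability, which is an assumption beyond what the slide states; the function name and the example system are illustrative.

```python
def gaussian_elimination(a, b):
    """Solve a x = b for a square matrix a (list of rows) and vector b."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


coefficients = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
constants = [8, -11, -3]
solution = gaussian_elimination(coefficients, constants)
print(solution)  # approximately [2.0, 3.0, -1.0]
```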
Multivariate Essential Mathematics
1. MATRIX DECOMPOSITION
Multivariate Essential Mathematics
1. DISTRIBUTIONS
Multivariate Essential Mathematics
EXPONENTIAL DISTRIBUTION
Multivariate Essential Mathematics
BINOMIAL DISTRIBUTION
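A short sketch of the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), the probability of exactly k successes in n independent trials with success probability p. The function name `binomial_pmf` and the coin-toss example are assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Illustrative: exactly 5 heads in 10 tosses of a fair coin.
five_heads = binomial_pmf(5, 10, 0.5)
print(five_heads)  # 0.24609375
```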
Multivariate Essential Mathematics
POISSON AND BERNOULLI DISTRIBUTIONS
Density Estimation
Hypothesis Testing
Z-TEST
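A sketch of a one-sample, two-sided z-test, assuming the population standard deviation is known: z = (sample mean - hypothesized mean) / (sigma / sqrt(n)). The function name, its parameters, and the numbers in the example are assumptions for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Return (z statistic, two-sided p-value) for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative: 100 observations with mean 52, tested against a claimed mean of 50.
z_stat, p_value = one_sample_z_test(52, 50, population_sd=10, n=100)
print(z_stat, p_value)
```

At a 5% significance level, the null hypothesis is rejected when the p-value falls below 0.05, which is the case in this example.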
Hypothesis Testing
PAIRED T-TEST
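A sketch of the paired t-test statistic: take the per-pair differences d, then t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. Only the statistic is computed; the p-value needs a t-distribution table or a statistics library. The function name and the before/after scores are assumptions.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD, divides by n - 1
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Illustrative scores for the same five subjects before and after training.
before_scores = [10, 12, 9, 11, 13]
after_scores = [12, 14, 10, 13, 14]
t_stat, dof = paired_t_statistic(before_scores, after_scores)
print(t_stat, dof)
```

The resulting t is compared with the critical value for the chosen significance level and degrees of freedom.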
Hypothesis Testing
CHI-SQUARE TEST
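A sketch of the chi-square statistic for a test of independence on a contingency table: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. Whether the slides use the independence or the goodness-of-fit form is not stated; the independence form, the function name, and the table are assumptions.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Illustrative 2x2 table: rows are two groups, columns are two outcomes.
table = [[10, 20], [20, 10]]
chi2, dof = chi_square_independence(table)
print(chi2, dof)
```

The statistic is compared with the chi-square critical value for the computed degrees of freedom; a larger statistic is evidence against independence.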
Feature Engineering
• FEATURE TRANSFORMATION
• FEATURE SELECTION
Characteristics of Good Features
• RELEVANCY: FEATURES THAT ARE NOT RELEVANT ARE REMOVED
• REDUNDANCY: FEATURES THAT DUPLICATE OTHER FEATURES ARE REMOVED
Feature Selection
FORWARD SELECTION
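A sketch of greedy forward selection: start with no features and repeatedly add the one that improves a score the most, stopping when no addition helps. The `score` callable stands in for whatever model-evaluation measure is used; it, the toy weights, and the feature names are hypothetical.

```python
def forward_selection(features, score):
    """Greedily add features while the score strictly improves."""
    selected = []
    remaining = list(features)
    best_score = score(selected)
    while remaining:
        # Score every candidate set formed by adding one more feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


# Hypothetical score: usefulness of each feature minus a small cost per feature.
usefulness = {"age": 0.5, "income": 0.3, "noise": 0.0, "id": -0.1}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.05 * len(subset)


chosen = forward_selection(list(usefulness), toy_score)
print(chosen)  # ['age', 'income']
```

In practice the score would typically be a cross-validated model metric rather than a fixed table.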
Feature Selection
BACKWARD SELECTION
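A sketch of greedy backward selection: start with all features and repeatedly drop the one whose removal gives the best score, stopping when every removal makes the score worse. The `score` callable, the toy weights, and the feature names are hypothetical stand-ins for a real model-evaluation measure.

```python
def backward_selection(features, score):
    """Greedily remove features while the score does not get worse."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        # Score every candidate set formed by dropping one feature.
        candidates = [
            (score([g for g in selected if g != f]), f) for f in selected
        ]
        top_score, feature_to_drop = max(candidates, key=lambda pair: pair[0])
        if top_score < best_score:
            break  # every removal hurts, keep the current set
        selected.remove(feature_to_drop)
        best_score = top_score
    return selected


# Hypothetical score: usefulness of each feature minus a small cost per feature.
usefulness = {"age": 0.5, "income": 0.3, "noise": 0.0, "id": -0.1}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.05 * len(subset)


kept = backward_selection(list(usefulness), toy_score)
print(kept)  # ['age', 'income']
```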
Principal Component Analysis
Compute the covariance matrix.
Compute the eigenvalues and eigenvectors, and form the matrix A as a set of eigenvectors.
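The formulas for these two steps did not survive on the slide. In the usual formulation they read as follows; the symbols m (mean vector), N (number of samples), the 1/(N-1) normalization (some texts use 1/N), and the convention that the rows of A are the eigenvectors ordered by decreasing eigenvalue are all assumptions.

```latex
m = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
C = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - m)(x_i - m)^{T}

C\,e_k = \lambda_k\,e_k,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n,
\qquad
A = \begin{bmatrix} e_1^{T} \\ \vdots \\ e_n^{T} \end{bmatrix}
```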
Principal Component Analysis
Compute the PCA transform of the data.
The original data can be recovered from the transformed data.
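A sketch of the transform and recovery steps for two-dimensional data, assuming the common formulation y = A(x - m) and x = A^T y + m, where m is the mean and the rows of A are the unit eigenvectors of the covariance matrix. The closed-form 2x2 eigen-decomposition, the 1/(N-1) covariance, the function names, and the sample points are assumptions for illustration.

```python
import math


def pca_2d(points):
    """Return (mean, eigenvalues, A, transformed) for a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(dx * dx for dx, _ in centered) / (n - 1)
    cyy = sum(dy * dy for _, dy in centered) / (n - 1)
    cxy = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (cxx + cyy) / 2
    radius = math.hypot((cxx - cyy) / 2, cxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Angle of the first principal axis; rows of A are the two eigenvectors.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    A = [(c, s), (-s, c)]

    # y = A (x - m)
    transformed = [(c * dx + s * dy, -s * dx + c * dy) for dx, dy in centered]
    return (mx, my), eigenvalues, A, transformed


def recover(mean, A, transformed):
    """Invert the transform: x = A^T y + m."""
    return [
        (A[0][0] * u + A[1][0] * v + mean[0], A[0][1] * u + A[1][1] * v + mean[1])
        for u, v in transformed
    ]


points = [(2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 6.0), (6.0, 7.0), (7.0, 8.0)]
mean, eigenvalues, A, transformed = pca_2d(points)
restored = recover(mean, A, transformed)
print(eigenvalues)
print(restored)
```

Keeping only the first component of each transformed point reduces the data to one dimension; recovery from that reduced form is then approximate rather than exact.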
PCA Algorithm
PCA Example
Verification
LDA Algorithm
SVD Algorithm
SVD Example
Summary