Introduction to Data Analytics
Course Introduction
What Is Data Analytics?
Data analytics is the science of extracting trends, patterns, and relevant information
from raw data to draw conclusions.
Data Analytics: Benefits
● Cost reduction
● Improved efficiency
● Good resource utilization
● Effective decision making
● Good market insights
Data Analytics: Domains
● Education industry
● Healthcare industry
● Government organizations
● Media and entertainment industry
● Retail industry
● E-commerce industry
Course Outline
1. Course Introduction
2. Evolution of Data Analytics
3. Dealing with Different Types of Data
4. Data Visualization for Decision Making
5. Data Analytics, Data Science, and Machine Learning
6. Data Science Methodology
7. Data Analytics in Different Sectors
8. Analytics Framework, Case Study, and Upcoming Trends
Learning Outcomes
By the end of this course, you will be able to:
● Analyze the triggers that led to the evolution of
analytics
● Develop an analytical approach to a business problem
● Compare data science, data analytics, and machine
learning and understand their business application
● Explain the significance of data visualization in
analytical modeling to drive meaningful business
decisions
● Identify business use cases that can leverage data
analytics
Course Features
Number of case studies: 11
Number of research studies: 2
Course Features
A few of the case studies discussed in this course:
Amazon uses data analytics to improve efficiency and reduce cost.
LinkedIn employs data analytics to revamp its job listings and to track
user profiles and posts.
Netflix gathers data from its subscribers to decide on customer
preferences.
Course Features
Research Studies
According to McKinsey, companies that use customer analytics
outperform their competitors in terms of profit.
According to a survey on BI trends conducted by the Business Application
Research Center (BARC), master data and data quality management
were the most important trend in 2020.
Happy Learning
Introduction to Data Analytics
Evolution of Data Analytics
Learning Objectives
By the end of this lesson, you will be able to:
Explain the impact of data analytics on accounting
Explain data analytics and its life cycle
Describe the various stages of data analytics
Outline the benefits of data analytics
Importance of Data Analytics
Data Analytics
● Data analytics is the science of extracting trends,
patterns, and relevant information from raw
data to draw conclusions.
● It has multiple approaches, multiple dimensions,
and diverse techniques.
Why Data Analytics?
Data analytics helps in:
● Scientific decision making and effective business
operations.
● Analyzing data, gaining profits, making better use of
resources, and improving managerial operations.
Problems with Traditional Accounting Methods
● Accounting was done in notebooks, which was
cumbersome and tedious.
● The use of Excel sheets simplified accounting
but did not solve all problems.
Problems with Traditional Accounting Methods
● SMBs and start-ups face issues in managing and tracking
cash flow.
● A highly accurate and dependable solution is required
for financial management in business.
Problems with Traditional Accounting Methods
● Difficulty in tracking small expenses such as
one-time government tax and regular taxes.
● Requirement of a dedicated financial expert
by SMBs and start-ups.
● Small business owners managing the role of
an HR and payroll expert due to lack of
resources or money.
Problems with Traditional Accounting Methods
● Interpreting and analyzing financial
reports with traditional accounting
methods was difficult.
● The available Excel-based macros and
pivot tables did not provide sufficient
insight into the data.
Challenges in Traditional Accounting Methods
● How much inventory must be held?
● How many invoices are overdue?
● How much cash is tied up at work?
● How long does it take to get cash from the customers?
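Two of the questions above (overdue invoices and time to collect cash) can be answered directly from invoice records. The following is a minimal sketch, assuming a hypothetical list of invoices with issue, due, and payment dates; the invoice IDs, dates, and the reporting date are illustrative and not taken from the course.

```python
from datetime import date

# Hypothetical invoices: (invoice ID, issued, due, paid or None if unpaid).
invoices = [
    ("INV-1", date(2024, 1, 5), date(2024, 2, 4), date(2024, 1, 30)),
    ("INV-2", date(2024, 1, 10), date(2024, 2, 9), date(2024, 3, 1)),
    ("INV-3", date(2024, 2, 1), date(2024, 3, 2), None),
    ("INV-4", date(2024, 3, 1), date(2024, 3, 31), None),
]

# Assumed reporting date, fixed so the result is repeatable.
today = date(2024, 3, 15)

# How many invoices are overdue? Unpaid invoices whose due date has passed.
overdue = [inv_id for inv_id, _, due, paid in invoices
           if paid is None and due < today]

# How long does it take to get cash from customers?
# Days between issue and payment, for paid invoices only.
days_to_pay = [(paid - issued).days
               for _, issued, _, paid in invoices if paid is not None]
average_days_to_pay = sum(days_to_pay) / len(days_to_pay)

print("Overdue invoices:", overdue)
print("Average days to collect cash:", average_days_to_pay)
```

With a notebook or a plain spreadsheet, each of these figures has to be recomputed by hand; here they are recomputed whenever the invoice list changes.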
Data Analytics: Impact on Accounting
Impact of Analytics on Accounting
● Uncovers valuable insights
● Identifies process improvements
● Helps in managing risks
● Adds value to the decision-making process
How Accountants Use Data Analytics
Auditors
● Deploy continuous monitoring
● Analyze and verify large data sets
● Produce fewer errors and more precise recommendations
Tax accountants
● Use data science to analyze complex taxation
● Make faster investment decisions
Investment advisors
● Use big data to find behavioral patterns
● Identify investment opportunities
● Generate higher profit margins
Data Analytics: Overview and Process Flow
Data Analytics: Definition
Data analytics is the process of examining and analyzing raw data sets to:
● Draw conclusions
● Derive more information
● Improve businesses,
products, and services
In addition to making business decisions, it is used by data scientists and
researchers to verify scientific models and theories.
Data Analytics: Process Flow
1. Define goals
2. Identify measurable metrics
3. List, collect, and extract data from sources
4. Explore and analyze data
5. Interpret and visualize data
6. Infer data for decision-making
Data Analytics Life Cycle
1. Discovery: Learn about the business domain and assess available resources.
2. Data Preparation: Execute ELT (extract, load, and transform).
3. Model Planning: Identify techniques and data to understand the relationships between variables.
4. Model Building: Develop data sets for testing, training, and production.
5. Communicate Results: Identify key findings and business values, and develop narratives for stakeholders.
6. Operationalize: Deliver final reports, briefs, code, and technical documents.
Types of Data Analytics
The four main types of analytics, based on the workflow and requirements of data analytics, are:
● Descriptive analytics: What happened?
● Diagnostic analytics: Why did this happen?
● Predictive analytics: What will happen?
● Prescriptive analytics: How can we make it happen?
Descriptive Analytics
● Descriptive analytics is designed to access information about the past.
● It is the conventional form of analytics.
● It focuses on the summarized view of facts.
● Its purpose is to summarize the findings.
Descriptive Analytics
Techniques of descriptive analytics: data aggregation and data mining.
• Data aggregation is the process of gathering and expressing
information in a summarized form.
• Tools used for data aggregation include MS Excel, MATLAB, SPSS, and
STATA.
• A company report is an example of descriptive analytics.
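Data aggregation can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; the regions, amounts, and the helper name `aggregate_by_region` are hypothetical and chosen only to show raw records being summarized.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, amount). Illustrative values only.
sales = [
    ("West", 120.0),
    ("West", 80.0),
    ("East", 200.0),
    ("East", 100.0),
    ("East", 60.0),
]


def aggregate_by_region(records):
    """Group amounts by region and summarize each group."""
    groups = defaultdict(list)
    for region, amount in records:
        groups[region].append(amount)
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "average": mean(amounts),
        }
        for region, amounts in groups.items()
    }


summary = aggregate_by_region(sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The summary answers "what happened?" per region, which is the kind of view a company report presents.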
Diagnostic Analytics
● Diagnostic analytics helps you identify why
something happened in the past.
● It takes a deeper look at data to understand the
root cause of events.
● It has a limited ability to provide actionable
insights.
● It provides an understanding of causal
relationships and sequences.
Diagnostic Analytics Techniques
● Drill-down
● Data discovery
● Data mining
● Correlation
Diagnostic Analytics Techniques
● These techniques can be used to look for a causal relationship between two
or more data sets.
● Diagnostic analytics is helpful for those concerned with
day-to-day operations.
● For example, it helps identify why a sales representative has
sold fewer items than usual.
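Correlation, one of the techniques listed above, can be sketched as follows. This is a minimal illustration of the Pearson correlation coefficient applied to the sales representative example; the weekly figures and variable names are hypothetical. A strong correlation suggests where to drill down, but it does not by itself prove a cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Hypothetical weekly figures for one sales representative.
client_visits = [12, 10, 9, 7, 5, 4]
items_sold = [30, 26, 24, 19, 14, 12]

r = pearson(client_visits, items_sold)
print(f"Correlation between visits and items sold: {r:.3f}")
```

A value close to 1 here would point to the drop in client visits as a candidate explanation for the drop in sales, to be confirmed by drilling down into the data.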
Predictive Analytics
Predictive analytics is used in:
● Predicting future outcomes in terms of the probability
that an event will occur
● Analyzing sentiments, where opinions posted on
social media are collected to predict a person’s
sentiment
● Identifying the target audience for a promotional
campaign
● Forecasting weather, plan-failure prediction, and
recommender systems for travel products
A predictive model is built on the preliminary descriptive analytics stage.
Predictive Analytics Tools
● Predictive analytics uses statistics and machine learning
algorithms such as random forests and SVM.
● Trained data scientists and
machine learning experts build
these models.
● Popular tools for predictive
analytics include Python, R, and
RapidMiner.
Prescriptive Analytics
● Prescriptive analytics provides the solution for a prediction in the future.
● It creates and updates the relationship between action and outcome using a feedback system.
● It helps in making optimal recommendations during the decision-making process.
● It helps in mitigating the possible risks based on the available predictive analytics.
● It has the power to suggest favorable solutions and ease the decision-making process.
● It is the final frontier of advanced analytics.
● It is used by recommendation engines in companies.
Prescriptive Analytics
● Prescriptive analytics is at the budding stage of
implementation and firms have not used its full
potential.
● Advancements in predictive analytics are paving the
way for its development.
Types of Analytics: Example
Types of Analytics: Amazon Example
Diagnostic analytics
● Amazon’s revenue on the West Coast increased during the past year
● Increased spending on sales training
Predictive analytics
● Purchase factors: price, time, weather, and festive seasons
● Predicted 10–12 percent increase in revenue
Types of Analytics: Amazon Example
Descriptive analytics
● Spent $20M on different sales training programs the previous year
● Sales training fetched good ROI
Prescriptive analytics
● Implemented a suitable optimization plan to maximize revenue
Data Analytics Benefits: Decision-Making
● Companies use business analytics to enable
faster and fact-based decision making.
● Data-driven organizations make better
strategic decisions.
● Companies enjoy high operational efficiency,
improved customer satisfaction, and robust profit
and revenue levels.
Data Analytics Benefits: Decision-Making
Data analytics helps you define your target
audience based on:
● Customer age group
● Customer preferences
● Location-based purchases
● Popular brands or products people seek
Data Analytics Benefits: Decision-Making
Data analytics in e-commerce helps:
● Manage inventory
● Forecast demands
● Identify shopping seasons
● Analyze customer sentiment
● Decide optimum prices
Data Analytics Benefits: Cost Reduction
● Data analytics helps understand shopper
behavior by monitoring their browsing
interests.
● Sellers identify shopping patterns and
customer demand.
● Customer data helps companies minimize
failed campaigns and reduce the costs
associated with them.
Data Analytics Benefits: Cost Reduction
Data analytics helps in reducing
marketing and logistics cost.
● Marketers use technologies to evaluate customer behavior
and make strategic decisions.
● Marketing campaigns use measured activities to
plan campaigns.
● Predictive analytics is used for better
performance, higher ROI, and faster
success.
Use of Predictive Analytics for Logistics Management
Predictive analytics helps companies in logistics management by:
● Analyzing current and historical facts to make predictions
● Procuring products based on purchase history
● Organizing customers from shopping patterns and demographic details
● Planning inventory and offloading excess stock
Factors to Consider in Logistics Planning
● Seasons
● Economic conditions
● Weather
Data provides several insights, such as identifying products that people tend to buy in a particular season.
Data Analytics Benefits: Case Study
According to a study by IHL Group, footwear and clothing worth $642.6 billion are returned
to stores every year.
● Products are returned as consumers miss important information during the purchase.
● Critical information provided through a detailed product specification or product video
can reduce the return rate.
● Data analytics helps companies assess the possibility of reducing the product return rate.
Case Study: Amazon
Amazon uses data analytics to improve efficiency and reduce cost.
Predictive analytics helps to:
● Predict what you buy
● Anticipate shipping
Such predictions help increase sales and reduce shipping, inventory, and
supply chain costs.
Case Study: Amazon
● Amazon has more than 200 fulfillment centers
worldwide.
● Supply chain and logistics optimization help
companies reduce cost and improve
performance.
● Amazon uses data analytics for choosing the
warehouse closest to the customer and reduces
shipping costs by 10–40 percent.
Case Study: Amazon
● Amazon uses data analytics to attract customers and
increase profits by an average of 25 percent annually.
● Prices are set based on customer activity on a
website, competitors’ pricing, and product availability.
● Product prices typically change every 10 minutes as
data is updated and analyzed.
● Amazon typically offers discounts on the best-selling
items and earns larger profits on less popular items.
Data Analytics: Other Benefits
Data Visualization Tools
Data visualization tools:
● Power BI
● Tableau
● Logi
Visualization allows decision makers to see connections between multidimensional
data.
It provides new ways to interpret data through graphical representations.
Other Benefits of Data Analytics
● Data analytics helps in identifying potential
opportunities to streamline operations.
● It identifies potential problems and gives time
to take actions.
● It allows companies to identify operations that
yield the best results.
● It identifies and improves error-prone
operational areas.
Other Benefits of Data Analytics
● Organizations implement data analytics in product
or service development.
● Data analytics helps in understanding the current state
of the business.
● It provides valuable insights to predict future
outcomes.
● It helps businesses align new processes or products
with market needs.
● Data analytics tools are capable of handling
heterogeneous data and providing insights.
Key Takeaways
Data analytics is the process of examining and analyzing raw
data sets to derive information and improve business.
Discovery, data preparation, model planning, model building,
communicating results, and operationalizing are the six steps of
the data analytics life cycle.
The four stages of data analytics are descriptive analytics,
diagnostic analytics, predictive analytics, and prescriptive
analytics.
Introduction to Data Analytics
Dealing with Different Types of Data
Learning Objectives
By the end of this lesson, you will be able to:
List the terminologies used in data analytics
Describe the types of data
Explain the levels of measurement
Terminologies in Data Analytics
● Observation
● Data sampling
● Data set
● Prediction
Terminologies in Data Analytics: Observation
● Observation is a single row or a record
of data from the database.
● Any data can be assumed as a set of
observations.
Terminologies in Data Analytics: Observation
[Figure: a database table whose columns (variables) are Age, Height, Nationality, and Gender, and whose rows are observations]
Observation is the unit of analysis on which the measurements are taken.
It is also known as a case, record, pattern, or row.
Terminologies in Data Analytics: Data Sampling
● Data sampling is a statistical analysis
technique used to select, manipulate,
and analyze a representative subset of
data points.
● Data sampling identifies patterns and
trends in the larger data set.
Terminologies in Data Analytics: Data Sampling
● If a randomly selected sample contains n
observations, then n is the sample size.
● The chart explains the sampling process, in which a few
people are randomly sampled from a
population.
● Data sampling is cost effective and surveys only the
representative sample.
● It enables data scientists, predictive modelers, and
data analysts to produce accurate findings.
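Simple random sampling can be sketched as follows. This is a minimal illustration, assuming a hypothetical population of 10,000 ages and a sample size n of 200; the seed is fixed only so that the sketch is repeatable.

```python
import random
from statistics import mean

# Fixed seed so that repeated runs draw the same sample.
rng = random.Random(42)

# Hypothetical population: ages of 10,000 customers.
population = [rng.randint(18, 80) for _ in range(10_000)]

# Draw a simple random sample of n observations without replacement.
n = 200
sample = rng.sample(population, n)

print("Population mean age:", round(mean(population), 2))
print("Sample mean age:    ", round(mean(sample), 2))
```

The sample mean is usually close to the population mean even though only a small share of the population is surveyed, which is why sampling is cost effective.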
Terminologies in Data Analytics: Data Set
● A data set is a collection of data or the total
data captured about a particular use case.
● It can hold information such as medical,
insurance, and loan approval records.
● It is not limited to numbers and texts and
may include collections of images or videos.
Terminologies in Data Analytics: Data Set
The table represents loan data with attributes such as loan ID, borrower’s gender,
education, employment status, credit history, loan amount, and property details.
Terminologies in Data Analytics: Prediction
● The goal of prediction is to move from
what has happened to providing the best
assessment of what will happen.
● In the graph, a linear prediction technique is
used to predict the number of children
at different education levels.
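A linear prediction can be sketched with an ordinary least-squares line. The example below is a minimal illustration; the data points (years of education against average number of children) are hypothetical and are not the values behind the graph on the slide.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: years of education and average number of children.
years_of_education = [8, 10, 12, 14, 16]
average_children = [3.0, 2.6, 2.2, 1.8, 1.4]

slope, intercept = fit_line(years_of_education, average_children)

# Move from what has happened to an assessment of what will happen:
# predict the value for an education level that was not observed.
predicted = slope * 18 + intercept
print(f"Predicted average number of children at 18 years: {predicted:.2f}")
```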
Types of Data
Structured Data
It is the data that is processed, stored, and retrieved in a fixed format.
Example: Employee details, job positions, and salaries.
Unstructured Data
It is the type of data that lacks any specific form or structure.
Example: Email
Semi-Structured Data
It is the data type containing both structured and unstructured data.
Example: CSV and JSON documents
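The difference between structured and semi-structured data can be sketched with a small JSON document. This is a minimal illustration; the records, field names, and the choice of fixed columns are hypothetical. Each record carries its own field names, so some fields can be missing and others can appear only in certain records.

```python
import json

# Hypothetical semi-structured employee records: not every record has
# the same fields, and some carry free text or nested lists.
raw = """
[
  {"id": 1, "name": "Asha", "role": "Analyst", "salary": 52000},
  {"id": 2, "name": "Ben", "role": "Engineer", "skills": ["Python", "SQL"]},
  {"id": 3, "name": "Chen", "notes": "Joined after the merger."}
]
"""

records = json.loads(raw)

# A structured (fixed-format) view needs the same columns for every row,
# so missing values become None.
fixed_columns = ["id", "name", "role", "salary"]
rows = [[record.get(column) for column in fixed_columns] for record in records]

# Fields that do not fit the fixed format are left over.
extras = [sorted(set(record) - set(fixed_columns)) for record in records]

for row, extra in zip(rows, extras):
    print(row, "extra fields:", extra)
```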
Analyzing Unstructured Data
● About 80% of business data is unstructured.
● Unstructured information is text-heavy and contains data
such as dates, numbers, and facts.
● Internally generated information is considered unstructured as the
intelligence doesn’t fit neatly into a database.
● Unstructured data is primarily used for BI and analytics but
not for transaction processing applications.
Analyzing Unstructured Data
Retailers and manufacturers analyze unstructured data to:
● Improve customer relationship management processes
● Enable targeted marketing
● Perform sentiment analysis on product reviews
The line between unstructured and semi-structured data is not clearly defined,
because even unstructured data has some level of structure in it.
Qualitative and Quantitative Data
Qualitative Data
Data in which the classification of objects
is based on attributes and properties.
Example: Softness of skin.
Quantitative Data
Data that can be measured and expressed
numerically.
Example: Your height and shoe size.
Qualitative and Quantitative Data
Qualitative Data
● Data collection is unstructured.
● It asks why.
● It cannot be computed as it is non-statistical.
● It develops initial understanding and defines the problem.
Quantitative Data
● Data collection is structured.
● It is all about how much or how many.
● It is statistical and is about numbers.
● It recommends the final course of action.
Subgroups of Qualitative Data
Nominal data
Unordered data to which no order is assigned in relation to other named categories.
Example: Grade classification like pass or fail for a student's test results.
Ordinal data
Ordered data that is assigned to categories in a ranked fashion.
Example: Feedback to a product with 1–5 ranking.
Subgroups of Quantitative Data
Discrete data
It can only take certain values.
Example: The number of students in a class
Continuous data
It can take any value within a specified range.
Example: Share price of a company
Data Levels of Measurement
It is a classification that describes the nature of information within the values assigned to variables.
Ratio
Interval
Ordinal
Nominal
Data Levels of Measurement: Nominal
● In nominal level of measurement, numbers in the variable
are used to classify data.
● At this level, words, letters, and alphanumeric symbols can
be used.
● Example: People in the female gender category are classified
as F and those in the male gender category are classified as M.
Data Levels of Measurement: Ordinal
● Ordinal level of measurement depicts an ordered
relationship among the variable’s observations.
● It indicates an order of the measurements.
● Example: A student with 100% score is assigned the
first rank, another student with 95% score would be
assigned the second rank, and so on.
Data Levels of Measurement: Interval
● The interval level of measurement classifies
and orders the measurements.
● It also specifies that the distances between
each interval on the scale are equivalent.
● Example: Temperature in centigrade, where the
distance between 80 degrees and 100 degrees
is the same as the distance between 1000 degrees
and 1020 degrees: 100°C - 80°C = 1020°C - 1000°C.
Data Levels of Measurement: Ratio
● In the ratio level of measurement, the scale has a true zero, and observations can have a value of zero.
● Although the properties of ratio measurement are similar to those of the interval level of measurement, the true zero on the
scale makes it different from the other levels of measurement.
Note: The nominal level classifies data, while the ordinal level indicates an order of measurements.
The interval level and the ratio level both have equal distances between adjacent values, but only the ratio level has a true zero.
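The practical difference between the interval and ratio levels can be sketched with temperatures. This is a minimal illustration; it assumes the Kelvin scale as an example of a ratio scale (its zero is a true zero), which is an addition not mentioned on the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees centigrade to kelvin."""
    return celsius + 273.15


a_celsius, b_celsius = 10.0, 20.0
a_kelvin = celsius_to_kelvin(a_celsius)
b_kelvin = celsius_to_kelvin(b_celsius)

# Differences are meaningful on both scales and agree with each other.
difference_celsius = b_celsius - a_celsius
difference_kelvin = b_kelvin - a_kelvin

# Ratios are only meaningful on the ratio scale: 20 degrees centigrade is
# not "twice as hot" as 10 degrees, because 0 degrees centigrade is an
# arbitrary zero point.
ratio_celsius = b_celsius / a_celsius
ratio_kelvin = b_kelvin / a_kelvin

print("Difference:", difference_celsius, "vs", round(difference_kelvin, 2))
print("Ratio:", ratio_celsius, "vs", round(ratio_kelvin, 3))
```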
Normal Distribution of Data
● Normal distribution is also known as Gaussian distribution or bell curve.
● It is the most important probability distribution in statistics.
● It is a perfectly symmetric bell-shaped distribution curve with only
one peak.
● Many natural phenomena and occurrences approximately follow the bell curve.
● It is denser at the center and has equal mean, median, and mode
values.
● It is continuous and has tails that are asymptotic.
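These properties can be checked numerically. The sketch below is a minimal illustration using `statistics.NormalDist` from the Python standard library; the choice of a mean of 0 and a standard deviation of 1 is an assumption made for the example.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1 (assumed values).
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the area to the left of -1 equals the area to the right of +1.
left_tail = dist.cdf(-1.0)
right_tail = 1.0 - dist.cdf(1.0)

# Denser at the center: the density at the mean is higher than further out.
density_at_mean = dist.pdf(0.0)
density_at_one = dist.pdf(1.0)

# Share of values within one standard deviation of the mean.
within_one_sd = dist.cdf(1.0) - dist.cdf(-1.0)

print("Mean, median, mode:", dist.mean, dist.median, dist.mode)
print("Tails:", round(left_tail, 4), round(right_tail, 4))
print("Share within one standard deviation:", round(within_one_sd, 4))
```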
Statistical Parameters
Basic Statistical Parameters
Mean
● Mean is the average of all data points for a given set of data.
● It is used to derive the central tendency of the data.
● It is measured by adding all data points and dividing the
sum by the number of data points.
Variance
● Variance is the sum of the squares of differences between all
numbers and the mean, divided by the number of data points.
● It gives a measure of how the data distributes itself about the mean.
● It looks at all the data points and then determines their distribution.
Standard Deviation
● Standard deviation is the square root of variance and shows the
extent to which data varies from the mean.
● It shows how tightly data points are clustered around the mean.
● It is more concrete and gives the exact distances from the mean.
Basic Statistical Parameters: Example
Dataset x = {1, 2, 3, 4, 5, 6}
Mean = (1+2+3+4+5+6)/6 = 3.5
Variance = [(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2]/6 = 2.917
Standard deviation = √2.917 = 1.708
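The same figures can be reproduced with the Python standard library. This is a minimal sketch; it uses the population versions of variance and standard deviation (`pvariance` and `pstdev`), which divide by the number of data points as in the example above. The sample versions (`variance` and `stdev`) divide by one less than the number of data points and give different values.

```python
from statistics import mean, pstdev, pvariance

x = [1, 2, 3, 4, 5, 6]

mu = mean(x)              # sum of the data points divided by their count
variance = pvariance(x)   # mean squared difference from the mean
std_dev = pstdev(x)       # square root of the variance

print("Mean:", mu)
print("Variance:", round(variance, 3))
print("Standard deviation:", round(std_dev, 3))
```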
Key Takeaways
Structured data, unstructured data, and semi-structured data
are the three types of data.
Nominal, ordinal, interval, and ratio are the four data levels of
measurement.
Normal distribution of data is the most important probability
distribution in statistics.
Mean, variance and standard deviation are the basic statistical
parameters.
Introduction to Data Analytics
Data Visualization for Decision-Making
Learning Objectives
By the end of this lesson, you will be able to:
Explain data visualization
Describe the importance of data visualization
List various tools of data visualization
Data Visualization
● Data visualization is the graphical representation of
data using charts, graphs, and maps.
● Our eyes are drawn to colors and patterns.
● Data visualization is a form of visual art that grabs
our interest and keeps our eyes on the message.
Data Visualization
Visualized data is more effective and consumable than a massive
spreadsheet of data.
Data Visualization
● The table shows the total sale of products
corresponding with each year.
● The adjacent graph is the visualization
of the sale data points using a
visualization tool.
Understanding Data Visualization
Data analytics allows decision makers and executives to weigh the alternatives of different outcomes of
their decisions.
Data visualization:
● Helps decision makers strategize the best business outcomes
● Provides answers to key business questions
● Provides simplicity, clarity, intuitiveness, insightfulness, pattern,
and trending capability to help executives make decisions
Benefits of Data Visualization
● Sales reports are formal documents or
PowerPoint slides with many tables and charts.
● They are elaborate and the real point is lost in
the data.
● Data visualization helps by making information
crisp, clear, and memorable.
Benefits of Data Visualization: Example
● From a bar chart, a sales director can
identify that the sales of their flagship
product in the southwest region are going
down by eight percent.
● The director can spot the occurrence of
variances and start formulating a plan to
improve the sale.
● Data visualization allows executives to
spot problems and act on them.
Data Visualization: Uber
In October 2018, Uber released a visualization product that provides insights on mobility for JUMP
bikes.
This data is helpful in urban areas to evaluate the success of their shared bike program.
It can also help in planning infrastructure investments in cities to promote safety and smooth mobility.
Commonly Used Visualizations
● Heat map
● Frequency distribution plot
● Swarm plot
Heat map
● A heat map is a type of graph that uses a warm-
to-cool color spectrum to visualize the data.
● It measures the relationship between multiple
variables and shows the strength of
relationships with colors.
● It helps in creating a visually impactful view of
correlation.
Heat map: Use Case
● All the rows are one category and all the
columns are another category.
● Individual rows and columns are divided
into subcategories.
● Cells either contain color-coded
categorical data or numerical data.
[Figure: heat map grid with rows 1 to 5 (row name), columns A to F (column name), numeric cell values, and a color scale from 0 to 100]
Heat map: Use Case
● Data in a cell is based on the relationship
between two variables in the connecting
row and column.
● Multiple value ranges can be
represented by a selection of solid
colors, while a single range can be shown
by a gradient scale.
Heat map: Case Study
This heat map shows the sales data across
months and years.
● It is observed that during 1949–1950,
sales were in the range of 0–200.
● Sales crossed the 200 mark in 1951 and
increased every year.
● In 1956, sales were between 300 and 500
and the maximum sale was in July.
● Over the years, July had the maximum
sales and it peaked in 1960.
Frequency Distribution Plot
Frequency distribution plot measures the frequency of occurrence for a given value or range.
A normalized frequency distribution normalizes total frequency to one.
The frequency of an event is the number of times the event occurs in the observed data.
The frequencies of observations within given intervals can be shown in graphical or tabular format.
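A frequency distribution and its normalized form can be sketched as follows. This is a minimal illustration using only the Python standard library; the observed values are hypothetical, and the text bars stand in for the bars of a histogram.

```python
from collections import Counter

# Hypothetical observations, for example the number of items per order.
observations = [2, 3, 3, 4, 4, 4, 5, 5, 6, 3]

# Frequency: how many times each value occurs.
counts = Counter(observations)

# Normalized frequency: each count divided by the total,
# so that the frequencies add up to one.
total = sum(counts.values())
normalized = {value: count / total for value, count in sorted(counts.items())}

# Tabular view with a simple text bar for each value.
for value, share in normalized.items():
    print(f"{value}: {'#' * counts[value]} ({share:.2f})")
```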
Frequency Distribution Plot
Frequency distributions can be displayed in these formats:
● Table
● Histogram
● Line graph
● Dot plot
● Pie chart
Analysts use the frequency distribution plot to check or illustrate the data collected in a
sample.
Frequency Distribution Plot: Use Case
This conditional frequency distribution graph shows the usage of two specific words in public
speeches over a period of time.
The graph indicates that the use of the word "America" has increased, while the use of the word "citizen" has reduced
gradually in speeches.
Swarm Plot
● A swarm plot gives a good representation of
the distributions but works well only for small
data sets.
● It is useful to examine individuals, places, or
things in your data.
● It allows you to plot all of your points in a
single space.
● It is a one-dimensional scatter plot as it plots
the data on a single axis and then offsets in
the other direction to show volume.
Swarm Plot
A swarm plot enables you to separate all overlapping points, making each point visible.
It is also called a beeswarm plot as the graphical representation is similar to a group of bees.
Swarm Plot: Use Case
In this swarm plot:
● The X-axis denotes time and the Y-axis denotes
the tip amount
● The blue graph represents lunch and the
orange graph represents dinner
● Tips are higher during dinner and most tips are
$2 and $3
● The maximum tip is $10, which was given for
dinner.
Importance of Data Visualization
Importance of Data Visualization in Analytics
● Data visualization tools provide access to
trends, outliers, and patterns in data.
● They help organize and present important
findings from the data.
● Data in user-friendly charts help businesses
gain insights to make right decisions.
Importance of Data Visualization in Analytics
● A data analytics tool allows a user to present
massive data intuitively.
● Decision makers see patterns, trends, and
correlations in the data being analyzed.
● It helps decision makers in cutting costs or
improving operational processes.
Exploratory Data Analytics
● Exploratory data analytics is an approach to
analyze data sets to summarize their main
characteristics.
● Data visualization in exploratory data analytics
is the first step towards modeling.
● EDA primarily helps analyze data beyond the
formal modeling.
Exploratory Data Analytics
The steps involved in EDA are:
● Get detailed insights into the dataset
● Understand critical impact variables that
influence the dataset
● Detect if any outliers are present in the
dataset
● Test the underlying assumptions of the
dataset
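The outlier-detection step can be sketched with the interquartile range (IQR) rule. This is a minimal illustration; the IQR rule with a factor of 1.5 is one common convention chosen here as an assumption, not a method named in the course, and the values are hypothetical.

```python
from statistics import mean, median, quantiles

# Hypothetical daily order counts with one unusually large value.
values = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

# Summarize the main characteristics of the data.
print("Mean:", mean(values), "Median:", median(values))

# Quartiles split the sorted data into four equal parts.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1

# IQR rule (assumed convention): flag points that lie more than
# 1.5 * IQR below the first quartile or above the third quartile.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_bound or v > upper_bound]

print("Outliers:", outliers)
```

The gap between the mean and the median is itself a hint that an outlier is present, since a single extreme value pulls the mean but barely moves the median.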
Data Visualization Tools
● Tableau is the most widely used data
visualization tool due to its simplicity and
ability to produce interactive visualizations.
● It has a large customer base of more than
50,000 accounts across many industries.
● FusionCharts is a JavaScript-based visualization
package that can produce and integrate 90
different chart types.
● It has a range of live example templates so that
you can simply plug in your data sources as
required.
Data Visualization Tools
● Highcharts is often chosen when a fast and
flexible solution has to be rolled out.
● Its cross-browser support feature helps users
view and run interactive visualizations.
● Datawrapper has a simple interface that makes
it easy to upload CSV data and create charts and
maps.
● It is becoming popular among media
organizations to create charts and present
statistics.
Data Visualization Tools
● Plotly enables complex and sophisticated
visualizations.
● It is integrated with analytics-oriented
programming languages such as Python, R, and
MATLAB.
● Sisense provides a full stack analytics platform
and simple-to-use drag and drop interface.
● It helps in creating charts and complex graphics
with minimum hassle and provides a repository
for gathering multiple sources of data.
Other Visualization Tools
● Power BI is a powerful suite of business analytics
tools and has intuitive UI for users familiar with
Microsoft products.
● It can create customized, user-defined
visualizations as well as sophisticated 3D maps.
● The Looker BI tool provides extensive
visualization abilities, along with real-time
analysis.
● Users can either use templates from the Looker
library or create a custom visualization.
Other Visualization Tools
● Domo is self-service business intelligence
that focuses on social collaboration.
● It provides real-time data and uses creative
data displays such as multi-part widgets
and sparklines.
● Board is a full-featured business
intelligence system.
● It serves midsize and enterprise-level
companies in different industry segments.
Other Visualization Tools
● Qlik Sense has a clean and clutter-free user
interface and a highly customizable setup.
● Qlik Sense is Tableau’s biggest competitor.
● It has over 40,000 customer accounts in more
than 100 countries.
Languages and Libraries for Data Visualization
Languages for Data Visualization
A few languages used for
data visualization:
● Scala
● R
● Python
● JavaScript
● Java
Languages for Data Visualization
Two widely used Python libraries for
data visualization are
Matplotlib and Seaborn.
Scala is a compiled language, so
code written in Scala typically runs
faster than interpreted code.
Languages for Data Visualization
Base graphics, lattice graphics, grid
graphics, and ggplot2 are the four
graphic systems supported by R.
Libraries available for Java, such as Java 2D, Java 3D, and Java Advanced Imaging, make data visualization simple with Java.
Data Visualization Libraries
● Shiny is an R package that makes it easy to build interactive web apps straight from R.
● Matplotlib is the first data visualization library and has 2D and 3D graphics support.
● Seaborn is a Python data visualization library and provides an interface for drawing statistical graphics.
● Bokeh is native to Python and helps to create interactive, web-ready plots by supporting streaming and real-time data.
Dashboard-Based Visualization
A business dashboard:
● Provides a visual real-time representation of a company's data.
● Displays data in the form of tables, line charts, bar charts, and gauges.
● Monitors business health by visually tracking, analyzing, and displaying key data points.
● Helps businesses generate business insights.
Dashboard-Based Visualization
This dashboard has parameters such as daily target, sales pattern, and
other business insights from different charts.
Characteristics of effective dashboards:
● Highly interactive
● Customizable interface
● Pulls real-time data from multiple
sources
Steps for Dashboard-Based Visualization
● Analyze your target audience
● Identify key business parameters
● Identify the end goal of the dashboard
● Get hands-on in developing the dashboard
● Continuously improve the dashboard
Steps for Dashboard-Based Visualization
● Analyze your target audience: Know who will use the data to make decisions.
● Identify key business parameters: Identify KRAs and KPIs for each Key Process Area (KPA) and Service Level Agreement (SLA) parameters.
● Identify the end goal of the dashboard: Define a target dashboard outline.
● Get hands-on in developing the dashboard: Help data scientists develop the dashboard by selecting convenient languages and libraries.
● Continuous process of improvement: Improve the dashboard based on real-time inputs and customer feedback.
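As a minimal sketch of what tracking a key business parameter on a dashboard can look like, the Python snippet below compares daily sales with a daily target and renders the result as a text gauge. The sales figures, the target, and the function names are invented for illustration and are not taken from the course.

```python
# Hypothetical daily sales figures and target; not taken from the course.
DAILY_TARGET = 1000.0
daily_sales = {"Mon": 850.0, "Tue": 1200.0, "Wed": 500.0}


def attainment(sales, target):
    """Return sales as a fraction of the target."""
    return sales / target


def gauge(fraction, width=10):
    """Render a fraction as a text gauge, capped at full width."""
    filled = min(width, round(fraction * width))
    return "#" * filled + "-" * (width - filled)


for day, sales in daily_sales.items():
    frac = attainment(sales, DAILY_TARGET)
    print(f"{day} [{gauge(frac)}] {frac:.0%} of target")
```

A real dashboard would pull these values from live data sources and draw them with a charting library; the point here is only the KPI calculation behind the gauge.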
BI and Visualization Trends
● As BI developed to analyze and extract value from various sources, it introduced many errors and low-quality reports.
● Companies choose to implement a Data Quality Management (DQM) policy because it is a key factor in efficient data analytics.
BARC Research Study
A survey was conducted by the Business Application Research Center (BARC) on BI trends.
Users, consultants, and vendors were among the 2865 participants in the survey.
Master data/Data quality management was stated as the most important trend in 2020.
BARC Research Study
● BI practitioners identified Master
data/DQ management, Data discovery,
and Data-driven culture as the three
most important trends in their work.
● Cloud BI/data management, Data catalogs,
and Process mining were voted as the
least important trends.
BARC Research Study
● Master Data and Data Quality
Management build a strong foundation
for handling data.
● Data Discovery describes how
businesses collect data from various
sources and then apply it to generate
real business value.
BARC Research Study
● The significance of Data discovery reflects a strong trend toward business user empowerment.
● Data-driven culture depends on
greater inclusion of various business
departments.
● Data governance is an important
trend due to the GDPR and increase
in data security awareness.
BARC Research Study
Trends that have increased in importance
compared to the last year are:
● Data-driven culture
● Real-time analytics
● Integrated platforms for BI and PM
● Embedded BI
● Analytics Team/Data labs
BARC Research Study
● In Data-driven culture, all decisions and
processes are based on data and simple
key figures like revenue.
● Real-time analytics is about capturing
new data immediately after their
occurrence and processing them for
display or analysis.
● Embedded BI is growing in popularity in operational applications.
BARC Research Study
● Mobile BI has grown by only 20 percent in the last eight years as adoption was very slow.
● Data catalogs and Process mining are relatively new trends in the market and have only recently been attracting more interest.
BI Software Challenges
● A survey by BI-Survey.com indicated that data quality has been the top problem for BI software users since 2002.
● BI software output is heavily affected if the input data quality is not good.
BI Software Challenges
Poor quality input data leads to bad output results provided by any BI tool.
BI companies compile data in a usable system without affecting the validity
of the original source.
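To illustrate how input data quality can be checked before it reaches a BI tool, here is a small sketch in Python that counts rows with missing values and exact duplicate rows. The records and the `quality_report` helper are hypothetical; real data quality management covers far more than these two checks.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 1, "email": "a@example.com", "age": 34},  # exact duplicate
    {"id": 3, "email": "c@example.com", "age": None},
]


def quality_report(rows):
    """Count rows with missing values and exact duplicate rows."""
    seen = set()
    duplicates = 0
    missing = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
        if any(value is None for value in row.values()):
            missing += 1
    return {"rows": len(rows), "with_missing": missing, "duplicates": duplicates}


report = quality_report(records)
print(report)
```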
Key Takeaways
Data visualization refers to the graphical representation of data
using charts, graphs, and maps.
Heat map, frequency distribution plot, and swarm plot are
commonly used visualizations.
Tableau, Power BI, Datawrapper, and Sisense are some of the data visualization tools.
Introduction to Data Analytics
Data Analytics, Data Science, and
Machine Learning
Learning Objectives
By the end of this lesson, you will be able to:
Define data science and machine learning
Differentiate between data science, machine learning, and
data analytics
Introduction to Data Science
Data Science
Data science is the study of data, which involves gathering, storing, analyzing, and
plotting data, to effectively extract useful information.
Aim: Gain meaningful insights from both structured and unstructured data.
Data Science
● Data cleansing
● Preparation and analysis
● Trend forecast
● Machine learning and data analytics
Types of Data Science
Data Science
Data Analytics Machine Learning Data Mining
Data Analytics
Data analytics is the process of examining and analyzing raw data sets to:
● Draw conclusions
● Derive insights
● Derive information from raw data sources
Machine Learning
● Learns from patterns in the past using a set of algorithms
● Predicts outcomes accurately
Data Mining
● Data mining is the process of analyzing data from
different perspectives.
● It summarizes data into useful information.
● It helps increase revenue and cut costs.
Data Science, Data Analytics, and Machine Learning
Data Science and Data Analytics
● Data scientist: Forecasts the future based on past patterns
● Data analyst: Extracts meaningful insights from various data sources
Machine Learning
Machine learning creates systems that can learn from the data.
It is the ability of machines to predict outcomes based on patterns in the past.
Machine Learning
● ML engineer: Leverages various algorithms to train the machine
Data Science and Machine Learning
A data scientist:
● Gathers data from various sources
● Extracts useful information from collected data sets
● Understands data from a business point of view
● Provides accurate predictions to improve key business decisions
Understanding Data Science
A data scientist combines both domain and technology perspectives.
Understanding Data Science
A data scientist:
● Works with data from video and social media sources
● Knows multiple analytical functions
● Has a sound knowledge of technologies such as Python, SAS, R, Scala, visualization libraries, SQL databases, and machine learning
Data Science: Process Flow
Why does car insurance cost less if you pay bills on time?
Data scientists found that people who pay their bills promptly are less prone to accidents.
Data Science: Process Flow
Step 1: Data acquisition
Data scientists work with existing data
sets or gather them from various
sources.
Data acquisition
The most important part of the whole process is to have the correct data.
Data Science: Process Flow
Step 2: Data wrangling
● Choose the right tools from Python, R, and SQL
● Derive a clean data set
● Apply pick-and-shovel algorithms
● Obtain meaningful data
Data Science: Process Flow
Step 3: Machine learning
● Validate the model
● Perform necessary statistical analysis
● Apply machine learning or recursive analysis
● Run regression testing
● Compare results against other techniques or sources
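To make the modeling and validation step concrete, here is a minimal sketch that fits a straight line by ordinary least squares and checks it against held-out points. The data are invented, and simple linear regression stands in for whatever model a real project would choose.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute prediction error on a validation set."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training and validation data (roughly y = 2x + 1).
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
valid_x, valid_y = [5, 6], [11.5, 12.5]

slope, intercept = fit_line(train_x, train_y)
error = mean_abs_error(slope, intercept, valid_x, valid_y)
print(f"slope={slope:.2f} intercept={intercept:.2f} validation MAE={error:.2f}")
```

Keeping the validation points separate from the training points is what lets the error figure say something about how the model behaves on data it has not seen.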
Challenge of a Data Scientist
The most challenging part of being a data scientist is taking the results and presenting them to the stakeholders in an easily consumable manner.
Data Science and Business Strategy
Business owners used to measure their success based only on the Profit and Loss Statement.
Today, businesses leverage data science to predict efficiently what will work.
Data Science and Business Strategy
The process flow of a data-driven decision-making process:
Step 1: Define business goals
Step 2: Build a team of data scientists
Step 3: Identify data sources and enable new sources of data capture
Step 4: Design business dashboards to track goals
Data Scientist: Asset to the Business
A data scientist:
● Empowers management to make better decisions
● Identifies and refines the target audience
● Provides insights on various KPIs and parameters
● Identifies areas of improvement
● Enables strategic changes for better results
● Identifies opportunities
Companies Using Data Science
Successful Companies Using Data Science
A few successful companies that use data science:
Google Search Engine
Google uses data science to provide relevant search recommendations.
The influencing factors include:
● Query volume: unique and verifiable users
● Geographical locations
● Keyword or phrase matches on the web
● Scrubbing for inappropriate content
Facebook Tags
Facebook uses machine learning in every aspect including:
● Scrolling the news feed
● Browsing images or videos
Facebook Tags
Uses a clustering algorithm to:
● Find mutual friends
● Send friend suggestions
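The idea of mutual friends and friend suggestions can be sketched in a few lines of Python. This is not a clustering algorithm and not Facebook's method; it is a simpler stand-in that counts mutual friends with set intersection, using an invented friendship graph.

```python
# Hypothetical friendship graph: each user maps to a set of friends.
friends = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy", "dee"},
    "cy": {"ana", "bo", "dee"},
    "dee": {"bo", "cy"},
}


def mutual_friends(graph, a, b):
    """Friends that users a and b have in common."""
    return graph[a] & graph[b]


def suggest_friends(graph, user):
    """Rank non-friends by number of mutual friends, highest first."""
    candidates = {}
    for other in graph:
        if other != user and other not in graph[user]:
            shared = len(mutual_friends(graph, user, other))
            if shared:
                candidates[other] = shared
    return sorted(candidates, key=lambda name: (-candidates[name], name))


print(suggest_friends(friends, "ana"))
```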
Alibaba
Alibaba’s Aliloan
Aliloan is an automated online system that provides flexible microloans to entrepreneurial
online vendors.
Alibaba’s Aliloan
Aliloan:
● Collects data from e-commerce platforms
● Analyzes trading records and evaluates risk
● Uses predictive models to analyze transaction records
● Determines merchants' creditworthiness
Travel Industry
Travel companies use data sets from social media, itineraries, predictive analytics, and location tracking to arrive at a 360-degree view.
Sensors on different modes of transport provide real-time data on various parameters to predict and prevent problems.
Travel Industry
● Integrates historical data to ensure maximum yield
● Offers deals based on the user's preferences or recommended local attractions
● Predictive algorithms help drivers predict fuel needs, ETAs, and delays.
Retail
RFM analysis is a marketing technique that leverages data to determine the target customer.
Recency Frequency Monetary
Retailers use data science to segment customers into RFM groups and target marketing and
promotions.
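A minimal sketch of RFM scoring in Python is shown below. The purchase history, the analysis date, and the thresholds in `segment` are assumptions made for illustration; real retailers typically score each dimension on a scale derived from their own data.

```python
from datetime import date

# Hypothetical purchase history: customer -> list of (purchase date, amount).
purchases = {
    "c1": [(date(2024, 5, 1), 120.0), (date(2024, 5, 20), 80.0)],
    "c2": [(date(2023, 11, 3), 40.0)],
}
TODAY = date(2024, 6, 1)  # assumed analysis date


def rfm(history, today):
    """Return (recency in days, frequency, monetary total) for one customer."""
    last_purchase = max(day for day, _ in history)
    recency = (today - last_purchase).days
    frequency = len(history)
    monetary = sum(amount for _, amount in history)
    return recency, frequency, monetary


def segment(recency, frequency, monetary):
    """Toy rule: thresholds are illustrative, not industry standards."""
    if recency <= 30 and frequency >= 2 and monetary >= 100:
        return "loyal"
    return "at risk"


for customer, history in purchases.items():
    r, f, m = rfm(history, TODAY)
    print(customer, r, f, m, segment(r, f, m))
```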
E-Commerce
Amazon is an e-commerce giant that leverages data science to the fullest extent.
Amazon prefers an everything-under-one-roof model.
E-Commerce
E-commerce companies use data science to upsell through their websites.
Amazon's "People who viewed that product also liked this" functionality uses sophisticated data mining techniques and boosts business.
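The "also viewed" idea can be approximated with simple co-occurrence counting, sketched below in Python. This is a deliberately basic stand-in, not Amazon's technique, and the browsing sessions and function names are invented.

```python
from collections import Counter
from itertools import permutations

# Hypothetical browsing sessions: each is the set of products one visitor viewed.
sessions = [
    {"shirt", "jeans"},
    {"shirt", "jeans", "belt"},
    {"shirt", "belt"},
    {"jeans", "shoes"},
]


def co_view_counts(all_sessions):
    """Count how often each ordered pair of products appears in one session."""
    counts = Counter()
    for viewed in all_sessions:
        counts.update(permutations(sorted(viewed), 2))
    return counts


def also_viewed(counts, product, top_n=2):
    """Products most often viewed together with the given product."""
    related = {b: n for (a, b), n in counts.items() if a == product}
    return sorted(related, key=lambda item: (-related[item], item))[:top_n]


pair_counts = co_view_counts(sessions)
print(also_viewed(pair_counts, "shirt"))
```

Ties are broken alphabetically here so that the output is deterministic; a production system would weigh many more signals.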
Crime Agencies
Analytics keeps crime in check by:
● Using identified patterns to derive
prediction techniques
● Analyzing previous data to prevent future
burglaries
Crime Agencies
● Data mining can help identify patterns in crimes ranging from domestic violence to terrorism.
● Advanced analytics helps prevent crime by using
information from social media.
Crime Agencies
Crime prevention agencies use data science in
deciding:
● Where to deploy police manpower?
● Who to search at a border crossing?
● Which intelligence to consider in
counter-terrorism activities?
Analytical Platforms across Industries
Analytical platforms across industries combine four components: machine learning algorithms, deep learning architectures, cloud storage platforms, and tools.
Machine learning algorithms:
● Forecasting
● Regression
● Bayesian network
● Vector autoregression
Deep learning architectures:
● Deep Belief Network (DBN)
● Convolutional Neural Network (CNN)
● Recurrent Neural Network (RNN)
Cloud storage platforms:
● Amazon AWS
● Microsoft Azure
● Lambda
Analytics tools:
● Spark
● Python
● R
● Apache Pig
Reporting tools:
● Tableau
● Splunk
● Power BI
● Kibana
Key Takeaways
Data science is the study of data, which involves gathering,
storing, analyzing, and plotting data, to effectively extract
useful information.
Data science is an umbrella that contains data analytics,
data mining, and machine learning.
Data science is used by many successful companies such as
Google, Facebook, and Alibaba.
Analytical platforms across industries include algorithms,
architecture, data storage platforms, and tools.
Introduction to Data Analytics
Data Science Methodology
Learning Objectives
By the end of this lesson, you will be able to:
Explain data science methodology
Describe the various stages of data science methodology
Data Science Methodology: Overview
Data Science Methodology
● A methodology is a process with a defined
input to achieve a defined output.
● It drives activities within a given domain and
does not depend on technologies or tools.
● Data science methodology is an iterative
methodology leveraged to produce repeatable
and successful results.
Stages of Data Science Methodology
Source: https://www.ibmbigdatahub.com/blog/why-we-need-methodology-data-science
Stages of Data Science Methodology
Business Understanding
Business understanding is the first stage of the data science methodology and lays the foundation for a successful end result.
● This stage identifies key business sponsors, steering committee, and internal sponsors.
● It helps understand business and customer needs and identify who needs the analytical solution.
● It includes defining the problem, project objectives, and solution requirements from a business perspective.
Stages of Data Science Methodology
Analytic Approach
● The analytic approach determines business requirements as well as data requirements.
● It identifies the analytic methods, hardware and software, data content, formats, and representations to be used.
Stages of Data Science Methodology
Data Requirements
● The data requirements stage identifies the necessary data, its initial source, and its appropriate format.
● This stage has multiple sub-stages, including data acquisition, data wrangling, data analysis, and data modeling.
Stages of Data Science Methodology
Data Collection
● In the collection stage, data scientists identify and gather the available relevant data, because good-quality input data is required for good output.
● Data scientists evaluate the volume and properties of the data and understand the distribution of each attribute.
● High-performance platforms and in-database analytic functionality enable data scientists to use large data sets.
Stages of Data Science Methodology
Data Understanding
Data scientists use descriptive statistics and visualization techniques to:
● Understand data content
● Assess data quality
● Discover initial insights about the data
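As a small illustration of descriptive statistics in the data understanding stage, the sketch below summarizes one numeric attribute with Python's standard `statistics` module. The sample values and the `describe` helper are invented for this example.

```python
import statistics

# Hypothetical attribute values; None marks a missing entry.
ages = [23, 35, 31, None, 35, 52]


def describe(values):
    """Summarize one numeric attribute, ignoring missing entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The count of missing entries doubles as a first data quality check, and the spread between minimum, median, and maximum gives an initial feel for the distribution.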
Stages of Data Science Methodology
Data Preparation
● The data preparation stage includes activities to construct a data set for data modeling.
● This stage includes cleaning of data, eliminating duplicates, formatting data from multiple sources, and transforming data into more useful variables.
● Data scientists can create explanatory variables through a combination of domain knowledge and existing structured variables.
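The preparation activities listed above (cleaning, eliminating duplicates, formatting data from multiple sources, and deriving new variables) can be sketched as follows. The two sources, the field names, and the derived `spend_per_visit` variable are hypothetical.

```python
# Hypothetical rows from two sources with inconsistent formatting.
source_a = [{"name": " Alice ", "spend": "120.50", "visits": "4"}]
source_b = [
    {"name": "alice", "spend": "120.50", "visits": "4"},  # duplicate of Alice
    {"name": "Bob", "spend": "80", "visits": "2"},
]


def clean(row):
    """Normalize text and convert numeric strings to numbers."""
    return {
        "name": row["name"].strip().title(),
        "spend": float(row["spend"]),
        "visits": int(row["visits"]),
    }


def prepare(*sources):
    """Merge sources, drop duplicates, and add a derived variable."""
    seen = set()
    prepared = []
    for source in sources:
        for row in map(clean, source):
            key = (row["name"], row["spend"], row["visits"])
            if key in seen:
                continue
            seen.add(key)
            # Derived variable: average spend per visit.
            row["spend_per_visit"] = row["spend"] / row["visits"]
            prepared.append(row)
    return prepared


dataset = prepare(source_a, source_b)
print(dataset)
```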
Stages of Data Science Methodology
Modeling
● The modeling stage applies a predictive model to historical data to obtain the outcome.
● This stage helps organizations gain intermediate insights and future trends, leading to strategic improvements.
● Using exploratory data analytics, data scientists try multiple algorithms to find the best model for the available data set.
Stages of Data Science Methodology
Evaluation
● Once the model is developed, data scientists evaluate the model to understand its quality and ensure that it addresses the business problem.
● In model evaluation, diagnostic measures are computed and outputs such as tables and graphs are evaluated.
● During the evaluation phase, the data mining results are evaluated for novelty and usefulness.
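As one example of the diagnostic measures computed during evaluation, the sketch below derives accuracy, precision, and recall for a binary classifier. The labels are invented, and these three metrics are only a common starting point, not the full set a project might need.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(actual, predicted):
    """Accuracy, precision, and recall; assumes at least one positive
    prediction and one actual positive so the divisions are defined."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical labels from a held-out test set.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(metrics(actual, predicted))
```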
Stages of Data Science Methodology
Evaluation
Review the whole evaluation process with the following steps:
● Summarize activities that are missed
● Ensure that the model is correctly built
● Identify failures and misleading steps
● Determine the plan of action based on findings
● Analyze and estimate the potential for improvement
Stages of Data Science Methodology
Deployment
● In the deployment stage, a satisfactory model should be deployed into the production environment.
● It involves multiple groups, skills, and technologies.
● It requires planning on how knowledge can be propagated to users.
Stages of Data Science Methodology
Maintenance
In the maintenance phase, which follows deployment, identify:
● What could change in the environment?
● How will the accuracy be monitored?
● When should the data mining model not be used?
● Will business objectives change over time?
● What kind of report is required?
● Were initial data mining goals met?
● Who will be target groups for reports?
Stages of Data Science Methodology
Feedback
In this last stage of feedback, review the whole framework by:
● Interviewing people involved in the project
● Interviewing end users and identifying improvement areas
● Summarizing the feedback and documenting the experience
Key Takeaways
Data science methodology is the process that drives
activities within a given domain.
The stages of the methodology are business understanding, analytic approach, data requirements, data collection, data understanding, data preparation, modeling, evaluation, deployment, and feedback.
Introduction to Data Analytics
Data Analytics in Different Sectors
Learning Objectives
By the end of this lesson, you will be able to:
Explain how top companies use analytics
Describe how Netflix uses analytics to drive engagement
Explain how analytics changed different sectors
Analytics for Products or Services
Product analytics helps to:
● Develop products customers want
● Know how users engage
● Analyze users' likes and dislikes
● Track users' digital footprints
It highlights revealed behaviors to help predict consumer demands.
McKinsey: Research Study
“Companies that use customer analytics comprehensively report outstripping their competition
in terms of profit almost twice as often as companies that do not.”
Benefits of Analytics on Products or Services
● Customers buy more when they get what they
are looking for.
● Analytics helps product teams to dig deeper
and identify user needs that might not have
been captured otherwise.
● Example: Details such as a shirt's size, how it will look on a customer, and which jeans can be bought along with the shirt are useful to the buyer.
How Google Uses Analytics
Google uses tools and techniques of data analytics to understand requirements based on several
parameters.
● Frequency of sites visited
● Search phrases used
● Data downloaded
● Timings
It uses the collected data to streamline search
results.
How Google Uses Analytics
Businesses use data analytics while advertising through Google Ads.
Google Ads learns users' preferences, likes, dislikes, and inclinations.
Based on the preferences, Google shows users tailored
advertisements.
How Google Uses Analytics
Self-driving cars comprehend situations and make educated
choices using data analytics.
How Google Uses Analytics
Google returns millions of results for a query, ranked by relevance.
It runs complex algorithms to match the query with all the available data and ranks the
results.
How Google Uses Analytics
Google uses data analytics to refine its core search and ad-serving algorithms and consider these
factors:
● Words of search query
● Location and settings
● Relevance
● Expertise of sources
● Usability of pages
How Google Uses Analytics
● Google Analytics
● Google Analytics 360
● Google Tag Manager
● Google BigQuery
How Google Uses Analytics
Google Analytics 360 and Google Analytics enable you to collect data from websites, mobile apps for iOS and Android, and custom data sources.
How Google Uses Analytics
Google BigQuery is a Google Cloud service that lets developers and businesses conduct fast, cost-efficient interactive analysis.
Google Tag Manager is a simple, reliable, and easy-to-integrate tag management solution that allows management of website tags without editing the code.
How LinkedIn Uses Analytics
How LinkedIn Applies Analytics
LinkedIn employs data analytics to revamp its job listings and its "Who's viewed your profile" and "Who's viewed your posts" features.
Analytics helps to attract and retain millions of new customers.
How LinkedIn Applies Analytics
LinkedIn identifies the connections, job postings, and skill sets for a user.
50% of LinkedIn engagement comes from the "Jobs you may be interested in" feature.
How Amazon Uses Analytics
Step 1: Gathers customer data while they use the site
Step 2: Tracks what they buy and their shipping address
Step 3: Compares products they browsed
Step 4: Suggests products bought by similar customers
Step 5: Builds your profile with available data
Step 6: Recommends products based on needs
How Amazon Uses Analytics
Recommendation technology:
● Decides what you want to buy based on your profile
● Offers you what similar profiles have purchased
Amazon regularly fine-tunes the recommendation engine by
collecting data from customers while they browse.
Disney’s Success Story
Disney’s Success
Disney uses data analytics in innovative ways to improve the customer
experience.
Technology improves customers’ experience and helps to retain them.
Disney’s Success
Disney World launched its MyMagicPlus program in 2013 where
each guest got a MagicBand equipped with RFID technology.
As guests swiped their bands at a ride, this information was sent in real time to the operations team, which could then decide whether to add staff or incentivize guests to head to another ride or attraction.
Netflix: Using Analytics to Drive Engagement
Netflix has 130 million+ worldwide streaming
subscribers.
It gathers a tremendous amount of data from these subscribers to make better
decisions on its streaming services.
Data analytics helps decide which programs will be of interest, and the recommendation system influences about 80% of the content viewed on Netflix.
Netflix: Using Analytics to Drive Engagement
Netflix ensures it has accurate algorithms for predicting and recommending
content.
In 2009, the company awarded a $1 million prize to the team that came up with the best algorithm for predicting how customers would rate a movie based on previous ratings.
Netflix: Using Analytics to Drive Success
Data analytics has helped Netflix massively in becoming a leading online streaming platform.
Netflix tracks these factors to identify user
preference:
● Ratings and watched movies
● Pause, rewind, and fast forward
● Day, date, and time
● Devices used
● Searches, browsing history, and scrolling behavior
● Volume, color, and scenery
Netflix: Using Analytics to Drive Success
If a user watches Fast and Furious on Friday, then a movie of a similar genre will be displayed as a personalized recommendation for Saturday.
Orange is the New Black and House of Cards are two examples
of how data analytics is used to understand its subscribers and
cater to their needs.
Netflix spent $100 million to buy House of Cards as it was
confident that the show would be a hit.
Netflix: Using Analytics to Drive Success
By analyzing its data for House of Cards, Netflix
realized:
● A significant percentage of its subscribers had
streamed director David Fincher’s work The Social
Network
● Films featuring Kevin Spacey were always
successful with its audience
● A remake of a show that was successful in Britain, starring Kevin Spacey and directed by David Fincher for an American audience, would be a big hit.
Netflix: Using Analytics to Drive Success
● Netflix has a Personal recommender system
that orders the entire collection in a
personalized way.
● It has a video-video similarity algorithm that
provides an estimate of what a user would like
to watch.
● Netflix is a perfect case study for those who
require an engaged audience to survive.
● Netflix’s approach to content is highly
successful as it renews 93 percent of its
original series.
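One simple way to think about a video-video similarity measure is cosine similarity between rating vectors, sketched below in Python. This is an illustrative stand-in and not Netflix's actual algorithm; the ratings and function names are invented.

```python
import math

# Hypothetical ratings: video -> {user: rating on a 1-5 scale}.
ratings = {
    "video_a": {"u1": 5, "u2": 4, "u3": 1},
    "video_b": {"u1": 4, "u2": 5, "u3": 2},
    "video_c": {"u1": 1, "u2": 2, "u3": 5},
}


def cosine_similarity(first, second):
    """Cosine similarity over the users who rated both videos."""
    shared = set(first) & set(second)
    if not shared:
        return 0.0
    dot = sum(first[u] * second[u] for u in shared)
    norm_first = math.sqrt(sum(first[u] ** 2 for u in shared))
    norm_second = math.sqrt(sum(second[u] ** 2 for u in shared))
    return dot / (norm_first * norm_second)


def most_similar(catalog, video):
    """Return the other video whose rating pattern is closest."""
    others = (name for name in catalog if name != video)
    return max(others, key=lambda name: cosine_similarity(catalog[video], catalog[name]))


print(most_similar(ratings, "video_a"))
```

Videos rated alike by the same users score close to 1, so a viewer who enjoyed one is a reasonable candidate for the other.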
Media and Entertainment Industry
Media and entertainment companies are in a unique position to leverage their data assets
for profitable customer engagement.
Data sources that help syndicate content
closely aligned to viewer preferences:
● Viewing history
● Searches, reviews, and ratings
● Location and device data
● Clickstreams and log files
● Social media sentiment
Media and Entertainment Industry
Data analytics:
● Gets insights into audience behavior
● Provides personalized advertising
● Pinpoints customer drawbacks
● Makes useful recommendations
Education Industry
● Data analytics is used from kindergarten
to doctoral level.
● Teachers monitor pupils’ performance
using data analytics and get real-time
information on what has been learned.
Education Industry
Schools use data analytics to:
● Meet education capability and requirements
● Analyze students' educational needs and place them at the right level
● Create tuition system
● Incorporate adaptive learning
● Frame course material
● Improve curriculum using software programs
● Quiz the student and receive immediate feedback
Education Industry
A data analytics system helps:
● Detect and match grades
● Compare a student's score with field requirements
● Pull academic, attendance, financial, disciplinary, and engagement data
● Enhance student experience by changing the course of a student's learning
Education Industry
It is important to have a system that can advise students on the best career paths based on their strengths and weaknesses.
“Everybody is a genius. But if you judge a fish
by its ability to climb a tree, it will live its
whole life believing that it is stupid.”
(attributed to Albert Einstein)
Education Industry
IBM has its own project that has been using analytics and helping schools
succeed.
Universities use data analytics to help students by extracting data to monitor and predict their performance.
Healthcare Industry
The healthcare industry is one of the most promising areas where data analytics can be applied.
Data analytics:
● Reduces costs of treatment
● Predicts outbreaks of epidemics
● Avoids preventable diseases
● Improves quality of life
Healthcare Industry
Healthcare business intelligence helps doctors make data-driven decisions and improve patients' treatment.
Example: Data analytics tools can predict potential diabetes patients and can advise preventive
measures.
Healthcare Industry
● Doctors use data analytics to understand the
health issues of a patient as early as possible.
● This helps them identify warning signs of
serious illness and reduce the treatment cost.
Healthcare Industry
● Clinical Decision System software analyzes medical data and provides advice to doctors on prescriptive decisions.
● Healthcare wearables collect patients' data continuously and store this data in the cloud.
● In case of any abnormality in health data, the system sends an alert to the doctor in real time.
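The alerting behavior described above can be sketched as a simple threshold check on incoming readings. The normal range, the readings, and the `find_alerts` helper below are assumptions for illustration only and do not represent clinical guidance or any specific product.

```python
# Assumed normal range for resting heart rate in beats per minute;
# illustrative only, not clinical guidance.
NORMAL_RANGE = (50, 100)


def find_alerts(readings, normal_range=NORMAL_RANGE):
    """Return (timestamp, value) pairs that fall outside the normal range."""
    low, high = normal_range
    return [(ts, bpm) for ts, bpm in readings if bpm < low or bpm > high]


# Hypothetical readings from a wearable: (timestamp, beats per minute).
readings = [("08:00", 72), ("08:05", 118), ("08:10", 75), ("08:15", 44)]

for timestamp, bpm in find_alerts(readings):
    print(f"ALERT {timestamp}: heart rate {bpm} bpm is outside {NORMAL_RANGE}")
```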
Government
● Governments make vital decisions based on the information received.
● It is difficult to verify the information, and faulty data can have negative consequences.
● Governments use data analytics in welfare schemes and cybersecurity.
Government: Welfare Schemes
● Government accesses information
relevant to their programs and policies.
● A data analytics platform allows the government to pinpoint areas that need attention.
● It allows the government to make
decisions faster, monitor those decisions,
and quickly enact changes.
Government: Welfare Schemes
Data analytics helps track and monitor land and livestock in a country to manage
and support farmers and their resources.
Online talent platforms fill traditional jobs quickly by finding the right candidates
for jobs and shorten the duration of unemployment.
Government: Cybersecurity
● Government uses data analytics in
real-time crime mapping, predictive
policing, and catching tax evaders.
● Data analytics is used in
cybersecurity for deceit recognition.
● It helps cyber analysts predict and
avoid the possibility of intrusion and
invasion.
Government: Cybersecurity
A program called Project Insight tracks people's social media profiles and infers expenditure patterns from photographs and videos uploaded on social media.
If purchases and travel expenses are disproportionate to the declared income, income tax (IT) officials are informed of the mismatch and action follows.
Weather Forecasting
● Data analytics helps predict natural calamities and take action in advance.
● Data needed for weather forecasting:
○ Barometric pressure
○ Wind speed
○ Precipitation
○ Temperature
○ Humidity
● Experts use predictive analytics to
strategize and help combat global
warming.
Weather Forecasting
● Data analytics helps identify natural
disaster patterns by collecting data on
road condition and rainfall in a year.
● Local authorities use analytics tools to
better anticipate problems caused by
weather.
● It helps make plans to upgrade existing
facilities and predict the availability of
usable water around the world.
IBM Deep Thunder
IBM Deep Thunder is a research project
that:
● Provides weather forecasting
● Differs from other weather forecasting
systems
● Provides forecasts for extremely specific
locations
IBM is assisting Tokyo with improved forecasting for natural disasters to plan a successful 2020
Olympics.
IBM Deep Thunder
Deep Thunder can provide information
about:
● Severe flood areas
● Tropical storm directions
● Snow or rainfall areas
● Downed power line locations
● Windy areas
● Damaged bridges and roads
● Cancelled flights at specific airports
Key Takeaways
Data analytics helps businesses uncover valuable insights
and increase efficiency.
Data analytics helped companies like EY, Google, LinkedIn,
Amazon, Disney, and Netflix grow their businesses.
Data analytics is used in different sectors such as media,
education, healthcare, government, and weather
forecasting.
Introduction to Data Analytics
Analytics Framework, Case Study, and Upcoming Trends
Learning Objectives
By the end of this lesson, you will be able to:
Explain the customer analytics framework
Explain the phases of customer analytics framework
List the latest trends in data analytics
Case Study: Ernst & Young
Customer Analytics Framework
An analytics framework helps perform data analysis in an organized manner.
The framework allows you to focus on the business outcome.
Case Study: EY
EY created a customer analytics framework for personalized customer experiences
to win more business and drive loyalty in a digital world.
Case Study: EY
To create the customer analytics framework, the company considered these factors:
● Who are your customers?
● What do they do?
● What do they want?
● How and when to reach them?
Customer Analytics Framework
Phases of Customer Analytics framework
● Stage 1: Business needs
● Stage 2: Data understanding
● Stage 3: Data preparation
● Stage 4: Modeling
● Stage 5: Model monitoring
Business Needs
Grow:
● Acquire new customers
● Understand product life cycle
● Develop new products
Optimize:
● Optimize pricing and cost to enhance customer satisfaction
Protect:
● Understand how to retain customers
● Perform sentiment analysis
Data Understanding
Data understanding is the second stage in the customer analytics framework.
This stage helps identify patterns and gain insight from the data.
Data Understanding
Data understanding is highly investigative and diagnostic.
● Companies look at customer needs and priorities to attract potential customers.
● They identify the most valuable customers who aid their growth.
Data Understanding
Perform market segmentation for effective marketing and customer engagement by
dividing customers into groups based on:
● Age
● Gender
● Interests
● Availability
● Spending habits
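As a minimal sketch of segmentation, the Python snippet below groups a few made-up customer records by age band and spending level. The records, field names, cut-off values, and the `segment` helper are all hypothetical and only illustrate the idea of dividing customers into groups.

```python
from collections import defaultdict

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"name": "A", "age": 22, "monthly_spend": 40},
    {"name": "B", "age": 37, "monthly_spend": 220},
    {"name": "C", "age": 24, "monthly_spend": 180},
    {"name": "D", "age": 51, "monthly_spend": 60},
]

def segment(customer):
    """Return an (age band, spending level) label; the cut-offs are assumptions."""
    age_band = "under 30" if customer["age"] < 30 else "30 and over"
    spend_level = "high spender" if customer["monthly_spend"] >= 100 else "low spender"
    return (age_band, spend_level)

# Group customer names by their segment label.
segments = defaultdict(list)
for c in customers:
    segments[segment(c)].append(c["name"])

for label, names in sorted(segments.items()):
    print(label, names)
```

In practice the grouping rules would come from the business question, and real projects often use clustering rather than fixed cut-offs.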
Data Understanding
Sentiment analysis helps identify customer sentiment expressed on social media.
An influence score measures the degree of influence of each user.
It can be combined with a sentiment measure to identify disgruntled customers.
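One way to combine the two measures is sketched below in Python. The score ranges, the sample users, and the `priority` formula are assumptions made for this example, not a method described in the case study.

```python
# Hypothetical users: sentiment in [-1, 1] (negative = unhappy),
# influence in [0, 100]. Both scales are assumptions for this sketch.
users = [
    {"user": "u1", "sentiment": -0.8, "influence": 90},
    {"user": "u2", "sentiment": 0.6, "influence": 95},
    {"user": "u3", "sentiment": -0.7, "influence": 10},
    {"user": "u4", "sentiment": -0.2, "influence": 70},
]

def priority(user):
    """Higher when a user is both unhappy and influential."""
    unhappiness = max(0.0, -user["sentiment"])
    return unhappiness * user["influence"]

# Rank users so that influential, disgruntled customers come first.
ranked = sorted(users, key=priority, reverse=True)
for u in ranked:
    print(u["user"], round(priority(u), 1))
```

Here the unhappy, highly influential user ranks first, while the happy user scores zero regardless of influence.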
Data Preparation
● Determine the data
● Collect and consolidate data
● Improve data quality and completeness
● Standardize data structure
● Take transformation steps
Data Preparation
● Perform data mining
● Work with structured and unstructured data
● Use various tools and software to transform data
● Integrate data from various sources
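The preparation steps can be illustrated with a small Python sketch that consolidates two sources, standardizes their structure, and drops incomplete or duplicate rows. The source names, field names, and the `standardize` helper are hypothetical.

```python
# Two hypothetical sources with inconsistent field names and formats.
crm_rows = [
    {"Email": "Ana@Example.com ", "Age": "34"},
    {"Email": "bo@example.com", "Age": ""},
]
web_rows = [
    {"email_address": "ana@example.com", "age": 34},
    {"email_address": "cy@example.com", "age": 41},
]

def standardize(row):
    """Map either source layout onto one structure: {'email', 'age'}."""
    email = row.get("Email") or row.get("email_address") or ""
    age = row.get("Age", row.get("age", ""))
    return {
        "email": email.strip().lower(),
        "age": int(age) if str(age).strip() else None,
    }

# Collect and consolidate, then standardize the structure.
combined = [standardize(r) for r in crm_rows + web_rows]

# Improve quality: drop incomplete rows and keep one row per email.
prepared = {}
for row in combined:
    if row["email"] and row["age"] is not None:
        prepared.setdefault(row["email"], row)

clean_rows = list(prepared.values())
print(clean_rows)
```

The row with a missing age is dropped, and the two records for the same email collapse into one.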
Modeling
The modeling stage focuses on developing models, which can be based on:
● Predictive analytics
● Prescriptive analytics
Modeling
Predictive analytics: It helps understand the future and answers the question: What could happen?
Prescriptive analytics: It helps predict possible outcomes and answers the question: What should we do?
Because this phase is iterative, you may need to revisit the data preparation phase to refine the data.
Modeling
Price Optimization Model
● Helps calculate how demand varies at different price levels
● Uses data to recommend prices for improving profits
Attrition Model
● Is created through predictive algorithms
● Helps companies gain a better understanding of employee attrition and take preventive measures
Models can also be made for web analytics and sentiment analysis.
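A price optimization model can be sketched in a few lines of Python. The example assumes a simple linear demand curve and invented numbers; real models estimate the demand curve from data.

```python
# Assumed linear demand curve: units sold fall as price rises.
# All numbers are invented for illustration.
BASE_DEMAND = 1000   # units sold at price 0
SLOPE = 20           # units lost per 1.00 increase in price
UNIT_COST = 10       # cost to supply one unit

def demand(price):
    """Units expected to sell at a given price (never negative)."""
    return max(0, BASE_DEMAND - SLOPE * price)

def profit(price):
    """Margin per unit times the number of units sold."""
    return (price - UNIT_COST) * demand(price)

candidate_prices = range(10, 51, 5)   # 10, 15, ..., 50
best_price = max(candidate_prices, key=profit)
print(best_price, profit(best_price))
```

Under these assumptions the recommended price is 30, which balances a higher margin per unit against falling demand.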
Modeling
Types of training models:
Static: The model is trained offline. It is trained once and then used for a while.
Dynamic: The model is trained online. Data is fed into the model continuously, so it keeps training.
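The difference between static and dynamic training can be sketched with a toy model that simply predicts the average of the values it has seen. The `MeanModel` class and the sample data are hypothetical.

```python
class MeanModel:
    """Toy model that predicts the average of the values it has seen."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean update, one observation at a time.
        self.count += 1
        self.mean += (value - self.mean) / self.count

    def predict(self):
        return self.mean

history = [10, 12, 11]       # data available at training time
new_data = [20, 22, 21]      # data that arrives later

# Static: trained once offline, then used unchanged.
static_model = MeanModel()
for v in history:
    static_model.update(v)

# Dynamic: trained on the same history, then keeps learning online.
dynamic_model = MeanModel()
for v in history + new_data:
    dynamic_model.update(v)

print(static_model.predict(), dynamic_model.predict())
```

The static model still predicts the old average, while the dynamic model has moved toward the newer values.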
Modeling
Types of predictions from trained models:
Online: Also called HTTP prediction, it is used when timely inference is needed.
Batch: It is used for processing accumulated data when immediate results are not needed.
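A short Python sketch of the two prediction styles follows. The `predict` function is a made-up stand-in for a trained model, and the `visits` feature is hypothetical.

```python
def predict(features):
    """Stand-in for a trained model; the scoring rule is made up."""
    return 1 if features["visits"] > 3 else 0

def predict_online(request):
    # Online: score a single request as soon as it arrives.
    return predict(request)

def predict_batch(accumulated_requests):
    # Batch: score everything that has accumulated in one pass.
    return [predict(r) for r in accumulated_requests]

print(predict_online({"visits": 5}))
print(predict_batch([{"visits": 1}, {"visits": 4}, {"visits": 2}]))
```

The model is the same in both cases; only the timing and grouping of the requests differ.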
Model Monitoring
Model monitoring is the final stage, in which you establish, monitor, and meet
service-level agreements (SLAs).
Example: An SLA for analytics might specify the maximum time taken to create or deploy a
model.
Model Monitoring
● Data scientists monitor machine learning
models for drift.
● Drift means the data a model was trained on is no longer
relevant or useful, because real-world data keeps changing.
● Data scientists ensure that the model
inputs look similar to those used in
training.
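A simple drift check can be sketched in Python by comparing recent model inputs with the inputs used in training. The `drift_score` helper, the sample values, and the alert threshold are assumptions; production systems typically use more robust statistical tests.

```python
from statistics import mean, stdev

def drift_score(training_values, recent_values):
    """Shift of the recent mean, measured in training standard deviations."""
    return abs(mean(recent_values) - mean(training_values)) / stdev(training_values)

training_ages = [30, 32, 34, 36, 38]   # inputs seen during training
similar_ages = [31, 33, 35, 37]        # recent inputs, same population
shifted_ages = [50, 52, 54, 56]        # recent inputs after a change

THRESHOLD = 2.0   # assumed alert level, not a standard value

for name, recent in [("similar", similar_ages), ("shifted", shifted_ages)]:
    score = drift_score(training_ages, recent)
    print(name, round(score, 2), "drift" if score > THRESHOLD else "ok")
```

Inputs that look like the training data score near zero, while the shifted inputs exceed the threshold and would trigger a review or retraining.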
Factors in Model Monitoring
Cost: The cost of the model needs to be analyzed to check whether the value it generates is worth the cost.
Latency: The delay between the data transfer instruction and the actual data transfer.
Throughput: The amount of data successfully moved from one place to another in a given time period.
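Latency and throughput can both be computed from a simple request log, as the Python sketch below shows. It applies the two definitions to model prediction requests, which is an assumption of this example, and the timestamps are invented.

```python
# Hypothetical log of prediction requests: when each one was sent and
# when its result came back, in seconds. Values are invented.
requests = [
    {"sent": 0.00, "done": 0.25},
    {"sent": 0.50, "done": 0.75},
    {"sent": 1.00, "done": 1.50},
    {"sent": 1.50, "done": 2.00},
]

# Latency: delay between sending a request and receiving its result.
latencies = [r["done"] - r["sent"] for r in requests]
average_latency = sum(latencies) / len(latencies)

# Throughput: requests completed per second over the observed window.
window = max(r["done"] for r in requests) - min(r["sent"] for r in requests)
throughput = len(requests) / window

print(average_latency, throughput)
```

Tracking both numbers over time shows whether a deployed model still meets its service-level agreements.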
Latest Trends in Data Analytics
● Cognitive Computing
● Augmented Reality
● Graph Analytics
● Automated Machine Learning
● Open Source AI
Cognitive Computing
Cognitive computing is an advanced type of artificial intelligence that is applied in
domains such as cybersecurity.
It uses machine learning algorithms and deep learning networks to learn from
human interactions and provide actionable insights.
Augmented Reality
According to Gartner Inc., augmented analytics will be the dominant driver of new
purchases of business intelligence and analytics by 2020.
Augmented Reality
Augmented analytics assists with:
● Preparing data
● Building models
● Analyzing data
Graph Analytics
Graph analytics is also known as network analytics and uses graphs to analyze data.
It is used for detecting crime, spotting fraud, and applying influencer analysis in
social network communities.
Graph Analytics
Graph analytics highlights dominant edges.
Example: A large number of payments between bank accounts may indicate
money laundering.
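The idea of a dominant edge can be sketched in Python by treating each payment as a directed edge between two accounts and counting how often each edge occurs. The account names and the `MIN_PAYMENTS` cut-off are hypothetical.

```python
from collections import Counter

# Hypothetical payments as (sender, receiver) edges; accounts are invented.
payments = [
    ("acct_1", "acct_2"), ("acct_1", "acct_2"), ("acct_1", "acct_2"),
    ("acct_1", "acct_2"), ("acct_2", "acct_3"), ("acct_3", "acct_4"),
    ("acct_1", "acct_2"), ("acct_4", "acct_1"),
]

edge_counts = Counter(payments)   # weight of each directed edge

MIN_PAYMENTS = 4   # assumed cut-off for "a large number of payments"
dominant_edges = [
    (edge, n) for edge, n in edge_counts.most_common() if n >= MIN_PAYMENTS
]
print(dominant_edges)
```

A flagged edge is only a signal for further investigation, not proof of wrongdoing.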
Automated Machine Learning
Building a machine learning model requires:
● Domain knowledge
● Computer science skills
● Mathematical expertise
It involves many tasks and is prone to human error and bias.
Automated Machine Learning
● Enables organizations to use existing knowledge
● Helps improve return on investment
● Reduces the amount of time taken to capture value
Automated Machine Learning
● Accelerates the process of evolving a trained model
● Gives power to business users
● Delivers the right level of customization
● Exposes the same degree of flexibility
Open Source AI
Open source software has produced iconic innovations like the Firefox web browser,
Apache server software, and the Linux OS.
In open source AI, AI software libraries and algorithms are freely available to developers
and entrepreneurs.
Open Source AI
Many cloud-based technologies have their roots in open-source projects.
AI is expected to follow the trend as companies seek collaboration and knowledge sharing.
Key Takeaways
The customer analytics framework helps perform data analysis
in an organized way and allows you to focus on the business
outcome.
Business needs, data understanding, data preparation,
modeling, and model monitoring are the different phases of
the analytics framework.
Cognitive computing, augmented reality, graph analytics,
automated machine learning, and open source AI are some
of the latest trends.