Unit-2 Data Analytics

The document provides an overview of data analytics, covering its importance, techniques, tools, and applications in business. It highlights the role of data analysts, different types of analytics, and the necessity for specific skills and tools to extract insights from data. Additionally, it discusses the significance of business modeling and big data analytics in optimizing operations and decision-making.
Data Analytics

Unit-2
Data Analytics: Introduction to Analytics,
Introduction to Tools and Environment,
Application of Modeling in Business,
Databases & Types of Data and Variables,
Data Modeling Techniques, Missing
Imputations etc., Need for Business
Modeling.
Introduction to Analytics
• In today's data-driven world, enormous
amounts of data are generated daily from
various sources, such as social media,
business transactions, and online activities.
• Extracting meaningful insights from this data
has become essential for individuals and
organizations to make informed decisions.
• Data Analytics plays a vital role in
identifying patterns, improving operations,
and driving success.
• Four main factors signify the
need for Data Analytics:
i. Gather Hidden Insights
ii. Generate Reports
iii. Perform Market Analysis
iv. Improve Requirements and Experience
i. Gather Hidden Insights:
• Data often holds valuable information that
is not immediately visible.
• By analyzing data, we can uncover patterns
and insights that help solve problems,
identify opportunities, and make strategic
decisions.
Example: A streaming platform like Netflix
analyzes user viewing patterns to
recommend shows or movies.
ii. Generate Reports:
• Reports present analyzed data in a
structured manner, helping organizations
and teams make better decisions.
Example: Schools can generate reports from
student performance data to identify
subjects where students need additional
support.
iii. Perform Market Analysis:
• Analyzing market trends helps
organizations understand customer
preferences and stay competitive.
Example: A smartphone company
analyzes market trends to decide
which features to prioritize in its next
release.
iv. Improve Requirements and
Experience:
• Understanding customer or user
behavior through data analytics allows
for better services and improved
experiences.
Example: E-commerce platforms analyze
customer purchase patterns to suggest
personalized product recommendations.
Data Analytics:
• It involves techniques for analyzing data
to enhance productivity and achieve
business gains.
• Data is extracted from various sources,
cleaned, categorized, and analyzed to
uncover behavioral patterns and trends.
• The techniques and tools vary based on
organizational needs.
Common Techniques:
i. Data Mining: Extracting patterns from large datasets.
ii. Statistical Analysis: Applying mathematical models to
analyze data.
iii. Predictive Analytics: Using historical data to predict
future trends.
iv. Machine Learning: Automating data analysis to
discover insights.
Example: A bank may use machine learning models to
predict customer churn and identify clients who are
likely to leave the bank. By offering targeted promotions,
the bank can retain these customers.
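The bank churn example can be sketched with scikit-learn's `LogisticRegression`. This is a minimal sketch, not a production model: the three features and the eight labelled customers are hypothetical toy data, and logistic regression is only one of many classifiers that could be used.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data. Each row: [years_as_customer, products_held, complaints_last_year]
X = [
    [1, 1, 3], [2, 1, 2], [1, 2, 4], [3, 1, 2],   # customers who left
    [8, 3, 0], [6, 4, 1], [10, 2, 0], [7, 3, 0],  # customers who stayed
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned, 0 = stayed

model = LogisticRegression()
model.fit(X, y)

# Probability of churn (class 1) for two new customers
new_customers = [[1, 1, 3], [9, 3, 0]]
churn_prob = model.predict_proba(new_customers)[:, 1]

for features, p in zip(new_customers, churn_prob):
    action = "offer targeted promotion" if p > 0.5 else "no action"
    print(features, f"churn probability {p:.2f} -> {action}")
```

Customers whose predicted probability crosses the chosen threshold (0.5 here, an arbitrary choice) would be the ones targeted with retention promotions.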
Role of Data Analysts
• Data Analysts play a crucial role in transforming data
into valuable insights. They collect, process, and
analyze data, then present their findings in reports or
dashboards that help decision-makers.
Example Workflow:
• Collect Data: Gather information from various
sources, such as databases or surveys.
• Clean Data: Remove duplicates and errors to ensure
data accuracy.
• Analyze Data: Use tools and techniques to find
patterns.
• Generate Reports: Present insights through charts,
tables, and written summaries.
Fig. Data Analytics
https://www.wallstreetmojo.com/data-analytics/#what-is-data-analytics
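The four workflow steps can be sketched in Python with pandas. The sales table below is hypothetical and built in memory; in practice, step 1 would read from a database or a survey export.

```python
import pandas as pd

# 1. Collect: a hypothetical sales extract standing in for a database or survey export
raw = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 104, 105],
    "region":   ["North", "South", "South", "North", None, "South"],
    "amount":   [250.0, 120.0, 120.0, 300.0, 80.0, 200.0],
})

# 2. Clean: drop the duplicated order 102 and the row with no region
clean = raw.drop_duplicates().dropna(subset=["region"])

# 3. Analyze: total and average sales per region
summary = clean.groupby("region")["amount"].agg(["sum", "mean"])

# 4. Report: print a compact table (a dashboard would chart it instead)
print(summary)
```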
Types of Analytics and Human
Knowledge Involvement

i. Descriptive Analytics
ii. Diagnostic Analytics
iii. Predictive Analytics
iv. Prescriptive Analytics
v. Cognitive Analytics
Fig. Data and Human Knowledge Involvement
https://www.sv-europe.com/blog/10-reasons-organisation-ready-prescriptive-analytics/
i. Descriptive Analytics: Provides an understanding of past
data and helps answer "what happened?"
Example: Monthly sales reports showing revenue trends.
• Human Input: High human interpretation is required
to summarize the data and understand its context.
ii. Diagnostic Analytics: Examines data to determine the
causes of events and answer "why did it happen?"
Example: Identifying why sales dropped by analyzing
customer feedback, marketing campaigns, and competitor
actions.
• Human Input: Moderate, as analysts must interpret
correlations and identify root causes.
iii. Predictive Analytics: Predicts future outcomes based on
historical data.
Example: Forecasting demand for seasonal products.
• Human Input: Less human intervention is needed;
algorithms handle most of the prediction tasks.
iv. Prescriptive Analytics: Provides recommendations for
optimal decision-making.
Example: A logistics company may use prescriptive analytics
to determine the most efficient delivery routes.
• Human Input: Minimal or no human input is required, as
automated systems handle decision-making.
v. Cognitive Analytics: Mimics human thought processes to
analyze data and provide insights. It combines artificial
intelligence, machine learning, and natural language
processing.
Example: A virtual assistant like Siri or Alexa analyzing user
requests and providing relevant information.
• Human Input: Very minimal, as cognitive systems
operate autonomously and learn from data over
time.
Introduction to Analytics
• Data Analytics has become essential for
businesses to stay competitive and thrive in
the data-driven world.
• Understanding the different types of
analytics and how they require varying
levels of human knowledge can help
organizations make better decisions and
achieve operational excellence.
Introduction to Analytics

https://uwex.wisconsin.edu/stories-news/data-science-vs-data-analytics/
Introduction to Tools and Environment
Data Analytics typically involves three main
components:
i. Subject Knowledge: Understanding the business
or field where the analysis is being applied (e.g.,
healthcare, marketing, or education).
ii. Statistical Knowledge: Applying mathematical
techniques to analyze data and draw meaningful
conclusions.
iii. Technical Knowledge: Using tools and
programming languages to clean, analyze, and
visualize data effectively.
https://www.wallstreetmojo.com/data-analytics/#what-is-data-analytics
• A Data Analyst must be proficient in all
three areas to generate valuable
insights for businesses.
• With the increasing demand for Data
Analytics in the market, many tools have
emerged with various functionalities for
this purpose.
• Some of the popular tools and environments used in
Data Analytics, ranging from open-source languages and
platforms to user-friendly commercial software, are:
i. R Programming
ii. Python
iii. Tableau Public
iv. QlikView
v. SAS
vi. Microsoft Excel
vii. RapidMiner
viii. KNIME (Konstanz Information Miner)
ix. OpenRefine
x. Apache Spark
i. R Programming
R is a leading tool for statistical computing and data
modeling. It is highly flexible and supports data
visualization, machine learning, and reporting.
Example: A researcher uses R to analyze survey
data and visualize results in bar charts and
heatmaps.
Key Features:
• Compatible with Windows, Mac, and UNIX
systems.
• Large ecosystem of add-on packages, such as
"ggplot2" for visualizations, which can be installed
from CRAN with a single command (install.packages()).
ii. Python
Python is an open-source, object-oriented
programming language that is easy to read, write,
and maintain. It is one of the most popular tools for
data analytics and machine learning.
Example: A data analyst uses Python's Pandas
library to clean messy e-commerce sales data and
uses Matplotlib to create sales trend charts.
Key Libraries:
• Scikit-learn for machine learning.
• TensorFlow and Keras for deep learning.
• Matplotlib and Seaborn for data visualization.
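A minimal sketch of the Pandas-plus-Matplotlib example, assuming a small hypothetical sales export; the column names, values, and the output file name `sales_trend.png` are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical messy e-commerce export: prices stored as text, one unreadable date
sales = pd.DataFrame({
    "date":  ["2024-01-05", "2024-01-19", "2024-02-03", "2024-02-20", "bad date"],
    "price": ["$120.00", "80", "$95.50", None, "$60"],
})

# Clean with Pandas: parse dates and prices, turning unreadable values into NaT/NaN
sales["date"] = pd.to_datetime(sales["date"], errors="coerce")
sales["price"] = pd.to_numeric(
    sales["price"].str.replace("$", "", regex=False), errors="coerce"
)
sales = sales.dropna()  # keep only rows where both fields parsed

# Aggregate to monthly revenue and chart the trend with Matplotlib
monthly = sales.groupby(sales["date"].dt.to_period("M"))["price"].sum()
monthly.index = monthly.index.astype(str)  # plain "YYYY-MM" labels for the x-axis
ax = monthly.plot(marker="o", title="Monthly sales")
ax.set_ylabel("Revenue")
plt.savefig("sales_trend.png")
```

Using `errors="coerce"` turns unparseable entries into missing values, so a single `dropna()` removes every row that could not be cleaned.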
iii. Tableau Public
Tableau Public is a free data visualization tool that
connects to various data sources and allows users
to create interactive dashboards.
Example: A marketing team uses Tableau Public to
create a dashboard showing customer purchase
trends and sales performance over time.
Key Features:
• Real-time data updates.
• Ability to publish dashboards on the web for
easy sharing.
iv. QlikView
QlikView provides fast, in-memory data
processing and visualization capabilities.
Example: A retail company uses QlikView to
quickly analyze sales data and identify which
products are performing best.
Key Features:
• Data compression for faster processing.
• Dynamic visualizations.
v. SAS
SAS is a powerful programming language and
environment for data manipulation and
analytics.
Example: A finance analyst uses SAS to
forecast market trends based on historical
stock price data.
Key Features:
• Access to data from multiple sources.
• Comprehensive statistical tools.
vi. Microsoft Excel
Excel is one of the most widely used data
analytics tools, particularly for smaller
datasets.
Example: A small business owner uses Excel
pivot tables to summarize sales data and
identify the best-performing products.
Key Features:
• Easy data summarization with pivot
tables.
• Basic data visualization capabilities.
vii. RapidMiner
RapidMiner is a comprehensive platform for
predictive analytics, machine learning, and
text analytics.
Example: A social media analyst uses
RapidMiner to analyze user sentiment from
tweets and predict trending topics.
Key Features:
• Integration with various data sources
like Excel and SQL databases.
viii. KNIME (Konstanz Information Miner)
KNIME is an open-source data analytics
platform with visual programming.
Example: A researcher uses KNIME to
preprocess large datasets and create a
machine-learning model for predicting
disease outcomes.
Key Features:
• Drag-and-drop functionality for data
workflows.
• Easy integration with other tools.
ix. OpenRefine
OpenRefine (previously Google Refine) is a
data cleaning tool used for transforming
messy data.
Example: A data analyst uses OpenRefine to
clean inconsistent product names in an e-
commerce dataset.
Key Features:
• Data transformation.
• Parsing data from websites.
x. Apache Spark
Apache Spark is a large-scale data processing
engine, often used in big data applications.
Example: A data engineer uses Apache Spark
to process and analyze massive amounts of
streaming data from social media in real time.
Key Features:
• Faster in-memory processing (up to 100 times
faster than Hadoop MapReduce for some workloads).
• Machine learning model development.
Skills for Data Analysts
Apart from knowing the tools, a Data Analyst
should also develop the following skills:
i. Statistics
ii. Data Cleaning
iii. Exploratory Data Analysis (EDA)
iv. Data Visualization
v. Machine Learning Knowledge (Optional)
i. Statistics: Understanding and applying statistical
techniques to draw meaningful insights.
Example: Calculating the average customer
spending from sales data.
ii. Data Cleaning: Ensuring data is accurate and
consistent before analysis.
Example: Removing duplicate records from a
customer database.
iii. Exploratory Data Analysis (EDA): Understanding
patterns, trends, and relationships within the data.
Example: Identifying seasonal trends in sales
data.
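A short pandas sketch of the first three skills, using hypothetical transaction data: data cleaning removes a duplicate record, statistics summarise spending, and a monthly total is a first EDA step toward spotting seasonal trends.

```python
import pandas as pd

# Hypothetical transactions; the second B row is an accidental duplicate
tx = pd.DataFrame({
    "customer": ["A", "B", "B", "C", "A", "C"],
    "month":    [1, 1, 1, 6, 12, 12],
    "spend":    [40.0, 60.0, 60.0, 50.0, 90.0, 110.0],
})

# Data cleaning: remove exact duplicate records before analysing
tx = tx.drop_duplicates()

# Statistics: average (and median) spend per transaction
avg_spend = tx["spend"].mean()
median_spend = tx["spend"].median()
print(f"Mean spend: {avg_spend:.2f}, median spend: {median_spend:.2f}")

# EDA: total spend per month, a first look for seasonal peaks
monthly = tx.groupby("month")["spend"].sum()
print(monthly)
```

Cleaning comes first on purpose: had the duplicate stayed in, it would have pulled the statistics toward customer B's spending.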
iv. Data Visualization: Presenting data in an
easily understandable format.
Example: Creating bar charts and line
graphs to represent sales trends.
v. Machine Learning Knowledge (Optional):
Enhancing data analytics capabilities by
building predictive models.
Example: Predicting customer churn using
machine learning models.
Introduction to Tools and Environment
• Understanding these tools and developing
essential skills will empower students to
become effective data analysts.
• By mastering data cleaning, visualization,
and statistical techniques, analysts can turn
raw data into meaningful insights and
contribute to decision-making in various
fields.
Application of Modeling in Business
• Business modeling helps
organizations understand how they
operate, create value, and make
decisions efficiently.
• With proper models in place,
businesses can predict trends,
optimize resources, and gain
competitive advantages.
Why Business Modeling is Important:
i. Strategic Decision-Making: Helps
executives make informed decisions
by analyzing future risks and
opportunities.
ii. Resource Allocation: Ensures
efficient use of financial, human, and
technological resources.
iii. Performance Measurement: Evaluates
business performance to improve
efficiency.
iv. Innovation: Encourages exploration of
new business ideas and models.
v. Market Forecasting: Helps predict
market trends and customer behavior.
Example: A retail company might use
sales forecasting models to predict
demand during festive seasons, which
ensures the right inventory is stocked,
avoiding overproduction or shortages.
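As a deliberately simple illustration of a sales forecasting model, the sketch below projects next season's festive demand from past growth. The sales figures and the 10% safety buffer are hypothetical, and real forecasting would typically use seasonal time-series models.

```python
# Hypothetical festive-season unit sales for one product in past years
festive_sales = {2021: 1200, 2022: 1380, 2023: 1587}

years = sorted(festive_sales)
# Year-over-year growth ratios (here both are 1.15, i.e. +15%)
growth = [festive_sales[cur] / festive_sales[prev] for prev, cur in zip(years, years[1:])]
avg_growth = sum(growth) / len(growth)

# Forecast next season by applying the average growth to the latest season
forecast = festive_sales[years[-1]] * avg_growth

# Hypothetical 10% safety buffer to guard against shortages without heavy overstock
planned_stock = forecast * 1.10
print(f"Forecast demand: {forecast:.0f} units; plan to stock about {planned_stock:.0f}")
```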
Big Data Analytics in Business
• Big data analytics involves processing vast
amounts of structured and unstructured
data to uncover trends, patterns, and
valuable business insights.
Key Applications:
i. Customer Behavior Analysis
ii. Risk Management
iii. Marketing Campaigns
iv. Supply Chain Optimization
Key Applications:
i. Customer Behavior Analysis: Understand
customer preferences to offer personalized
products.
ii. Risk Management: Identify fraudulent
transactions and financial risks.
iii. Marketing Campaigns: Analyze social media
and website data to optimize campaigns.
iv. Supply Chain Optimization: Ensure
smoother and more cost-effective logistics
operations.
Example:
• Social Media Analytics: Companies like
Facebook and Twitter analyze user
interactions to assess the impact of
advertising campaigns and identify customer
sentiment toward products.
• E-Commerce Giants (Amazon and eBay):
They examine customer purchasing patterns,
analyze browsing behavior, and predict
factors that influence user interactions and
sales revenue.
Big Data Analytics Framework Using Hadoop
Hadoop Framework: Hadoop is an open-source
platform used for distributed
processing of large data sets.
• Its MapReduce processing model has three main steps:
i. Map() Step: Splits input data into smaller
parts and assigns tasks to worker nodes to
generate key-value outputs.
Example: Breaking down a customer
review dataset into individual review
sentences.
ii. Shuffle() Step: Groups the key-value pairs that
share the same key, collecting them from different
worker nodes for further processing.
Example: Collecting all mentions of a
product name together from different nodes.
iii. Reduce() Step: Processes the grouped data
to produce the final results.
Example: Counting how many times a
specific product feature is mentioned in
reviews.
Classic Example (Word Count Problem):
• Imagine counting the number of
occurrences of each word in a large
document.
• The MapReduce algorithm splits the
document, emits a count for each word (map),
groups identical words together (shuffle), and
finally aggregates the counts for each word
(reduce).
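The three steps can be simulated on a single machine in plain Python. This sketch only mimics the data flow between the steps; it does not use Hadoop's API, and the input text is hypothetical.

```python
from collections import defaultdict

# Hypothetical document, already split into chunks as if stored on three worker nodes
chunks = ["deer bear river", "car car river", "deer car bear"]

def map_step(chunk):
    """Map: emit a (word, 1) key-value pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle_step(pairs):
    """Shuffle: group the values of all pairs that share the same key."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_step(word, counts):
    """Reduce: aggregate the grouped values into one final count per word."""
    return word, sum(counts)

mapped = [pair for chunk in chunks for pair in map_step(chunk)]
grouped = shuffle_step(mapped)
result = dict(reduce_step(w, c) for w, c in grouped.items())
print(result)  # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

On a real cluster, each `map_step` call would run on a different node and the shuffle would move pairs over the network, but the key-value logic is the same.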
Fig. MapReduce Example: Word Count Problem
Data-Driven Companies Using Big Data Analytics
• IBM and Microsoft: Both companies offer
cloud-based big data solutions. IBM provides
analytics in business intelligence and
healthcare.
• Social Media Platforms (Facebook and
Twitter): Analyze user profiles and interactions
to target ads and increase their revenue.
• Healthcare Example: A hospital using big data
analytics can predict patient admission rates,
optimize staffing, and improve patient care
outcomes.
Three V's of Big Data:
• Volume: Refers to the large size of data (e.g.,
millions of customer transactions).
• Variety: Different types of data (text, images,
videos).
• Velocity: Speed at which data is generated
and processed (real-time social media
analytics).
Example: Netflix uses these three V's to
analyze user behavior, predict preferences,
and suggest personalized content in real time.
Databases
• A Database is an organized collection of
structured information or data stored
electronically on a computer.
• It is managed by a Database Management
System (DBMS) that allows users to access,
manipulate, and manage the data.
Categories of Databases:
i. Text Databases
ii. Desktop Databases
iii. Relational Databases (RDB)
iv. NoSQL Databases
v. Object-Oriented Databases (OODB)
i. Text Databases:
• Stores large collections of text.
Examples: Textbooks, magazines, journals,
and manuals.
ii. Desktop Databases:
• Designed for use on a single PC.
• They are simpler and have limited
functionality compared to large-scale
database systems.
Examples: Microsoft Excel, Microsoft
Access, etc.
iii. Relational Databases (RDB):
• Store data in tables, where each table has
rows and columns.
• Tables can share information, making data
searchable and organized.
Examples: Oracle, Db2, and other SQL-based
systems, including cloud-hosted DBaaS
(Database-as-a-Service) offerings.
iv. NoSQL Databases:
• Non-tabular: they store data differently
than relational tables.
• Types: Document, key-value, wide-column,
and graph databases.
Examples: MongoDB, CouchDB (document
databases that store JSON-like records).
v. Object-Oriented Databases (OODB):
• Store data as objects and classes,
following object-oriented programming
principles.
Commonly used with object-oriented languages
such as Java, C++, Smalltalk, and LISP.
Types of Data and Variables
• When we work with data in
databases, we need to understand
the types of data and variables.
• In relational databases, rows
represent data (records) and
columns represent attributes
(characteristics).
Big Data Representation
• In big data, columns from RDBMS
are referred to as attributes or
variables.
• The variable can be categorized into
two types:
i. Categorical (Qualitative) Data
ii. Quantitative Data (Discrete or
Continuous)
i. Categorical (Qualitative) Data: Data
represented by characters or labels,
rather than numbers.
Types of Categorical Data:
a. Nominal Data
b. Ordinal Data
a. Nominal Data: No natural order or
sequence.
Examples: Color, Gender, Names of animals.
E.g.: The genders of 50 students can be listed in
any order; no category ranks above another.
b. Ordinal Data: Has a natural order or
sequence.
Examples: Clothing sizes (S, M, L, XL),
Customer Ratings (Excellent, Good, Bad).
E.g.: Clothing sizes follow a clear increasing
order, giving valuable insights.
ii. Quantitative Data (Discrete or
Continuous): Data that can be
measured and represented
numerically.
Types of Quantitative Data:
a. Discrete Data
b. Continuous Data
a. Discrete Data: Countable values
(whole numbers).
Examples: Number of buttons, Delivery days
for a product.
E.g.: Number of customer orders recorded
each week.
b. Continuous Data: Any value within a range
(infinitely many possible values, including fractions).
Examples: Price, Height, Weight, Temperature.
E.g.: Tracking changes in product prices over
time.
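In pandas, the distinction between nominal and ordinal data maps onto unordered and ordered categoricals, while discrete and continuous values usually become integer and float columns. A sketch with hypothetical product records:

```python
import pandas as pd

# Hypothetical product records covering the four kinds of variables above
df = pd.DataFrame({
    "color":  ["Red", "Blue", "Red"],        # nominal (categorical, no order)
    "size":   ["M", "XL", "S"],              # ordinal (categorical, ordered)
    "orders": [3, 7, 2],                     # discrete (countable whole numbers)
    "price":  [499.99, 1250.50, 89.00],      # continuous (any value in a range)
})

# Nominal: an unordered category supports counting and equality, not ranking
df["color"] = pd.Categorical(df["color"])

# Ordinal: declaring the order S < M < L < XL enables sorting and comparisons
df["size"] = pd.Categorical(df["size"], categories=["S", "M", "L", "XL"], ordered=True)

print(df.dtypes)
print(df.sort_values("size")["size"].tolist())  # ['S', 'M', 'XL']
print((df["size"] > "M").tolist())              # [False, True, False]
```

Declaring the type explicitly helps choose the right analysis: averaging `orders` or `price` makes sense, while averaging `size` or `color` does not.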
Types of Data and Variables
• By understanding the types of databases
and data categories, you can effectively
store and analyze information in various
systems.
• Focus on recognizing whether your data
belongs to a categorical or quantitative
group and choose appropriate analysis
techniques accordingly.
Data Modeling Techniques
• Data modeling is the process of
structuring and organizing data in a
database so that it can be stored
efficiently and used for analysis.
• It helps businesses and organizations
make data-driven decisions by
providing a well-defined structure for
data storage and retrieval.
https://www.klipfolio.com/blog/6-data-modeling-techniques
For example, a university database
stores information about students,
courses, professors, and marks, all of
which need to be stored systematically.
• A good data model ensures that
all this information is well-organized
and easy to retrieve when needed.
Types of Data Models
• Data modeling can be achieved in various
ways.
• However, the basic concept of each of them
remains the same.
i. Hierarchical model
• Data is structured in a tree-like
hierarchy, where each parent node
can have multiple child nodes, but
each child has only one parent.
Example 1: Library Management
System
• Each book belongs to only one
category (parent).
• Retrieving a book’s details requires
searching through the hierarchy.
Limitations:
• Data retrieval is difficult.
• If a book needs to be moved from
one category to another category,
it is complex to update.
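A minimal sketch of the library hierarchy using nested Python dictionaries (the category and book names are hypothetical). It shows why a lookup has to walk the tree, and why moving a book means editing two branches.

```python
# Hypothetical library tree: root -> category -> list of books.
# Every book has exactly one parent category, as in the hierarchical model.
library = {
    "Library": {
        "Science": ["Physics Basics", "Organic Chemistry"],
        "Fiction": ["Mystery Tales", "Space Saga"],
    }
}

def find_book(tree, title):
    """Retrieval walks the hierarchy from the root down to the leaf."""
    for root, categories in tree.items():
        for category, books in categories.items():
            if title in books:
                return [root, category, title]
    return None

def move_book(tree, title, new_category):
    """Re-categorising means detaching the book from one branch and attaching it to another."""
    path = find_book(tree, title)
    if path is None:
        raise KeyError(title)
    root, old_category, _ = path
    tree[root][old_category].remove(title)
    tree[root].setdefault(new_category, []).append(title)

print(find_book(library, "Space Saga"))  # ['Library', 'Fiction', 'Space Saga']
move_book(library, "Space Saga", "Science")
print(find_book(library, "Space Saga"))  # ['Library', 'Science', 'Space Saga']
```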
Example 2:
Fig 2. Hierarchical Model Structure
ii. Relational Data Modeling Technique
• Data is organized into tables
(relations), with rows (records)
and columns (attributes).
Example: Online Shopping
Database (Amazon, Flipkart, etc.)
Fig. Relational Model Structure
• Customer_ID, Product_ID, and Vendor_ID are the
primary keys of the Customers, Products, and
Vendors tables; they appear as foreign keys in
the Sales table, linking each sale to those tables.
Advantages:
• Reduces data duplication
• Easy to update and retrieve data
• Simple to understand
relationships
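The online-shopping schema can be sketched with Python's built-in `sqlite3` module. Table and key names follow the figure, but the remaining columns, rows, and names are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory database; a hypothetical version of the Customers / Products / Vendors / Sales schema
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
con.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Vendors   (Vendor_ID   INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Products  (Product_ID  INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE Sales (
    Sale_ID     INTEGER PRIMARY KEY,
    Customer_ID INTEGER REFERENCES Customers(Customer_ID),
    Product_ID  INTEGER REFERENCES Products(Product_ID),
    Vendor_ID   INTEGER REFERENCES Vendors(Vendor_ID),
    Quantity    INTEGER
);
INSERT INTO Customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Vendors   VALUES (10, 'TechMart');
INSERT INTO Products  VALUES (100, 'Headphones', 1500.0), (101, 'Charger', 500.0);
INSERT INTO Sales     VALUES (1, 1, 100, 10, 1), (2, 2, 101, 10, 2), (3, 1, 101, 10, 1);
""")

# Customer and product details are stored once; sales refer to them by ID
rows = con.execute("""
    SELECT c.Name, SUM(p.Price * s.Quantity) AS Total
    FROM Sales s
    JOIN Customers c ON c.Customer_ID = s.Customer_ID
    JOIN Products  p ON p.Product_ID  = s.Product_ID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
print(rows)  # [('Asha', 2000.0), ('Ravi', 1000.0)]

# A sale pointing at a customer that does not exist is rejected
try:
    con.execute("INSERT INTO Sales VALUES (4, 99, 100, 10, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Orphan sale rejected:", rejected)
```

This reflects the advantages listed above: a customer's name lives in one row, so renaming it is a single update, and the join reassembles the full picture when needed.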
iii. Network Data Modeling Technique
• Data is represented using nodes
(entities) and edges
(relationships).
• Unlike hierarchical models, an
entity can have multiple parents.
Example 1: Hospital Management
System
• A hospital database where
patients can visit multiple
doctors, and doctors can have
multiple patients.
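The many-to-many links in the hospital example can be sketched as a set of edges stored in both directions. The patient and doctor IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical visit records: each pair is one edge between a patient and a doctor
visits = [("P1", "Dr_A"), ("P1", "Dr_B"), ("P2", "Dr_A"), ("P3", "Dr_B")]

# Store the edges in both directions so either side can be navigated
doctors_of = defaultdict(set)
patients_of = defaultdict(set)
for patient, doctor in visits:
    doctors_of[patient].add(doctor)
    patients_of[doctor].add(patient)

# Unlike a tree, P1 is linked to two "parent" doctors
print(sorted(doctors_of["P1"]))      # ['Dr_A', 'Dr_B']
print(sorted(patients_of["Dr_A"]))   # ['P1', 'P2']
```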
Example 2:
Fig. Network Model Structure
Example 3:
Fig. Network Model Structure
Advantages:
• More flexible than hierarchical
models
• Best for complex relationships
iv. Entity-Relationship (ER) Data
Modeling Technique
• Uses diagrams to represent entities,
attributes, and relationships in a
database.
Example: College Student Database
Example Scenario: A college stores
information about students and courses.
“ER Diagram” for college student database:
Advantages:
• Visual representation makes it
easy to understand.
• Helps design database structure
efficiently.
v. Object-Oriented Data Modeling
Technique
• Data is stored as objects, similar
to how data is managed in
object-oriented programming.
Example: Social Media Application
(Instagram, Facebook, etc.).
Advantages:
• Best for real-world
applications.
• Handles complex data
structures easily.
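A minimal object-oriented sketch of the social media example using Python dataclasses. The `User` and `Post` classes and their fields are hypothetical, chosen only to show objects holding data, behaviour, and direct references to other objects.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical social-media classes: each object bundles its data with its behaviour
@dataclass
class Post:
    author: "User" = field(repr=False)  # a direct reference to another object
    text: str
    likes: int = 0

    def like(self) -> None:
        self.likes += 1

@dataclass
class User:
    username: str
    posts: List[Post] = field(default_factory=list)
    followers: List["User"] = field(default_factory=list)

    def publish(self, text: str) -> Post:
        post = Post(author=self, text=text)
        self.posts.append(post)
        return post

alice = User("alice")
bob = User("bob")
alice.followers.append(bob)  # user-to-user relationship without a join table
post = alice.publish("Hello from the object model!")
post.like()
print(post.author.username, post.likes, len(alice.posts))  # alice 1 1
```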
Types of Data Models: Example for a
“shopping database”
Importance of Data Modeling
• Avoids data redundancy (no
unnecessary data repetition)
• Ensures data consistency (all data is
accurate and reliable)
• Makes data retrieval efficient
(faster searches and queries)
• Helps in business decision-making
(better data insights)
Best Practices for Data Modeling
• Understand Business Needs: Before
designing a data model, understand the
business goals and data requirements.
• Keep Models Simple: Start with a small
model and scale as needed.
• Organize Data Using Facts &
Dimensions: Use separate tables for
Facts (e.g., sales data) and Dimensions
(e.g., product categories, customer
location) for better insights.
• Avoid Storing Unnecessary Data:
Storing too much data affects
performance. Only keep what is
required.
• Continuously Validate & Update: Data
models must be regularly updated as
business requirements evolve.
Missing Imputations
• Missing data can be a major
problem in data analysis, as it may
lead to incorrect conclusions.
• Imputation is the process of
replacing missing values with
substituted values to ensure a
complete dataset.
Imputation Methods
Different imputation methods are:
i. Do Nothing (Ignore Missing Data)
ii. Mean, Median, or Mode
Imputation
iii. Imputation Using Most Frequent /
Zero / Constant Values
iv. K-Nearest Neighbors (KNN)
Imputation
i. Do Nothing (Ignore Missing Data)
• In this approach, missing data is
left as it is.
• This is often used when missing
values are minimal, or when the
dataset is large enough that
missing values do not
significantly impact results.
Example: If a dataset has only 2% missing
values, ignoring them may not affect the
analysis.
S.No. Column 1 Column 2 Column 3
1 3 6 NaN
2 5 10 12
3 6 11 15
4 NaN 12 14
5 6 NaN NaN
6 10 13 16
When to use?
• When only a few values are missing
(less than 5% of the dataset).
• If missing data is completely random.
Drawback:
• If missing data is significant, it reduces
the sample size and affects model
accuracy.
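Loading the example table into pandas shows both sides of this approach. In this sketch, statistics such as the mean simply skip the missing cells, but any method that needs complete rows must delete them, and here that discards three of the six rows, which is exactly the sample-size drawback noted above.

```python
import numpy as np
import pandas as pd

# The example table from above (S.No. as the index, NaN = missing)
df = pd.DataFrame({
    "Column 1": [3, 5, 6, np.nan, 6, 10],
    "Column 2": [6, 10, 11, 12, np.nan, 13],
    "Column 3": [np.nan, 12, 15, 14, np.nan, 16],
}, index=pd.Index(range(1, 7), name="S.No."))

# Leaving values missing: pandas statistics such as mean() skip NaN by default
means = df.mean()

# Methods that need complete rows force listwise deletion instead
complete = df.dropna()
print(means)
print(f"{len(df) - len(complete)} of {len(df)} rows lost; kept S.No. {list(complete.index)}")
```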
ii. Mean, Median, or Mode Imputation
• This method fills missing values
using the mean (average),
median (middle value), or mode
(most frequent value) of a
column.
Example Dataset (Before Imputation)
S.No. Column 1 Column 2 Column 3
1 3 6 NaN
2 5 10 12
3 6 11 15
4 NaN 12 14
5 6 NaN NaN
6 10 13 16
Mean Imputation → Replace NaN
• Column 1 Mean = (3 + 5 + 6 + 6 + 10) / 5 = 6
• Column 2 Mean = (6 + 10 + 11 + 12 + 13) / 5 = 10.4
• Column 3 Mean = (12 + 15 + 14 + 16) / 4 = 14.25
Example Dataset (After Imputation)
S.No. Column 1 Column 2 Column 3
1 3 6 14.25
2 5 10 12
3 6 11 15
4 6 12 14
5 6 10.4 14.25
6 10 13 16
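The same imputation in pandas, as a sketch: `fillna` with the column means fills each gap with 6, 10.4, or 14.25 for this table, and swapping in the median gives a variant that is less sensitive to outliers.

```python
import numpy as np
import pandas as pd

# Same example table, before imputation
df = pd.DataFrame({
    "Column 1": [3, 5, 6, np.nan, 6, 10],
    "Column 2": [6, 10, 11, 12, np.nan, 13],
    "Column 3": [np.nan, 12, 15, 14, np.nan, 16],
}, index=pd.Index(range(1, 7), name="S.No."))

# Mean imputation: each NaN gets its column's mean (computed ignoring NaN)
mean_filled = df.fillna(df.mean())

# Median imputation: same idea, less affected by outliers
median_filled = df.fillna(df.median())

print(mean_filled)
print(median_filled)
```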
Advantages:
• Fast and simple.
• Works well with numerical data.

Disadvantages:
• Does not preserve relationships
between variables.
• Not accurate if data has outliers.
• Mean and median cannot be used for categorical data.
iii. Imputation Using Most Frequent /
Zero / Constant Values
• This method is used for categorical
variables.
• The missing values are replaced
with:
• Most frequent value (Mode)
• Zero or constant values
Example: Categorical Data
S.No. Gender City
1 M Hyderabad
2 F Bangalore
3 NaN Bangalore
4 F NaN
5 F Mumbai
Example:
Using Most Frequent (Mode) → Replace NaN
Gender Mode = "F" (Female)
City Mode = "Bangalore"
S.No. Gender City
1 M Hyderabad
2 F Bangalore
3 F Bangalore
4 F Bangalore
5 F Mumbai
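A pandas sketch of both variants on the example table: filling with each column's mode, and filling with a constant placeholder (the label `"Unknown"` is an arbitrary choice).

```python
import numpy as np
import pandas as pd

# The Gender / City example table (S.No. as the index, NaN = missing)
df = pd.DataFrame({
    "Gender": ["M", "F", np.nan, "F", "F"],
    "City":   ["Hyderabad", "Bangalore", "Bangalore", np.nan, "Mumbai"],
}, index=pd.Index(range(1, 6), name="S.No."))

# Most frequent value per column; mode() ignores NaN and may return ties, so take [0]
modes = {col: df[col].mode()[0] for col in df.columns}
mode_filled = df.fillna(modes)

# Constant-value alternative: flag missing entries explicitly instead of guessing
const_filled = df.fillna("Unknown")

print(mode_filled)
print(const_filled)
```

The constant-value option avoids inflating the most common category, which is one way to limit the bias listed below.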
Advantages:
• Works well for categorical data.

Disadvantages:
• Creates bias in data.
• Does not preserve relationships
between variables.
iv. K-Nearest Neighbors (KNN)
Imputation
• KNN predicts missing values based
on the closest K neighbors in the
dataset.
How It Works:
• Finds the K closest neighbors
(similar rows).
• Uses the average of neighbors to
fill missing values.
Example:
S.No. Age Salary Experience
1 25 50,000 3
2 28 60,000 5
3 30 NaN 6
4 35 80,000 10
5 40 90,000 NaN
Step 1: Find the 2 nearest neighbors (K = 2) of
S.No. 3 (missing Salary), comparing Age
• S.No. 2 (age 28) is closest; S.No. 1 (age 25) and
S.No. 4 (age 35) tie for second, and S.No. 4 is used
• Take their average: (60,000 + 80,000) / 2 =
70,000
• Replace NaN with 70,000
Step 2: Find the 2 nearest neighbors of S.No. 5
(missing Experience), comparing Age
• Closest: S.No. 4 (age 35) and S.No. 3 (age 30)
• Take their average: (10 + 6) / 2 = 8
• Replace NaN with 8
Example:
S.No. Age Salary Experience
1 25 50,000 3
2 28 60,000 5
3 30 70,000 6
4 35 80,000 10
5 40 90,000 8
Advantages:
• More accurate than mean/median
imputation.
• Preserves relationships between variables.

Disadvantages:
• Sensitive to outliers.
• Computationally expensive for large
datasets.
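scikit-learn provides `KNNImputer` for this method. The sketch below runs it on the example table, assuming `n_neighbors=2` so that it averages two neighbours, as in the hand calculation. Unlike the hand calculation, which compares Age only, `KNNImputer` measures distance over every column that both rows have (a NaN-aware Euclidean distance). For S.No. 3 it therefore picks S.No. 2 and S.No. 1, which are close in both Age and Experience, and fills 55,000 rather than 70,000. For S.No. 5 it still averages S.No. 3 and S.No. 4 to get 8. The columns are not scaled here, so large Salary values dominate any distance they enter; scaling features first (for example with `StandardScaler`) is common practice and also speaks to the outlier sensitivity noted above.

```python
import numpy as np
from sklearn.impute import KNNImputer

# The Age / Salary / Experience example table (rows = S.No. 1..5)
data = np.array([
    [25, 50000, 3],
    [28, 60000, 5],
    [30, np.nan, 6],
    [35, 80000, 10],
    [40, 90000, np.nan],
])

# K = 2 neighbours, averaged with equal weight; distance is NaN-aware Euclidean
imputer = KNNImputer(n_neighbors=2)
filled = imputer.fit_transform(data)

print(filled[2, 1])  # S.No. 3 salary -> 55000.0
print(filled[4, 2])  # S.No. 5 experience -> 8.0
```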
Which Imputation Method Should
You Use?
• For numerical data: Use
Mean/Median if speed is important.
Use KNN if accuracy is needed.
• For small missing data: Ignoring
missing values may be fine.
• For categorical data: Use Mode
(most frequent value).
Need for Business Modeling
• Business modeling is the process of
representing the structure,
operations, and policies of a
business in a systematic way.
• It provides a clear blueprint for
understanding how a business
creates, delivers, and captures value.
It includes:
i. Business Goals & Objectives – What
the business wants to achieve.
ii. Processes & Workflows – How
different tasks are carried out.
iii. Data & Information Flow – How data
is stored, shared, and utilized.
iv. Stakeholders & Roles – Who is
involved in different processes.
Why is Business Modeling Important?
Business modeling is crucial for organizations
because it:
i. Helps in Decision Making
• Business models provide data-driven insights,
allowing managers to make strategic decisions
about investments, expansion, and resource
allocation.
Example: An e-commerce company uses a
business model to decide whether to expand
into international markets by analyzing customer
demand, costs, and potential revenue.
ii. Improves Operational Efficiency
• By mapping workflows and processes,
businesses can identify inefficiencies,
bottlenecks, and redundant steps.
Example: A manufacturing company may
use business process modeling to find
ways to reduce production costs and
improve delivery times.
iii. Aligns Business Goals with IT Systems
• Business models bridge the gap
between business strategy and
technology implementation.
Example: A bank uses a business model
to determine how a new AI-driven loan
approval system aligns with its goal of
reducing loan processing time.
iv. Enhances Risk Management
• Modeling helps businesses predict
potential risks and develop
contingency plans.
Example: A supply chain model helps a
retail company prepare for delays in
product delivery by identifying
alternative suppliers.
v. Business Modeling in Data-Driven
Decision Making
• Modern business modeling often
integrates data analytics and AI to
enhance decision-making.
Example: A retail store analyzes
customer purchasing behavior and
adjusts its business model to introduce a
personalized recommendation system.
vi. Supports Innovation & Growth
• Business modeling helps companies
identify new opportunities, test new
ideas, and adapt to market changes.
Example: A startup might develop a
subscription-based business model to
generate consistent revenue instead of
relying on one-time sales.
vii. Attracts Investors & Stakeholders
• A well-defined business model shows
investors how the company plans to
make a profit, increasing trust and
investment potential.
Example: A SaaS (Software-as-a-Service)
company presents a business model that
explains how it will acquire customers,
generate recurring revenue, and scale
operations.
