Data Analytics
Unit-2
Data Analytics: Introduction to Analytics,
Introduction to Tools and Environment,
Application of Modeling in Business,
Databases & Types of Data and Variables,
Data Modeling Techniques, Missing
Imputations etc., Need for Business
Modeling.
    Introduction to Analytics
• In today's data-driven world, enormous
  amounts of data are generated daily from
  various sources, such as social media,
  business transactions, and online activities.
• Extracting meaningful insights from this data
  has become essential for individuals and
  organizations to make informed decisions.
• Data Analytics plays a vital role in
  identifying patterns, improving operations,
  and driving success.
• Four main factors signify the
  need for Data Analytics:
 i.     Gather Hidden Insights
 ii.    Generate Reports
 iii.   Perform Market Analysis
 iv.    Improve Requirements and Experience
i. Gather Hidden Insights:
• Data often holds valuable information that
  is not immediately visible.
• By analyzing data, we can uncover patterns
  and insights that help solve problems,
  identify opportunities, and make strategic
  decisions.
  Example: A streaming platform like Netflix
  analyzes user viewing patterns to
  recommend shows or movies.
ii. Generate Reports:
• Reports present analyzed data in a
   structured manner, helping organizations
   and teams make better decisions.
Example: Schools can generate reports from
student performance data to identify
subjects where students need additional
support.
iii. Perform Market Analysis:
• Analyzing market trends helps
   organizations understand customer
   preferences and stay competitive.
Example: A smartphone company
analyzes market trends to decide
which features to prioritize in its next
release.
iv. Improve Requirements and
Experience:
• Understanding customer or user
   behavior through data analytics allows
   for better services and improved
   experiences.
Example: E-commerce platforms analyze
customer purchase patterns to suggest
personalized product recommendations.
Data Analytics:
• It involves techniques for analyzing data
  to enhance productivity and achieve
  business gains.
• Data is extracted from various sources,
  cleaned, categorized, and analyzed to
  uncover behavioral patterns and trends.
• The techniques and tools vary based on
  organizational needs.
Common Techniques:
i. Data Mining: Extracting patterns from large datasets.
ii. Statistical Analysis: Applying mathematical models to
    analyze data.
iii. Predictive Analytics: Using historical data to predict
    future trends.
iv. Machine Learning: Automating data analysis to
    discover insights.
Example: A bank may use machine learning models to
predict customer churn and identify clients who are
likely to leave the bank. By offering targeted promotions,
the bank can retain these customers.
Role of Data Analysts
• Data Analysts play a crucial role in transforming data
   into valuable insights. They collect, process, and
   analyze data, then present their findings in reports or
   dashboards that help decision-makers.
Example Workflow:
• Collect Data: Gather information from various
   sources, such as databases or surveys.
• Clean Data: Remove duplicates and errors to ensure
   data accuracy.
• Analyze Data: Use tools and techniques to find
   patterns.
• Generate Reports: Present insights through charts,
   tables, and written summaries.
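The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names (order_id, product, amount), and the cleaning rule are made up for the example and are not taken from any real dataset.

```python
from collections import defaultdict

# Collect: hypothetical sales records (all names and values are made up).
raw_records = [
    {"order_id": 1, "product": "Laptop", "amount": 900.0},
    {"order_id": 2, "product": "Phone", "amount": 500.0},
    {"order_id": 2, "product": "Phone", "amount": 500.0},   # duplicate row
    {"order_id": 3, "product": "Laptop", "amount": -50.0},  # invalid amount
    {"order_id": 4, "product": "Phone", "amount": 450.0},
]

def clean(records):
    """Clean: drop repeated order IDs and rows with non-positive amounts."""
    seen, cleaned = set(), []
    for row in records:
        if row["order_id"] in seen or row["amount"] <= 0:
            continue
        seen.add(row["order_id"])
        cleaned.append(row)
    return cleaned

def analyze(records):
    """Analyze: total sales per product."""
    totals = defaultdict(float)
    for row in records:
        totals[row["product"]] += row["amount"]
    return dict(totals)

def report(totals):
    """Report: plain-text table, highest total first."""
    ordered = sorted(totals.items(), key=lambda item: -item[1])
    return "\n".join(f"{name:<10}{total:>10.2f}" for name, total in ordered)

cleaned = clean(raw_records)
totals = analyze(cleaned)
print(report(totals))
```

In practice an analyst would more likely use a library such as Pandas for these steps; plain Python is used here only to keep the sketch self-contained.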
         Fig. Data Analytics
https://www.wallstreetmojo.com/data-analytics/#what-is-data-analytics
Types of Analytics and Human
Knowledge Involvement
i. Descriptive Analytics
ii. Diagnostic Analytics
iii. Predictive Analytics
iv. Prescriptive Analytics
v. Cognitive Analytics
          Fig. Data and Human Knowledge Involvement
https://www.sv-europe.com/blog/10-reasons-organisation-ready-prescriptive-analytics/
i. Descriptive Analytics: Provides an understanding of past
    data and helps answer "what happened?"
    Example: Monthly sales reports showing revenue trends.
     • Human Input: High human interpretation is required
        to summarize the data and understand its context.
ii. Diagnostic Analytics: Examines data to determine the
causes of events and answer "Why did it happen?"
Example: Identifying why sales dropped by analyzing
customer feedback, marketing campaigns, and competitor
actions.
      • Human Input: Moderate, as analysts must interpret
        correlations and identify root causes.
iii. Predictive Analytics: Predicts future outcomes based on
historical data.
Example: Forecasting demand for seasonal products.
     • Human Input: Less human intervention is needed;
        algorithms handle most of the prediction tasks.
iv. Prescriptive Analytics: Provides recommendations for
optimal decision-making.
Example: A logistics company may use prescriptive analytics
to determine the most efficient delivery routes.
• Human Input: Minimal or no human input is required, as
     automated systems handle decision-making.
v. Cognitive Analytics: Mimics human thought processes to
analyze data and provide insights. It combines artificial
intelligence, machine learning, and natural language
processing.
Example: A virtual assistant like Siri or Alexa analyzing user
requests and providing relevant information.
    • Human Input: Very minimal, as cognitive systems
       operate autonomously and learn from data over
       time.
• Data Analytics has become essential for
  businesses to stay competitive and thrive in
  the data-driven world.
• Understanding the different types of
  analytics and how they require varying
  levels of human knowledge can help
  organizations make better decisions and
  achieve operational excellence.
Introduction to Analytics
  https://uwex.wisconsin.edu/stories-news/data-science-vs-data-analytics/
 Introduction to Tools and Environment
Data Analytics typically involves three main
components:
 i. Subject Knowledge: Understanding the business
     or field where the analysis is being applied (e.g.,
     healthcare, marketing, or education).
 ii. Statistical Knowledge: Applying mathematical
     techniques to analyze data and draw meaningful
     conclusions.
 iii. Technical Knowledge: Using tools and
     programming languages to clean, analyze, and
     visualize data effectively.
          https://www.wallstreetmojo.com/data-analytics/#what-is-data-analytics
• A Data Analyst must be proficient in all
  three areas to generate valuable
  insights for businesses.
• With the increasing demand for Data
  Analytics in the market, many tools have
  emerged with various functionalities for
  this purpose.
• Some of the popular tools and environments used in
  Data Analytics, many of them open-source or designed
  to be user-friendly, are:
      i. R Programming
      ii. Python
      iii. Tableau Public
      iv. QlikView
      v. SAS
      vi. Microsoft Excel
      vii. RapidMiner
      viii. KNIME (Konstanz Information Miner)
      ix. OpenRefine
      x. Apache Spark
i. R Programming
R is a leading tool for statistical computing and data
modeling. It is highly flexible and supports data
visualization, machine learning, and reporting.
Example: A researcher uses R to analyze survey
data and visualize results in bar charts and
heatmaps.
Key Features:
   • Compatible with Windows, Mac, and UNIX
       systems.
   • Allows automatic installation of user-required
       packages like "ggplot2" for visualizations.
ii. Python
Python is an open-source, object-oriented
programming language that is easy to read, write,
and maintain. It is one of the most popular tools for
data analytics and machine learning.
Example: A data analyst uses Python's Pandas
library to clean messy e-commerce sales data and
uses Matplotlib to create sales trend charts.
Key Libraries:
   • Scikit-learn for machine learning.
   • TensorFlow and Keras for deep learning.
   • Matplotlib and Seaborn for data visualization.
iii. Tableau Public
Tableau Public is a free data visualization tool that
connects to various data sources and allows users
to create interactive dashboards.
Example: A marketing team uses Tableau Public to
create a dashboard showing customer purchase
trends and sales performance over time.
Key Features:
   • Real-time data updates.
   • Ability to publish dashboards on the web for
      easy sharing.
iv. QlikView
QlikView provides fast, in-memory data
processing and visualization capabilities.
Example: A retail company uses QlikView to
quickly analyze sales data and identify which
products are performing best.
Key Features:
   • Data compression for faster processing.
   • Dynamic visualizations.
v. SAS
SAS is a powerful programming language and
environment for data manipulation and
analytics.
Example: A finance analyst uses SAS to
forecast market trends based on historical
stock price data.
Key Features:
   • Access to data from multiple sources.
   • Comprehensive statistical tools.
vi. Microsoft Excel
Excel is one of the most widely used data
analytics tools, particularly for smaller
datasets.
Example: A small business owner uses Excel
pivot tables to summarize sales data and
identify the best-performing products.
Key Features:
   • Easy data summarization with pivot
      tables.
   • Basic data visualization capabilities.
vii. RapidMiner
RapidMiner is a comprehensive platform for
predictive analytics, machine learning, and
text analytics.
Example: A social media analyst uses
RapidMiner to analyze user sentiment from
tweets and predict trending topics.
Key Features:
   • Integration with various data sources
      like Excel and SQL databases.
viii. KNIME (Konstanz Information Miner)
KNIME is an open-source data analytics
platform with visual programming.
Example: A researcher uses KNIME to
preprocess large datasets and create a
machine-learning model for predicting
disease outcomes.
Key Features:
   • Drag-and-drop functionality for data
       workflows.
   • Easy integration with other tools.
ix. OpenRefine
OpenRefine (previously Google Refine) is a
data cleaning tool used for transforming
messy data.
Example: A data analyst uses OpenRefine to
clean inconsistent product names in an
e-commerce dataset.
Key Features:
   • Data transformation.
   • Parsing data from websites.
x. Apache Spark
Apache Spark is a large-scale data processing
engine, often used in big data applications.
Example: A data engineer uses Apache Spark
to process and analyze massive amounts of
streaming data from social media in real time.
Key Features:
   • In-memory processing (reported to be up to 100 times
     faster than Hadoop MapReduce for some workloads).
   • Machine learning model development.
Skills for Data Analysts
Apart from knowing the tools, a Data Analyst
should also develop the following skills:
  i. Statistics
  ii. Data Cleaning
  iii. Exploratory Data Analysis (EDA)
  iv. Data Visualization
  v. Machine Learning Knowledge (Optional)
i. Statistics: Understanding and applying statistical
techniques to draw meaningful insights.
    Example: Calculating the average customer
    spending from sales data.
ii. Data Cleaning: Ensuring data is accurate and
consistent before analysis.
    Example: Removing duplicate records from a
    customer database.
iii. Exploratory Data Analysis (EDA): Understanding
patterns, trends, and relationships within the data.
    Example: Identifying seasonal trends in sales
    data.
iv. Data Visualization: Presenting data in an
easily understandable format.
   Example: Creating bar charts and line
   graphs to represent sales trends.
v. Machine Learning Knowledge (Optional):
Enhancing data analytics capabilities by
building predictive models.
   Example: Predicting customer churn using
   machine learning models.
• Understanding these tools and developing
  essential skills will empower students to
  become effective data analysts.
• By mastering data cleaning, visualization,
  and statistical techniques, analysts can turn
  raw data into meaningful insights and
  contribute to decision-making in various
  fields.
 Application of Modeling in Business
• Business modeling helps
  organizations understand how they
  operate, create value, and make
  decisions efficiently.
• With proper models in place,
  businesses can predict trends,
  optimize resources, and gain
  competitive advantages.
Why Business Modeling is Important:
i. Strategic Decision-Making: Helps
    executives make informed decisions
    by analyzing future risks and
    opportunities.
ii. Resource Allocation: Ensures
    efficient use of financial, human, and
    technological resources.
iii. Performance Measurement: Evaluates
business performance to improve
efficiency.
iv. Innovation: Encourages exploration of
new business ideas and models.
v. Market Forecasting: Helps predict
market trends and customer behavior.
Example: A retail company might use
sales forecasting models to predict
demand during festive seasons, which
ensures the right inventory is stocked,
avoiding overproduction or shortages.
Big Data Analytics in Business
• Big data analytics involves processing vast
  amounts of structured and unstructured
  data to uncover trends, patterns, and
  valuable business insights.
Key Applications:
  i. Customer Behavior Analysis
  ii. Risk Management
  iii. Marketing Campaigns
  iv. Supply Chain Optimization
i. Customer Behavior Analysis: Understand
     customer preferences to offer personalized
     products.
ii. Risk Management: Identify fraudulent
     transactions and financial risks.
iii. Marketing Campaigns: Analyze social media
     and website data to optimize campaigns.
iv. Supply Chain Optimization: Ensure
     smoother and more cost-effective logistics
     operations.
Example:
• Social Media Analytics: Companies like
  Facebook and Twitter analyze user
  interactions to assess the impact of
  advertising campaigns and identify customer
  sentiment toward products.
• E-Commerce Giants (Amazon and eBay):
  They examine customer purchasing patterns,
  analyze browsing behavior, and predict
  factors that influence user interactions and
  sales revenue.
Big Data Analytics Framework Using Hadoop
Hadoop Framework: Hadoop is an open-source
platform used for distributed
processing of large data sets.
• Its MapReduce processing model has three main steps:
i. Map() Step: Splits input data into smaller
   parts and assigns tasks to worker nodes to
   generate key-value outputs.
     Example: Breaking down a customer
     review dataset into individual review
     sentences.
ii. Shuffle() Step: Combines similar key-value
pairs from different worker nodes for further
processing.
   Example: Collecting all mentions of a
  product name together from different nodes.
iii. Reduce() Step: Processes the grouped data
  to produce the final results.
   Example: Counting how many times a
  specific product feature is mentioned in
  reviews.
Classic Example (Word Count Problem):
• Imagine counting the number of
  occurrences of each word in a large
  document.
• The MapReduce algorithm splits the
  document, counts individual words (map),
  groups similar words (shuffle), and finally
  aggregates the counts for each word
  (reduce).
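The word count flow described above can be imitated on a single machine. The following is a minimal sketch, not Hadoop code: the function names (map_step, shuffle_step, reduce_step) and the two text chunks are assumptions chosen for illustration, and a real Hadoop job would distribute these steps across worker nodes.

```python
from collections import defaultdict

def map_step(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_step(mapped_pairs):
    """Shuffle: group all values that share the same key (word)."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return grouped

def reduce_step(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Two chunks stand in for the pieces of a large document sent to different nodes.
chunks = ["big data is big", "data is valuable"]
mapped = [pair for chunk in chunks for pair in map_step(chunk)]
word_counts = reduce_step(shuffle_step(mapped))
print(word_counts)
```

Each step only needs the output of the previous one, which is what allows a framework to run many map and reduce tasks in parallel.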
Fig. MapReduce Example: Word Count Problem
Data-Driven Companies Using Big Data Analytics
 • IBM and Microsoft: Both companies offer
   cloud-based big data solutions. IBM provides
   analytics in business intelligence and
   healthcare.
 • Social Media Platforms (Facebook and
   Twitter): Analyze user profiles and interactions
   to target ads and increase their revenue.
 • Healthcare Example: A hospital using big data
   analytics can predict patient admission rates,
   optimize staffing, and improve patient care
   outcomes.
 Three V's of Big Data:
• Volume: Refers to the large size of data (e.g.,
  millions of customer transactions).
• Variety: Different types of data (text, images,
  videos).
• Velocity: Speed at which data is generated
  and processed (real-time social media
  analytics).
Example: Netflix deals with all three V's when it
analyzes user behavior, predicts preferences,
and suggests personalized content in real time.
                Databases
• A Database is an organized collection of
  structured information or data stored
  electronically on a computer.
• It is managed by a Database Management
  System (DBMS) that allows users to access,
  manipulate, and manage the data.
Categories of Databases:
  i. Text Databases
  ii. Desktop Databases
  iii. Relational Databases (RDB)
  iv. NoSQL Databases
  v. Object-Oriented Databases (OODB)
i. Text Databases:
    • Stores large collections of text.
    Examples: Textbooks, magazines, journals,
    and manuals.
ii. Desktop Databases:
    • Designed for use on a single PC.
    • They are simpler and have limited
      functionality compared to large-scale
      database systems.
    Examples: Microsoft Excel, Microsoft
    Access, etc.
iii. Relational Databases (RDB):
    • Store data in tables, where each table has
      rows and columns.
    • Tables can share information, making data
      searchable and organized.
    Examples: Oracle Database, IBM Db2, MySQL (queried with SQL);
    such systems can also be offered as a cloud service (DBaaS).
iv. NoSQL Databases:
   • Non-tabular and stores data differently
     than relational tables.
  • Types: Document, key-value, wide-
    column, and graph databases.
  Examples: MongoDB, CouchDB (both store JSON-like documents).
v. Object-Oriented Databases (OODB):
   • Store data as objects and classes
      following object-oriented programming
      principles.
   Examples: Databases designed to work with object-oriented
   languages such as Java, C++, Smalltalk, and LISP.
  Types of Data and Variables
•When we work with data in
 databases, we need to understand
 the types of data and variables.
•In relational databases, rows
 represent data (records) and
 columns represent attributes
 (characteristics).
Big Data Representation
• In big data, columns from RDBMS
  are referred to as attributes or
  variables.
• Variables can be categorized into
  two types:
    i. Categorical (Qualitative) Data
    ii. Quantitative Data (Discrete or
        Continuous)
i. Categorical (Qualitative) Data: Data
represented by characters or labels,
rather than numbers.
Types of Categorical Data:
     a. Nominal Data
     b. Ordinal Data
a. Nominal Data: No natural order or sequence.
Examples: Color, Gender, Names of animals.
E.g.: The genders of 50 students have no specific
order in which to be arranged.
b. Ordinal Data: Has a natural order or sequence.
Examples: Clothing sizes (S, M, L, XL),
Customer Ratings (Excellent, Good, Bad).
E.g.: Clothing sizes follow a clear increasing
order, which itself carries useful information.
ii. Quantitative Data (Discrete or
Continuous): Data that can be
measured and represented
numerically.
Types of Quantitative Data:
     a. Discrete Data
     b. Continuous Data
a. Discrete Data: Finite, countable values (whole numbers).
Examples: Number of buttons, Delivery days
for a product.
E.g.: Number of customer orders recorded
each week.
b. Continuous Data: Infinite values within a
range (including fractional numbers).
Examples: Price, Height, Weight, Temperature.
E.g.: Tracking changes in product prices over
time.
• By understanding the types of databases
  and data categories, you can effectively
  store and analyze information in various
  systems.
• Focus on recognizing whether your data
  belongs to a categorical or quantitative
  group and choose appropriate analysis
  techniques accordingly.
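As a rough illustration of matching the technique to the data type, the sketch below picks a different summary for nominal, ordinal, and quantitative variables. All sample values and the SIZE_ORDER list are made up for the example; the choice of summaries (mode, median rank, mean) is one common convention, not the only option.

```python
from statistics import mean, mode

# Hypothetical sample values for each kind of variable.
colors = ["red", "blue", "red", "green"]   # nominal
sizes = ["S", "M", "M", "L", "XL"]         # ordinal
orders_per_week = [12, 15, 9, 15]          # quantitative, discrete
prices = [19.99, 24.50, 21.00]             # quantitative, continuous

SIZE_ORDER = ["S", "M", "L", "XL"]  # natural order of the ordinal labels

# Nominal: only counting makes sense, so report the most frequent label.
most_common_color = mode(colors)

# Ordinal: convert labels to ranks, take the middle rank, convert back.
ranks = sorted(SIZE_ORDER.index(size) for size in sizes)
median_size = SIZE_ORDER[ranks[len(ranks) // 2]]

# Quantitative: arithmetic summaries such as the mean are meaningful.
mean_orders = mean(orders_per_week)
mean_price = mean(prices)

print(most_common_color, median_size, mean_orders, round(mean_price, 2))
```

Taking the mean of the color labels would be meaningless, which is why recognizing the variable type comes before choosing the analysis.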
Data Modeling Techniques
• Data modeling is the process of
  structuring and organizing data in a
  database so that it can be stored
  efficiently and used for analysis.
• It helps businesses and organizations
  make data-driven decisions by
  providing a well-defined structure for
  data storage and retrieval.
        https://www.klipfolio.com/blog/6-data-modeling-techniques
For example, a university database stores
information about students, courses,
professors, and marks, all of which
need to be stored systematically.
• A good data model ensures that
  all this information is well-organized
  and easy to retrieve when needed.
Types of Data Models
• Data modeling can be achieved in various
  ways.
• However, the basic concept of each of them
  remains the same.
i. Hierarchical model
• Data is structured in a tree-like
  hierarchy, where each parent node
  can have multiple child nodes, but
  each child has only one parent.
Example 1: Library Management
System
• Each book belongs to only one
  category (parent).
• Retrieving a book’s details requires
  searching through the hierarchy.
Limitations:
• Data retrieval is difficult.
• If a book needs to be moved from
  one category to another category,
  it is complex to update.
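A small sketch of the library example, under the assumption that the hierarchy is Library -> category -> book. The category names, book titles, and helper functions (find_book, move_book) are hypothetical and only illustrate why lookups and moves follow the tree.

```python
# Hypothetical library hierarchy: each book sits under exactly one category.
library = {
    "Science": {"Physics Basics": {"author": "A. Rao"}},
    "Fiction": {"The Lost City": {"author": "B. Mehta"}},
}

def find_book(tree, title):
    """Walk the hierarchy category by category until the title is found."""
    for category, books in tree.items():
        if title in books:
            return category, books[title]
    return None

def move_book(tree, title, new_category):
    """Detach the book from its current parent and attach it to another."""
    found = find_book(tree, title)
    if found is None:
        raise KeyError(title)
    old_category, details = found
    del tree[old_category][title]
    tree.setdefault(new_category, {})[title] = details

move_book(library, "The Lost City", "Adventure")
print(find_book(library, "The Lost City"))
```

Because a child has only one parent, a book that belongs in two categories would have to be stored twice, which is one reason other models were developed.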
Example 2:
     Fig 2. Hierarchical Model Structure
ii. Relational Data Modeling Technique
• Data is organized into tables
  (relations), with rows (records)
  and columns (attributes).
Example: Online Shopping
Database (Amazon, Flipkart, etc.)
  Fig. Relational Model Structure
• Customer_ID, Product_ID, and Vendor_ID are the primary
  keys of the Customers, Products, and Vendors tables. They
  appear as foreign keys in the Sales table, linking it to those tables.
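The key relationships can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: only Customer_ID, Product_ID, and Vendor_ID come from the slide, while the other column names (Name, Title, Price, Sale_ID, Quantity) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Products  (Product_ID  INTEGER PRIMARY KEY, Title TEXT, Price REAL);
CREATE TABLE Vendors   (Vendor_ID   INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sales (
    Sale_ID     INTEGER PRIMARY KEY,
    Customer_ID INTEGER REFERENCES Customers(Customer_ID),
    Product_ID  INTEGER REFERENCES Products(Product_ID),
    Vendor_ID   INTEGER REFERENCES Vendors(Vendor_ID),
    Quantity    INTEGER
);
INSERT INTO Customers VALUES (1, 'Asha');
INSERT INTO Products  VALUES (10, 'Headphones', 50.0);
INSERT INTO Vendors   VALUES (100, 'SoundCo');
INSERT INTO Sales     VALUES (1000, 1, 10, 100, 2);
""")

# The foreign keys in Sales let one query pull details from all three tables.
rows = conn.execute("""
    SELECT c.Name, p.Title, v.Name, s.Quantity * p.Price
    FROM Sales s
    JOIN Customers c ON c.Customer_ID = s.Customer_ID
    JOIN Products  p ON p.Product_ID  = s.Product_ID
    JOIN Vendors   v ON v.Vendor_ID   = s.Vendor_ID
""").fetchall()
print(rows)
```

Customer, product, and vendor details are stored once and referenced by ID, which is how the relational model reduces duplication.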
Advantages:
• Reduces data duplication
• Easy to update and retrieve data
• Simple to understand
  relationships
iii. Network Data Modeling Technique
• Data is represented using nodes
  (entities) and edges
  (relationships).
• Unlike hierarchical models, an
  entity can have multiple parents.
Example 1: Hospital Management
System
• A hospital database where
  patients can visit multiple
  doctors, and doctors can have
  multiple patients.
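The many-to-many relationship in the hospital example can be sketched as a list of edges. The patient and doctor names are hypothetical, and storing the edges in two lookup dictionaries is just one simple way to represent such a network.

```python
from collections import defaultdict

# Each (patient, doctor) pair is one "visits" relationship (an edge).
visits = [
    ("Patient A", "Dr. Rao"),
    ("Patient A", "Dr. Khan"),
    ("Patient B", "Dr. Rao"),
]

doctors_of = defaultdict(set)   # patient -> doctors visited
patients_of = defaultdict(set)  # doctor  -> patients seen
for patient, doctor in visits:
    doctors_of[patient].add(doctor)
    patients_of[doctor].add(patient)

print(sorted(doctors_of["Patient A"]))
print(sorted(patients_of["Dr. Rao"]))
```

Unlike the tree in the hierarchical model, a node here can be linked to any number of other nodes in both directions.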
 Example 2:
  Fig. Network Model Structure
 Example 3:
        Fig. Network Model Structure
Advantages:
• More flexible than hierarchical
  models
• Best for complex relationships
iv. Entity-Relationship (ER) Data
Modeling Technique
• Uses diagrams to represent entities,
  attributes, and relationships in a
  database.
Example: College Student Database
Example Scenario: A college stores
information about students and courses
Fig. ER diagram for the college student database
Advantages:
• Visual representation makes it
  easy to understand.
• Helps design database structure
  efficiently.
v. Object-Oriented Data Modeling
Technique
• Data is stored as objects, similar
  to how data is managed in
  object-oriented programming.
Example: Social Media Application
(Instagram, Facebook, etc.).
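A minimal sketch of the idea, assuming a much simplified social media application. The class names (User, Post), their fields, and their methods are invented for illustration and do not describe how any real platform stores its data.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    """A post bundles its data (caption, likes) with its behavior (like)."""
    caption: str
    likes: int = 0

    def like(self):
        self.likes += 1

@dataclass
class User:
    """A user object owns a collection of post objects."""
    username: str
    posts: list = field(default_factory=list)

    def publish(self, caption):
        post = Post(caption)
        self.posts.append(post)
        return post

user = User("asha")
post = user.publish("Hello")
post.like()
print(user)
```

An object-oriented database would persist objects like these directly, including the nested list of posts, rather than splitting them into separate tables.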
Advantages:
• Best for real-world
  applications.
• Handles complex data
  structures easily.
Fig. Types of Data Models: Example for a “shopping database”
Importance of Data Modeling
• Avoids data redundancy (no
  unnecessary data repetition)
• Ensures data consistency (all data is
  accurate and reliable)
• Makes data retrieval efficient
  (faster searches and queries)
• Helps in business decision-making
  (better data insights)
Best Practices for Data Modeling
• Understand Business Needs- Before
  designing a data model, understand the
  business goals and data requirements.
• Keep Models Simple- Start with a small
  model and scale as needed.
• Organize Data Using Facts &
  Dimensions- Use separate tables for
  Facts (e.g., sales data) and Dimensions
  (e.g., product categories, customer
  location) for better insights.
• Avoid Storing Unnecessary Data-
  Storing too much data affects
  performance. Only keep what is
  required.
• Continuously Validate & Update- Data
  models must be regularly updated as
  business requirements evolve.
     Missing Imputations
• Missing data can be a major
  problem in data analysis, as it may
  lead to incorrect conclusions.
• Imputation is the process of
  replacing missing values with
  substituted values to ensure a
  complete dataset.
Imputation Methods
Different imputation methods are:
i. Do Nothing (Ignore Missing Data)
ii. Mean, Median, or Mode
    Imputation
iii. Imputation Using Most Frequent /
    Zero / Constant Values
iv. K-Nearest Neighbors (KNN)
    Imputation
i. Do Nothing (Ignore Missing Data)
•In this approach, missing data is
 left as it is.
•This is often used when missing
 values are minimal, or when the
 dataset is large enough that
 missing values do not
 significantly impact results.
Example: If a dataset has only 2% missing
values, ignoring them may not affect the
analysis.
     S.No.   Column 1   Column 2   Column 3
     1       3          6          NaN
     2       5          10         12
     3       6          11         15
     4       NaN        12         14
     5       6          NaN        NaN
     6       10         13         16
 When to use?
• When only a few values are missing
  (less than 5% of the dataset).
• If missing data is completely random.
Drawback:
• If missing data is significant, it reduces
  the sample size and affects model
  accuracy.
ii. Mean, Median, or Mode Imputation
• This method fills missing values
  using the mean (average),
  median (middle value), or mode
  (most frequent value) of a
  column.
Example Dataset (Before Imputation)
      S.No.   Column 1   Column 2   Column 3
      1       3          6          NaN
      2       5          10         12
      3       6          11         15
      4       NaN        12         14
      5       6          NaN        NaN
      6       10         13         16
Mean Imputation -> Replace NaN with the column mean
• Column 1 Mean = (3+5+6+6+10) / 5 = 6
• Column 2 Mean = (6+10+11+12+13) / 5 = 10.4
• Column 3 Mean = (12+15+14+16) / 4 = 14.25
Example Dataset (After Imputation)
      S.No.   Column 1   Column 2   Column 3
      1       3          6          14.25
      2       5          10         12
      3       6          11         15
      4       6          12         14
      5       6          10.4       14.25
      6       10         13         16
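Mean imputation on the example dataset can be sketched as follows. The sketch uses only the Python standard library and represents NaN as None; the function name impute_mean is an assumption for the example. With Pandas, a call such as fillna with the column means would be the more usual route.

```python
from statistics import mean

# The example dataset, column by column; None marks a missing value (NaN).
columns = {
    "Column 1": [3, 5, 6, None, 6, 10],
    "Column 2": [6, 10, 11, 12, None, 13],
    "Column 3": [None, 12, 15, 14, None, 16],
}

def impute_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

imputed = {name: impute_mean(values) for name, values in columns.items()}
for name, values in imputed.items():
    print(name, values)
```

Swapping mean for statistics.median in impute_mean gives median imputation, which is less affected by outliers.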
Advantages:
• Fast and simple.
• Works well with numerical data.
Disadvantages:
• Does not preserve relationships
  between variables.
• Not accurate if data has outliers.
• Mean and median do not work for categorical data
  (only the mode does).
iii. Imputation Using Most Frequent /
Zero / Constant Values
• This method is used for categorical
  variables.
• The missing values are replaced
  with:
   • Most frequent value (Mode)
   • Zero or constant values
Example: Categorical Data
      S.No.   Gender   City
      1       M        Hyderabad
      2       F        Bangalore
      3       NaN      Bangalore
      4       F        NaN
      5       F        Mumbai
Example:
Using Most Frequent (Mode) → Replace NaN
Gender Mode = "F"
City Mode = "Bangalore"
        S.No.   Gender   City
        1       M        Hyderabad
        2       F        Bangalore
        3       F        Bangalore
        4       F        Bangalore
        5       F        Mumbai
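The same replacement can be sketched in Python for the Gender and City columns. None stands in for NaN, and the helper name impute_mode is an assumption for the example. Note that if two labels were equally frequent, the one seen first would be chosen, so a real analysis should decide how to handle ties.

```python
from collections import Counter

# The categorical example; None marks a missing value (NaN).
rows = [
    {"Gender": "M", "City": "Hyderabad"},
    {"Gender": "F", "City": "Bangalore"},
    {"Gender": None, "City": "Bangalore"},
    {"Gender": "F", "City": None},
    {"Gender": "F", "City": "Mumbai"},
]

def impute_mode(rows, column):
    """Fill missing entries of one column with its most frequent value."""
    observed = [row[column] for row in rows if row[column] is not None]
    most_frequent = Counter(observed).most_common(1)[0][0]
    for row in rows:
        if row[column] is None:
            row[column] = most_frequent
    return most_frequent

gender_mode = impute_mode(rows, "Gender")
city_mode = impute_mode(rows, "City")
print(gender_mode, city_mode)
```

Replacing most_frequent with a fixed value such as "Unknown" turns this into constant-value imputation.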
 Advantages:
 • Works well for categorical data.
Disadvantages:
• Creates bias in data.
• Does not preserve relationships
  between variables.
iv. K-Nearest Neighbors (KNN)
Imputation
 • KNN predicts missing values based
   on the closest K neighbors in the
   dataset.
 How It Works?
 • Finds the K closest neighbors
   (similar rows).
 • Uses the average of neighbors to
   fill missing values.
Example:
    S.No.   Age      Salary       Experience
    1       25       50,000       3
    2       28       60,000       5
    3       30       NaN          6
    4       35       80,000       10
    5       40       90,000       NaN
Step 1: Find the 2 nearest neighbors for S.No. 3
(missing salary), here measured by Age
• Closest rows: S.No. 2 & S.No. 4 (S.No. 1 is as far from
   S.No. 3 in Age as S.No. 4; the tie is broken toward S.No. 4 here)
• Take their average: (60,000 + 80,000) / 2 =
   70,000
• Replace NaN with 70,000
Step 2: Find the 2 nearest neighbors for S.No. 5
(missing experience), again measured by Age
• Closest rows: S.No. 3 & S.No. 4
• Take their average: (6 + 10) / 2 = 8
• Replace NaN with 8
Result after imputation:
    S.No.   Age      Salary       Experience
    1       25       50,000       3
    2       28       60,000       5
    3       30       70,000       6
    4       35       80,000       10
    5       40       90,000       8
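The worked example can be reproduced with a small Python sketch. It rests on several assumptions that are not stated in the example itself: rows are compared by Age only, K = 2 (the averages in the example use two neighbors), and ties in distance are broken in favor of the later row so that the output matches the table above. The function name knn_impute is hypothetical. A distance computed over several features, or a different tie-breaking rule, can select different neighbors and therefore different fill values.

```python
def knn_impute(rows, target, feature, k=2):
    """Fill missing `target` values with the mean of the k rows
    that are nearest in `feature` and have a known `target`."""
    filled = [dict(r) for r in rows]
    for i, row in enumerate(rows):
        if row[target] is not None:
            continue
        # (distance, tie-break, value) for every other row with a known target.
        # The -j term prefers the later row on ties (assumption, see lead-in).
        candidates = [
            (abs(row[feature] - other[feature]), -j, other[target])
            for j, other in enumerate(rows)
            if j != i and other[target] is not None
        ]
        candidates.sort()
        nearest = [value for _, _, value in candidates[:k]]
        filled[i][target] = sum(nearest) / len(nearest)
    return filled

# Data from the example table; None marks a missing value.
rows = [
    {"age": 25, "salary": 50000, "experience": 3},
    {"age": 28, "salary": 60000, "experience": 5},
    {"age": 30, "salary": None, "experience": 6},
    {"age": 35, "salary": 80000, "experience": 10},
    {"age": 40, "salary": 90000, "experience": None},
]

step1 = knn_impute(rows, "salary", "age", k=2)       # fills S.No. 3
result = knn_impute(step1, "experience", "age", k=2)  # fills S.No. 5

for r in result:
    print(r)
```

Each call scans every other row for every missing value, which is why the cost grows quickly on large datasets.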
Advantages:
• Often more accurate than mean/median
  imputation.
• Preserves relationships between variables.
Disadvantages:
• Sensitive to outliers.
• Computationally expensive for large
  datasets.
Which Imputation Method Should
You Use?
• For numerical data: Use Mean/Median if
  speed is important. Use KNN if accuracy
  is needed.
• When only a small share of values is
  missing: Ignoring (dropping) the missing
  values may be fine.
• For categorical data: Use Mode
  (most frequent value).
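These guidelines can be written as a small decision helper. This is only a sketch: the function name suggest_imputation is hypothetical, and the 5% cut-off for a "small" share of missing values is an assumed number, since the guidelines do not give one.

```python
def suggest_imputation(is_categorical, missing_fraction,
                       need_accuracy=False, small_fraction=0.05):
    """Suggest an imputation strategy for one column.

    small_fraction is an assumed threshold for "small" missing data.
    """
    if missing_fraction <= small_fraction:
        return "drop rows"      # little is missing: ignoring it may be fine
    if is_categorical:
        return "mode"           # most frequent value
    return "knn" if need_accuracy else "mean/median"

print(suggest_imputation(is_categorical=True, missing_fraction=0.20))
print(suggest_imputation(is_categorical=False, missing_fraction=0.20))
print(suggest_imputation(is_categorical=False, missing_fraction=0.20,
                         need_accuracy=True))
print(suggest_imputation(is_categorical=False, missing_fraction=0.01))
```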
    Need for Business Modeling
• Business modeling is the process of
  representing the structure,
  operations, and policies of a
  business in a systematic way.
• It provides a clear blueprint for
  understanding how a business
  creates, delivers, and captures value.
It includes:
i. Business Goals & Objectives – What
   the business wants to achieve.
ii. Processes & Workflows – How
   different tasks are carried out.
iii. Data & Information Flow – How data
   is stored, shared, and utilized.
iv. Stakeholders & Roles – Who is
   involved in different processes.
Why is Business Modeling Important?
Business modeling is crucial for organizations
because it:
i. Helps in Decision Making
• Business models provide data-driven insights,
  allowing managers to make strategic decisions
  about investments, expansion, and resource
  allocation.
Example: An e-commerce company uses a
business model to decide whether to expand
into international markets by analyzing customer
demand, costs, and potential revenue.
ii. Improves Operational Efficiency
• By mapping workflows and processes,
    businesses can identify inefficiencies,
    bottlenecks, and redundant steps.
Example: A manufacturing company may
use business process modeling to find
ways to reduce production costs and
improve delivery times.
iii. Aligns Business Goals with IT Systems
• Business models bridge the gap
     between business strategy and
     technology implementation.
Example: A bank uses a business model
to determine how a new AI-driven loan
approval system aligns with its goal of
reducing loan processing time.
iv. Enhances Risk Management
• Modeling helps businesses predict
    potential risks and develop
    contingency plans.
Example: A supply chain model helps a
retail company prepare for delays in
product delivery by identifying
alternative suppliers.
v. Supports Data-Driven Decision Making
• Modern business modeling often
  integrates data analytics and AI to
  enhance decision-making.
Example: A retail store analyzes
customer purchasing behavior and
adjusts its business model to introduce a
personalized recommendation system.
vi. Supports Innovation & Growth
• Business modeling helps companies
  identify new opportunities, test new
  ideas, and adapt to market changes.
Example: A startup might develop a
 subscription-based business model to
 generate consistent revenue instead of
 relying on one-time sales.
vii. Attracts Investors & Stakeholders
• A well-defined business model shows
  investors how the company plans to
  make a profit, increasing trust and
  investment potential.
Example: A SaaS (Software-as-a-Service)
company presents a business model that
explains how it will acquire customers,
generate recurring revenue, and scale
operations.