SC-BA - 02 : DATA MINING
(2019 Pattern) (Semester - II) (206 BA)
 Time : 2½ Hours]                                                  [Max. Marks : 50
 Instructions to the candidates:
      1) All questions are compulsory.
      2) Figures to the right indicate marks for questions/sub-questions.
 Q1) Solve Any Five :
       a) What is Data Mining?
       b) What is Data Preprocessing?
       c) What is Association Analysis? Give an Example.
       d) What is Clustering? List the methods of clustering.
       e) What is Classification? Name any two Algorithms used for it.
       f)   What is big data Analysis?
       g) What is ratio data? Write any two characteristics of ratio data.
       h) What is the role of Business intelligence in decision making?
a) What is Data Mining?
Data mining is the process of extracting knowledge from large amounts of data. It
is a process of discovering patterns and trends in data that would otherwise be
hidden. Data mining can be used to make predictions, identify relationships, and
find anomalies.
b) What is Data Preprocessing?
Data preprocessing is the process of preparing data for data mining. It involves
cleaning, transforming, and formatting the data so that it is in a format that can be
analyzed by data mining algorithms. Data preprocessing is an important step in
data mining because it can improve the accuracy and efficiency of the data mining
process.
c) What is Association Analysis? Give an Example.
Association analysis is a type of data mining that discovers relationships between
different items in a data set. It is often used to find patterns in customer behavior,
such as what products are often purchased together. For example, an association
rule might be that "people who buy milk also tend to buy bread."
d) What is Clustering? List the methods of clustering.
Clustering is a type of data mining that groups similar data points together. It is
often used to find natural groupings in data, such as customer segments or product
categories. There are many different clustering algorithms, but some of the most
common include:
   • K-means clustering: This algorithm divides the data into k clusters, where k
       is a user-defined number.
   • Hierarchical clustering: This algorithm builds a hierarchy of clusters,
       starting with individual data points and merging them together until there is
       only one cluster left.
   • Density-based clustering: This algorithm finds clusters of data points that
       are densely packed together.
e) What is Classification? Name any two Algorithms used for it.
Classification is a supervised data mining technique that assigns data points to
predefined classes, using a model learned from labelled examples. It is often
used to classify customers, products, or other entities. Some of the most common
classification algorithms include:
   • Decision trees: These algorithms build a tree-like structure that represents
       the decision rules for classifying data points.
   • Support vector machines: These algorithms find the hyperplanes that best
       separate different classes of data points.
f) What is big data Analysis?
Big data analysis is the process of extracting knowledge from large and complex
data sets. It is a rapidly growing field, as the amount of data that is being generated
is increasing exponentially. Big data analysis can be used to make predictions,
identify trends, and solve complex problems.
g) What is ratio data? Write any two characteristics of ratio data.
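Ratio data is numeric data measured on a scale with equal intervals and a true zero
point. A value of zero represents the absence of the quantity being measured, as
with weight, height, or income. Two characteristics of ratio data are:
    • The intervals between values are equal, so differences are meaningful.
    • Values can be meaningfully multiplied and divided, so ratios are valid (for
      example, 10 kg is twice as heavy as 5 kg).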
h) What is the role of Business intelligence in decision making?
Business intelligence (BI) is a set of technologies and processes that help
businesses collect, analyze, and interpret data. BI can be used to make better
decisions, improve efficiency, and identify new opportunities.
The role of BI in decision making is to provide businesses with insights that they
can use to make better decisions. BI can help businesses to:
   • Understand their customers better
   • Identify trends in the market
   • Track their performance
   • Make better predictions
BI can be a valuable tool for businesses of all sizes. It can help businesses to make
better decisions, improve efficiency, and identify new opportunities.
 Q2) Solve Any Two :
       a) Why is data cleaning needed before data analysis?
Data cleaning is needed before data analysis because it ensures that the data is
accurate, complete, and consistent. This is important because inaccurate or
incomplete data can lead to inaccurate or misleading results.
Here are some of the reasons why data cleaning is needed before data analysis:
   •   To remove errors: Data cleaning can help to identify and remove errors
       from the data. This includes errors such as typos, missing values, and
       inconsistent formatting.
   •   To make the data complete: Data cleaning can help to identify and fill in
       missing values in the data. This is important because missing values can
       skew the results of the analysis.
   •   To make the data consistent: Data cleaning can help to ensure that the data
       is consistent in terms of its format, units, and values. This is important
       because inconsistent data can make it difficult to analyze the data.
In short, data cleaning is an important step in the data analysis process. It helps to
ensure that the data is accurate, complete, and consistent, which is essential for
getting accurate and reliable results.
Here are some of the common data cleaning tasks:
   •   Identifying and removing errors: This includes typos, duplicate records,
       and invalid values.
   •   Filling in missing values: This can be done using a variety of methods, such
       as mean or median imputation, or interpolation for ordered data.
   •   Correcting inconsistencies: This can involve standardizing units, formatting,
       or values.
   •   Categorizing data: This can help to make the data more manageable and
       easier to analyze.
   •   Cleaning up text data: This can involve removing noise, correcting spelling
       errors, and normalizing text.
Data cleaning can be a complex and time-consuming process, but it is essential for
getting accurate and reliable results from data analysis.
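The cleaning tasks above can be sketched in a few lines of Python. This is a minimal illustration, not a full cleaning pipeline: the records, the field names, and the choice of median imputation are all assumptions made for the example.

```python
from statistics import median

# Hypothetical raw records: stray spaces, inconsistent casing,
# a duplicate row, and a missing spend value.
raw = [
    {"name": " Alice ", "city": "pune", "spend": "1200"},
    {"name": "Bob", "city": "PUNE", "spend": None},
    {"name": "Bob", "city": "PUNE", "spend": None},
    {"name": "Chitra", "city": "Mumbai ", "spend": "800"},
]

def clean(records):
    # Correct inconsistencies: trim text, standardize casing, convert types.
    rows = []
    for r in records:
        rows.append({
            "name": r["name"].strip(),
            "city": r["city"].strip().title(),
            "spend": float(r["spend"]) if r["spend"] is not None else None,
        })

    # Remove errors: drop exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = (row["name"], row["city"], row["spend"])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Fill in missing values: impute the median of the known spend values.
    known = [row["spend"] for row in unique if row["spend"] is not None]
    fill = median(known)
    for row in unique:
        if row["spend"] is None:
            row["spend"] = fill
    return unique

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Median imputation is only one option; whether it is appropriate depends on why the values are missing.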
       b) Explain Hierarchical clustering giving a suitable example.
Hierarchical clustering is a type of clustering algorithm that groups data
points together based on their similarity. It builds a hierarchy of nested
clusters, either by merging smaller clusters into larger ones or by splitting
larger clusters into smaller ones.
There are two main types of hierarchical clustering: agglomerative and divisive.
Agglomerative hierarchical clustering starts with each data point as its own cluster
and merges them together until there is only one cluster left. Divisive hierarchical
clustering starts with all the data points in one cluster and then divides them into
smaller and smaller clusters until there are only individual data points left.
A suitable example of hierarchical clustering is grouping customers into different
segments based on their purchasing behavior. For example, you could use
hierarchical clustering to group customers into segments based on the products
they buy, the frequency of their purchases, and their spending habits.
Here is an example of how hierarchical clustering could be used to group
customers into different segments:
   1. Start with each customer as its own cluster.
   2. Calculate the similarity between each pair of clusters.
   3. Merge the two most similar clusters together.
   4. Repeat steps 2 and 3 until there is only one cluster left.
The dendrogram is a tree-like diagram that shows the hierarchy of clusters created
by hierarchical clustering. The dendrogram shows how the clusters were merged
together, and it can be used to visualize the relationships between the different
clusters.
Here is an illustrative dendrogram for four customers, numbered 3 to 6:
        0
      /   \
     1     2
    / \   / \
   3   4 5   6
The leaves 3, 4, 5 and 6 are the individual customers. Customers 3 and 4 merge
into cluster 1, customers 5 and 6 merge into cluster 2, and clusters 1 and 2 merge
into the root cluster 0, which contains all the data. The lower the level at which
two clusters are joined, the more similar they are.
Hierarchical clustering is a powerful tool for grouping data points together based
on their similarity. It is a versatile algorithm that can be used to cluster data from a
variety of domains.
        c) Explain Decision - tree Approach of data classification.
A decision tree is a supervised machine learning algorithm that can be used for
both classification and regression problems. It is a tree-like structure that
represents the decision rules for classifying data points.
The decision tree approach to data classification works by starting at the root node
of the tree and asking a question about the data point. The answer to the question
will determine which branch of the tree the data point will follow. The process will
continue until the data point reaches a leaf node, which will contain the
classification for the data point.
For example, let's say we have a decision tree that is used to classify customers as
either "good" or "bad" credit risks. The root node of the tree might ask the
question "Is the customer's credit score above 700?" If the answer is yes, the data
point will follow the branch that leads to the leaf node "good credit risk." If the
answer is no, the data point will follow the branch that leads to the leaf node "bad
credit risk."
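The root-to-leaf traversal described above can be sketched in Python. The tree below is written by hand to mirror the credit-score example rather than learned from data, and the dictionary layout and the `classify` helper are assumptions made for illustration.

```python
# Internal nodes hold a question (a test on the data point);
# leaf nodes hold the class label.
tree = {
    "question": "Is the customer's credit score above 700?",
    "test": lambda customer: customer["credit_score"] > 700,
    "yes": {"label": "good credit risk"},
    "no": {"label": "bad credit risk"},
}

def classify(node, customer):
    # Start at the root and follow the branch chosen by each answer
    # until a leaf node is reached.
    while "label" not in node:
        node = node["yes"] if node["test"](customer) else node["no"]
    return node["label"]

print(classify(tree, {"credit_score": 745}))
print(classify(tree, {"credit_score": 640}))
```

A real decision tree would have several levels, and its questions and thresholds would be chosen by a training algorithm from labelled examples.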
Decision trees are a powerful tool for data classification because they are easy to
understand and interpret. They can also be used to handle complex data sets with a
large number of features.
Here are some of the advantages of using decision trees for data classification:
    • Easy to understand and interpret: Decision trees are easy to understand and
       interpret, which makes them a good choice for explaining the results of a
       classification model to business users.
    • Handle complex data sets: Decision trees can handle complex data sets with
       a large number of features. This is because decision trees can learn to
       identify the most important features for classification, even if there are
       many features in the data set.
    • Tolerant of messy data: Decision trees need little data preparation and,
       when pruned, can still perform well even if the data set contains some
       noisy or incorrect data.
Here are some of the disadvantages of using decision trees for data classification:
    • Prone to overfitting: A deep decision tree can fit the training data too
       closely and fail to generalize to new data. Pruning the tree or limiting
       its depth reduces this risk.
    • Sometimes less accurate than other algorithms: On some problems a single
       decision tree is less accurate than other machine learning algorithms,
       such as support vector machines.
Overall, decision trees are a powerful tool for data classification. They are easy to
understand and interpret, and they can handle complex data sets with a large
number of features. However, decision trees are prone to overfitting and can be
less accurate than some other machine learning algorithms.
Q3)    Apply Apriori Algorithm to the given dataset to find frequent
item sets.(Given support value = 40%)
               Tid        Items Purchased
               100       Bread, Milk, Cake
               101       Bread, Diaper, Beer
               102       Milk, Diaper, Beer, Eggs
               103       Bread, Milk, Diaper, Beer
               104       Bread, Milk, Diaper, Cake
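One way to work the Apriori question is sketched below in Python. With five transactions, a 40% support threshold means an itemset must appear in at least two of them. The function and variable names are illustrative assumptions; the data is the table above.

```python
from itertools import combinations

transactions = {
    100: {"Bread", "Milk", "Cake"},
    101: {"Bread", "Diaper", "Beer"},
    102: {"Milk", "Diaper", "Beer", "Eggs"},
    103: {"Bread", "Milk", "Diaper", "Beer"},
    104: {"Bread", "Milk", "Diaper", "Cake"},
}

def apriori(transactions, min_support):
    baskets = list(transactions.values())
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        previous = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous
                             for s in combinations(c, k - 1))}
        # Count step: keep candidates that meet the support threshold.
        current = [c for c in candidates if support(c) >= min_support]
    return frequent

frequent_itemsets = apriori(transactions, min_support=0.4)
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")
```

Run on the table above, this gives 5 frequent 1-itemsets (every item except Eggs), 8 frequent 2-itemsets, and 4 frequent 3-itemsets: {Bread, Milk, Cake}, {Bread, Milk, Diaper}, {Bread, Diaper, Beer} and {Milk, Diaper, Beer}. No 4-itemset reaches the threshold.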
                                OR
       Consider the dataset given below, cluster it using hierarchical
          clustering, and plot the dendrogram for it.
                 Item     A      B      C      D      E
                  A       0
                  B       7      0
                  C       2      5      0
                  D       6      4      8      0
                  E       10     8      3      7      0
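The clustering question can be worked with a short agglomerative procedure, sketched below in Python. Two assumptions are made because the question does not state them: the matrix entries are read as pairwise distances, and single linkage (the distance between two clusters is the smallest distance between their members) is used. The helper names are illustrative.

```python
labels = ["A", "B", "C", "D", "E"]

# Lower-triangular matrix from the question, read as pairwise distances.
lower = [
    [0],
    [7, 0],
    [2, 5, 0],
    [6, 4, 8, 0],
    [10, 8, 3, 7, 0],
]

def distance(i, j):
    # The matrix is symmetric, so look up the entry below the diagonal.
    return lower[max(i, j)][min(i, j)]

def names(cluster):
    return "".join(labels[i] for i in sorted(cluster))

def single_linkage(n):
    clusters = [(i,) for i in range(n)]   # start: each item is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distance(i, j)
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names(clusters[a]), names(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [merged]
    return merges

merges = single_linkage(len(labels))
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height}")
```

Under these assumptions the merges are A with C at distance 2, E with {A, C} at 3, B with D at 4, and finally {A, C, E} with {B, D} at 5. These merge heights are the levels at which the branches join in the dendrogram. Complete or average linkage would give different heights.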
 Q4) Explain the use of Association Analysis in purchasing behaviour of the
        customers.
Association analysis is a data mining technique that can be
used to find patterns in customer purchasing behavior. It
can be used to identify items that are often purchased
together, or to identify products that are likely to be
purchased by a particular type of customer.
Association analysis can be used in a variety of ways to
understand and influence customer purchasing behavior. For
example, it can be used to:
   •   Identify cross-sell opportunities: Cross-selling is the
       practice of selling additional products or services to
       existing customers. Association analysis can be used to
       identify products that are often purchased together, so
       that they can be cross-sold to customers.
   •   Personalize recommendations: Recommendation
      engines are used to recommend products or services to
      customers based on their past purchases. Association
      analysis can be used to improve the accuracy of
      recommendation engines by identifying products that
      are likely to be purchased by a particular type of
      customer.
  •   Optimize product placement: The placement of products
      in a store can have a significant impact on sales.
      Association analysis can be used to optimize product
      placement by identifying products that are likely to be
      purchased together.
Here are some examples of how association analysis can be
applied in practice:
  •   A grocery store might use association analysis to
      identify that customers who buy milk are also likely to
      buy bread. This information could then be used to place
      milk and bread near each other in the store, or to
      recommend bread to customers who buy milk.
  •   An online retailer might use association analysis to
      identify that customers who buy a particular type of
      laptop are also likely to buy a certain type of printer.
      This information could then be used to recommend the
      printer to customers who buy the laptop, or to offer a
      discount on the printer when the laptop is purchased.
  •   A website might use association analysis to identify
      that users who visit a particular page are also likely to
      visit other pages. This information could then be used
      to personalize the website for users, or to recommend
      other pages that the user might be interested in.
Association analysis is a powerful tool for understanding
customer purchasing behavior. By identifying patterns in
what customers buy, businesses can make better decisions
about product placement, recommendations, and
cross-selling. This can lead to increased sales and
improved customer satisfaction.
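The strength of a rule such as "customers who buy milk also
buy bread" is usually measured with support, confidence, and
lift. The Python sketch below computes them for the five
transactions in the Q3 table; the function names are
illustrative assumptions.

```python
# The five baskets from the Q3 transaction table.
baskets = [
    {"Bread", "Milk", "Cake"},
    {"Bread", "Diaper", "Beer"},
    {"Milk", "Diaper", "Beer", "Eggs"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cake"},
]

def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(antecedent, consequent):
    # support:    how often both sides occur together
    # confidence: how often the consequent appears given the antecedent
    # lift:       confidence relative to the consequent's overall frequency
    sup = support(antecedent | consequent)
    conf = sup / support(antecedent)
    lift = conf / support(consequent)
    return sup, conf, lift

sup, conf, lift = rule_metrics({"Milk"}, {"Bread"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

For this small table the rule Milk -> Bread has support 0.60
and confidence 0.75, but its lift is slightly below 1 because
bread appears in 80% of all baskets anyway. High confidence
alone does not show that two items are genuinely associated.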
                                     OR
        Explain the Density - based Clustering method giving a suitable example.
      Q5) A) Elaborate the use of data mining in target Marketing.
Density-based clustering is a type of clustering algorithm that
groups together data points that are densely packed together
and treats points in sparse regions as noise. Unlike k-means,
it does not require the number of clusters to be known
beforehand.
Density-based clustering works by first identifying core
points. A core point is a point that has a minimum number of
neighboring points within a certain radius. Once the core
points have been identified, they are then connected to form
clusters. The clusters are then expanded by adding
neighboring points that are within the radius of the core
points.
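The procedure described above can be sketched in Python.
DBSCAN is one widely used density-based algorithm, and the
sketch assumes it; the parameter names eps (the radius) and
min_pts (the minimum number of points within the radius,
counting the point itself), and the sample points, are
assumptions made for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # noise for now; may become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1),
          (5, 5), (5.1, 4.9), (4.9, 5.2),
          (9, 1)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

Points that are not within the radius of any core point keep
the label -1 and belong to no cluster.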
A suitable example of density-based clustering is grouping
customers into different segments based on their purchasing
behavior. For example, you could use density-based
clustering to group customers into segments based on the
products they buy, the frequency of their purchases, and their
spending habits.
Here is an example of how density-based clustering could be
used to group customers into different segments:
  1. Describe each customer by features such as purchase
     frequency and average spend, and choose a radius and a
     minimum number of neighbors.
  2. Identify the core points. A core point is a customer
     who has at least the minimum number of other
     customers within the radius.
  3. Connect core points that lie within the radius of each
     other to form clusters, then expand each cluster by
     adding the remaining customers who fall within the
     radius of its core points.
  4. The final clusters represent different segments of
     customers based on their purchasing behaviour.
     Customers who belong to no cluster are treated as
     outliers.
Here are some of the advantages of density-based clustering:
  •   It does not require the number of clusters to be
      known beforehand.
  •   It is able to identify clusters of arbitrary shapes and
      sizes.
  •   It is robust to noise, because points in sparse
      regions are left out of every cluster.
Here are some of the disadvantages of density-based
clustering:
  •   It can be computationally expensive for large datasets.
  •   It can be sensitive to the choice of parameters, such
      as the radius and the minimum number of neighbors.
Overall, density-based clustering is a powerful tool for
grouping data points together based on their density. It is a
versatile algorithm that can be used to cluster data from a
variety of domains.
                                         OR
     b) Elaborate the use of data mining for customer profiling.
Data mining is a process of extracting knowledge from data. It can be used to
create customer profiles, which are descriptions of the characteristics of a
customer group. Customer profiles can be used to improve customer targeting,
segmentation, and personalization.
There are many different data mining techniques that can be used for customer
profiling. Some of the most common techniques include:
   • Association analysis: This technique can be used to identify patterns in
       customer behavior. For example, it can be used to identify products that are
       often purchased together.
   • Clustering: This technique can be used to group customers together based
       on their similarities. For example, it can be used to group customers
       together based on their demographics, interests, or purchase behavior.
   • Classification: This technique can be used to assign customers to different
       categories. For example, it can be used to assign customers to different
       loyalty programs or marketing segments.
Customer profiles support each of these activities as follows:
   • Customer targeting: Customer targeting is the process of identifying the
       customers who are most likely to be interested in a particular product or
       service. Customer profiles can be used to identify these customers by their
       demographics, interests, or purchase behavior.
   • Customer segmentation: Customer segmentation is the process of dividing
       customers into groups based on their similarities. Customer profiles can be
       used to segment customers into groups that are likely to have similar needs
       or interests.
   • Personalization: Personalization is the process of tailoring a product or
       service to the individual needs of a customer. Customer profiles can be used
       to personalize products or services by recommending products that are
       likely to be of interest to the customer, or by providing content that is
       tailored to the customer's interests.
Data mining for customer profiling is a powerful tool that can be used to improve
customer targeting, segmentation, and personalization. By understanding the
characteristics of their customers, businesses can better serve their customers and
improve their bottom line.
Here are some of the benefits of using data mining for customer profiling:
   • Improved customer targeting: Data mining can help businesses to identify
       the customers who are most likely to be interested in their products or
       services. This can help businesses to allocate their marketing resources
       more effectively and to achieve better results.
   •   Improved customer segmentation: Data mining can help businesses to
       segment their customers into groups based on their similarities. This can
       help businesses to better understand the needs of their customers and to
       tailor their products and services accordingly.
   • Improved customer personalization: Data mining can help businesses to
       personalize their products and services to the individual needs of their
       customers. This can help businesses to build stronger relationships with
       their customers and to increase customer satisfaction.
However, there are also some challenges associated with using data mining for
customer profiling:
   • Privacy concerns: Some customers may be concerned about the privacy
       implications of data mining. Businesses need to be transparent about how
       they collect and use customer data, and they need to obtain the consent of
       customers before using their data for customer profiling.
   • Data quality: The quality of the data used for customer profiling is critical.
       If the data is not accurate or complete, the results of the customer profiling
       will be inaccurate.
   • Technological challenges: Data mining can be a complex and challenging
       process. Businesses need to have the right tools and expertise to use data
       mining effectively.
Overall, data mining for customer profiling is a powerful tool that can be used to
improve customer targeting, segmentation, and personalization. However,
businesses need to be aware of the challenges associated with data mining and take
steps to mitigate these challenges.
                                     
 [5860]-212