KDD Process in Data Mining

The document outlines the KDD (Knowledge Discovery in Databases) process in data mining, which involves extracting useful information from large datasets through a series of iterative steps including data cleaning, integration, selection, transformation, mining, evaluation, and representation. It highlights the advantages of KDD such as improved decision-making and fraud detection, as well as disadvantages like privacy concerns and data quality issues. Additionally, it distinguishes between KDD and data mining, emphasizing that KDD focuses on discovering knowledge while data mining focuses on finding patterns.

Uploaded by

irfaanshaik27

KDD Process in Data Mining

GANESH JATLA
KDD Process in Data Mining
In the context of computer science, “data mining” is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Data mining, often used as a synonym for Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases.

The purpose of data mining is to extract useful information from large datasets and use it to make predictions or support better decision-making. Today, data mining is used almost everywhere that large amounts of data are stored and processed.

Examples: the banking sector, market basket analysis, and network intrusion detection.
KDD Process
KDD (Knowledge Discovery in Databases) is the process of extracting useful, previously unknown, and potentially valuable information from large datasets. The process is iterative: the steps below are usually repeated several times before accurate knowledge is extracted from the data.

The KDD process includes the following steps:

Data Cleaning
Data cleaning is the removal of noisy and irrelevant data from the collection. It includes:
Handling missing values.
Smoothing noisy data, where noise is a random error or variance in a measured variable.
Detecting data discrepancies and correcting them with data transformation tools.
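
The first two cleaning tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the function names `fill_missing` and `smooth_by_bin_median`, mean imputation, and smoothing by bin medians are choices made here, not techniques prescribed by the slides.

```python
from statistics import mean, median

def fill_missing(values):
    """Replace None entries with the mean of the observed values (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_median(values, bin_size):
    """Reduce noise: sort the values, split them into equal-size bins,
    and replace every value with the median of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([median(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical sample data, invented for this example.
ages = [25, None, 31, 29, None, 35]
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

cleaned_ages = fill_missing(ages)
smoothed_prices = smooth_by_bin_median(prices, bin_size=3)
print(cleaned_ages)
print(smoothed_prices)
```

Other imputation strategies (a constant, the median, a model-based estimate) fit the same shape; which one is appropriate depends on the data.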
Data Integration

Data integration combines heterogeneous data from multiple sources into a common store, such as a data warehouse. It is carried out with data migration tools, data synchronization tools, and the ETL (Extract, Transform, Load) process.
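
A toy ETL pass may make the idea concrete. Everything below is hypothetical: the two source tables, their field names, and the in-memory dictionary standing in for a data warehouse were invented for this sketch, and real integration tools handle far more (schema conflicts, duplicates, incremental loads).

```python
# Two hypothetical sources that describe the same customers with different schemas.
crm_rows = [
    {"cust_id": 1, "full_name": "Asha Rao"},
    {"cust_id": 2, "full_name": "Ben Ortiz"},
]
billing_rows = [
    {"customer": "1", "amount_usd": "120.50"},
    {"customer": "2", "amount_usd": "80.00"},
    {"customer": "1", "amount_usd": "19.50"},
]

def extract():
    """Extract: read the raw rows from each source."""
    return crm_rows, billing_rows

def transform(crm, billing):
    """Transform: unify key types and field names, and total the payments per customer."""
    totals = {}
    for row in billing:
        key = int(row["customer"])  # billing stores the id as text
        totals[key] = totals.get(key, 0.0) + float(row["amount_usd"])
    return [
        {
            "customer_id": row["cust_id"],
            "name": row["full_name"],
            "total_spent": totals.get(row["cust_id"], 0.0),
        }
        for row in crm
    ]

def load(rows, warehouse):
    """Load: write the unified rows into the common store."""
    for row in rows:
        warehouse[row["customer_id"]] = row

warehouse = {}  # stand-in for a data warehouse table
load(transform(*extract()), warehouse)
print(warehouse)
```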

Data Selection

Data selection is the process of deciding which data is relevant to the analysis and retrieving it from the data collection. Neural networks, decision trees, naive Bayes, clustering, and regression methods can be used for this.
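
In its simplest form, selection is a filter on rows plus a projection onto the relevant attributes. The sketch below shows only that simple form, not the model-based methods listed above; the `customers` records and the `select` helper are hypothetical.

```python
# Hypothetical customer records; only some rows and attributes matter for the task.
customers = [
    {"id": 1, "age": 34, "region": "south", "monthly_spend": 250.0, "signup_channel": "web"},
    {"id": 2, "age": 51, "region": "north", "monthly_spend": 90.0, "signup_channel": "store"},
    {"id": 3, "age": 27, "region": "south", "monthly_spend": 410.0, "signup_channel": "app"},
]

def select(rows, attributes, condition):
    """Keep the rows that satisfy `condition`, and only the listed attributes."""
    return [{name: row[name] for name in attributes} for row in rows if condition(row)]

# Task-relevant data: age and spend for customers in the southern region.
task_data = select(customers, ["age", "monthly_spend"], lambda row: row["region"] == "south")
print(task_data)
```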
Data Transformation
Data transformation is the process of converting data into the form required by the mining procedure. It is a two-step process:

Data mapping: assigning elements from the source to the destination to capture transformations.

Code generation: creating the actual transformation program.
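
The two steps can be imitated on a small scale. In this sketch the mapping is a dictionary from destination fields to a source field and a conversion, and a closure built from that dictionary loosely stands in for the generated transformation program. The field names, the min-max scaling, and the helper names are assumptions made for illustration.

```python
def min_max_scaler(values):
    """Return a function that rescales a value into [0, 1] using the observed range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return lambda v: (v - lo) / span if span else 0.0

# Hypothetical source rows, as they might arrive from an export.
source_rows = [
    {"Income": "30000", "Gender": "F"},
    {"Income": "50000", "Gender": "M"},
    {"Income": "70000", "Gender": "F"},
]

scale_income = min_max_scaler([float(row["Income"]) for row in source_rows])

# Step 1, data mapping: destination field -> (source field, conversion).
mapping = {
    "income_scaled": ("Income", lambda text: scale_income(float(text))),
    "is_female": ("Gender", lambda text: 1 if text == "F" else 0),
}

# Step 2, a stand-in for code generation: build the transformation from the mapping.
def build_transform(mapping):
    def transform(row):
        return {dest: convert(row[src]) for dest, (src, convert) in mapping.items()}
    return transform

transform = build_transform(mapping)
transformed = [transform(row) for row in source_rows]
print(transformed)
```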

Data Mining
Data mining is the application of techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model, using classification or characterization.
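
As one concrete instance of pattern extraction, the sketch below counts frequent itemsets in market-basket data by brute force. It is not Apriori or any other optimized algorithm, and the transactions, the `frequent_itemsets` name, and the 0.6 support threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return every itemset of up to `max_size` items whose support
    (the fraction of baskets containing it) is at least `min_support`."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives each itemset one canonical tuple form.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

Enumerating all combinations grows quickly with basket size, which is why practical miners prune candidates instead.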
Pattern Evaluation
Pattern evaluation identifies the truly interesting patterns that represent knowledge, based on given measures. It finds an interestingness score for each pattern and uses summarization and visualization to make the data understandable to the user.
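
Support, confidence, and lift are common interestingness measures for association rules. The slides do not name specific measures, so treating these three as the "given measures" is an assumption, as are the sample baskets and the function names below.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def rule_scores(antecedent, consequent, transactions):
    """Score the rule antecedent -> consequent.
    support:    fraction of baskets containing both sides
    confidence: fraction of antecedent baskets that also contain the consequent
    lift:       confidence relative to the consequent's base rate (1.0 = independent)"""
    n = len(transactions)
    both = count(antecedent | consequent, transactions)
    a = count(antecedent, transactions)
    c = count(consequent, transactions)
    return {"support": both / n, "confidence": both / a, "lift": (both * n) / (a * c)}

rules = [({"diapers"}, {"beer"}), ({"bread"}, {"milk"})]
scored = [(lhs, rhs, rule_scores(lhs, rhs, transactions)) for lhs, rhs in rules]

# Rank the candidate rules by lift, highest first.
ranked = sorted(scored, key=lambda item: item[2]["lift"], reverse=True)
for lhs, rhs, scores in ranked:
    print(sorted(lhs), "->", sorted(rhs), scores)
```

A lift above 1 suggests the two sides occur together more often than independence would predict; a lift below 1 suggests the opposite.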
Knowledge Representation
This step presents the results in a way that is meaningful and can be used to make decisions.
Note: KDD is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed to obtain different and more appropriate results. Preprocessing of databases consists of data cleaning and data integration.
Advantages of KDD
Improves decision-making: KDD provides valuable insights and knowledge that can help organizations make better decisions.

Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the data ready for analysis, which saves time and money.

Better customer service: KDD helps organizations gain a better understanding of their customers’ needs and preferences, which can help them provide better customer service.

Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and anomalies in the data that may indicate fraud.

Predictive modeling: KDD can be used to build predictive models that can forecast future trends and patterns.
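
To illustrate the fraud-detection point, the sketch below flags payments that lie far from the typical amount, using the median and the median absolute deviation (MAD). The payment values, the threshold of 5, and the `flag_anomalies` name are hypothetical, and real fraud detection relies on far richer features than a single amount.

```python
from statistics import median

def flag_anomalies(amounts, threshold=5.0):
    """Flag values whose distance from the median exceeds `threshold` times the MAD."""
    center = median(amounts)
    deviations = [abs(a - center) for a in amounts]
    mad = median(deviations)
    if mad == 0:
        return []  # no spread to compare against
    return [a for a, d in zip(amounts, deviations) if d / mad > threshold]

# Hypothetical card payments; one is far larger than the rest.
card_payments = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 950.0, 43.5]
suspicious = flag_anomalies(card_payments)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts them much less.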
Disadvantages of KDD
Privacy concerns: KDD can raise privacy concerns because it involves collecting and analyzing large amounts of data, which can include sensitive information about individuals.

Complexity: KDD can be a complex process that requires specialized skills and knowledge to implement and to interpret the results.

Unintended consequences: KDD can lead to unintended consequences, such as bias or discrimination, if the data or models are not properly understood or used.

Data quality: the KDD process depends heavily on the quality of the data; if the data is not accurate or consistent, the results can be misleading.

High cost: KDD can be an expensive process, requiring significant investments in hardware, software, and personnel.

Overfitting: the KDD process can lead to overfitting, a common problem in machine learning in which a model learns the detail and noise in the training data to the extent that it performs worse on new, unseen data.
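
Overfitting can be seen even on a toy problem. In the sketch below, a 1-nearest-neighbour classifier memorizes two deliberately mislabelled training points, while a 3-nearest-neighbour vote smooths them out. The data, the choice of k-nearest neighbours, and the function names are assumptions made for this illustration.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of points in `data` that a k-NN model built on `train` labels correctly."""
    hits = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return hits / len(data)

# Hypothetical data: the true label is 1 when x >= 6.
# Two training labels (x = 3 and x = 8) are flipped on purpose to act as noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
test = [(1.4, 0), (2.6, 0), (3.3, 0), (4.4, 0), (5.2, 0),
        (6.3, 1), (7.4, 1), (8.2, 1), (9.3, 1), (9.8, 1)]

for k in (1, 3):
    print(f"k={k}: train accuracy {accuracy(train, train, k):.2f}, "
          f"test accuracy {accuracy(train, test, k):.2f}")
```

With k = 1 the model is perfect on the training data but worse on the test points, because it has memorized the flipped labels; with k = 3 it makes a few training "errors" on exactly those noisy points and generalizes better.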
Difference between KDD and Data Mining
Parameter | KDD | Data Mining
Definition | KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data. | Data mining refers to a process of extracting useful and valuable information or patterns from large data sets.
Objective | To find useful knowledge from data. | To extract useful information from data.
Techniques used | Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization. | Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.
Focus | The focus is on the discovery of useful knowledge, rather than simply finding patterns in data. | The focus is on the discovery of patterns or relationships in data.
Role of domain expertise | Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results. | Domain expertise is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge.
