KDD Process in Data Mining - Javatpoint

The document discusses the KDD (Knowledge Discovery in Databases) process, which is an iterative process for extracting useful knowledge from large datasets. The 9-step process includes understanding the problem domain, selecting and preprocessing data, transforming the data, applying data mining techniques like classification or clustering to generate patterns, selecting data mining algorithms, using the algorithms to build models and extract patterns, and finally interpreting and evaluating the results. The goal is to discover hidden patterns in large datasets that can provide insights and knowledge about the data.

Uploaded by

samwel sitta


05/07/2020 KDD Process in Data Mining - Javatpoint

KDD- Knowledge Discovery in Databases

The term KDD stands for Knowledge Discovery in Databases. It refers to the broad procedure of discovering
knowledge in data and emphasizes the high-level applications of specific Data Mining techniques. It is a field of
interest to researchers in various fields, including artificial intelligence, machine learning, pattern recognition,
databases, statistics, knowledge acquisition for expert systems, and data visualization.

The main objective of the KDD process is to extract information from data in the context of large databases. It does
this by using Data Mining algorithms to identify what is deemed knowledge.

Knowledge Discovery in Databases is regarded as the automated, exploratory analysis and modeling of vast data
repositories. KDD is the organized procedure of identifying valid, useful, and understandable patterns in huge and
complex data sets. Data Mining is the core of the KDD procedure: it involves inferring algorithms that explore the
data, develop the model, and find previously unknown patterns. The model is used to extract knowledge from the
data, analyze the data, and make predictions.

https://www.javatpoint.com/kdd-process-in-data-mining 1/10

The availability and abundance of data today make knowledge discovery and Data Mining a matter of considerable
importance and necessity. Given the recent growth of the field, it is not surprising that a wide variety of techniques
is now available to specialists and experts.

The KDD Process

The knowledge discovery process (illustrated in the given figure) is iterative and interactive and comprises nine steps.
The process is iterative at each stage, meaning that moving back to previous steps might be required. The process
has many creative aspects, in the sense that one cannot present a single formula or a complete scientific
categorization of the correct decisions for each step and application type. Thus, it is necessary to understand the
process and the different requirements and possibilities at each stage.

The process begins with determining the KDD objectives and ends with the implementation of the discovered
knowledge. At that point, the loop is closed, and Active Data Mining starts. Subsequently, changes would need to
be made in the application domain, for example, offering various features to cell phone users in order to reduce
churn. This closes the loop: the impacts are then measured on the new data repositories, and the KDD process is
launched again. Following is a concise description of the nine-step KDD process, beginning with a managerial step:


1. Building up an understanding of the application domain

This is the initial preparatory step. It sets the scene for understanding what should be done with the various
decisions, such as transformation, algorithms, and representation. The people in charge of a KDD project
need to understand and define the objectives of the end-user and the environment in which the knowledge
discovery process will take place (including relevant prior knowledge).

2. Choosing and creating a data set on which discovery will be performed

Once the objectives are defined, the data that will be used for the knowledge discovery process should be determined.
This includes finding out what data is available, obtaining additional necessary data, and then integrating all the data
for knowledge discovery into one data set, including the attributes that will be considered for the process. This step is
important because Data Mining learns and discovers from the available data. This is the evidence base for
building the models. If some significant attributes are missing, then the entire study may be unsuccessful. In this
respect, the more attributes are considered, the better. On the other hand, collecting, organizing, and operating
complex data repositories is expensive, so there is a trade-off against the opportunity for best understanding the
phenomena. This trade-off is one place where the interactive and iterative aspect of KDD takes place. One starts
with the best available data set and later expands it and observes the impact in terms of knowledge discovery and
modeling.
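The integration described above can be sketched as follows. This is a minimal illustration only: the two sources (`billing`, `usage`), the key `customer_id`, and the helper `integrate` are hypothetical names chosen for the example and do not come from the original tutorial.

```python
def integrate(sources, key):
    """Merge rows from several sources into one record per key value."""
    merged = {}
    for rows in sources:
        for row in rows:
            # Rows that share a key contribute their attributes to one record.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical source tables for a cell phone churn study.
billing = [{"customer_id": 1, "plan": "basic"}]
usage = [
    {"customer_id": 1, "minutes": 120},
    {"customer_id": 2, "minutes": 30},
]

data_set = integrate([billing, usage], key="customer_id")
# Customer 2 has no "plan" attribute, which is exactly the kind of gap
# that the next step (preprocessing and cleansing) has to deal with.
```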

3. Preprocessing and cleansing

In this step, data reliability is improved. It includes data cleaning, for example, handling missing values
and removing noise or outliers. It might involve complex statistical techniques or the use of a Data Mining algorithm
in this context. For example, when one suspects that a specific attribute is not reliable enough or has many missing
values, this attribute could become the target of a supervised Data Mining algorithm. A prediction model for
this attribute is built, and the missing data can then be predicted. The extent to which one pays
attention to this step depends on many factors. In any case, studying these aspects is important and is often
revealing in itself with regard to enterprise information systems.
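As a rough sketch of predicting missing values with a supervised model, the example below fits a one-variable least-squares line on the complete records and uses it to fill the gaps. The attribute names (`age`, `income`), the sample records, and the choice of linear regression are assumptions made for illustration; the tutorial does not prescribe a particular model.

```python
from statistics import mean

# Hypothetical records: "income" is sometimes missing (None), "age" is complete.
records = [
    {"age": 25, "income": 30.0},
    {"age": 35, "income": 40.0},
    {"age": 45, "income": 50.0},
    {"age": 55, "income": None},
    {"age": 30, "income": None},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def impute(rows, target, predictor):
    """Return copies of rows with missing target values predicted."""
    complete = [r for r in rows if r[target] is not None]
    slope, intercept = fit_line(
        [r[predictor] for r in complete],
        [r[target] for r in complete],
    )
    filled = []
    for r in rows:
        r = dict(r)  # do not modify the caller's records
        if r[target] is None:
            r[target] = slope * r[predictor] + intercept
        filled.append(r)
    return filled


cleaned = impute(records, target="income", predictor="age")
# The two missing incomes are predicted as 60.0 and 35.0.
```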

4. Data Transformation

In this stage, data better suited to Data Mining is prepared and developed. Techniques here
include dimension reduction (for example, feature selection and extraction, and record sampling) and attribute
transformation (for example, discretization of numerical attributes and functional transformation). This step can be
essential for the success of the entire KDD project, and it is typically very project-specific. For example, in medical
assessments, the quotient of attributes may often be the most significant factor, and not each one by itself. In
business, we may need to consider effects beyond our control as well as efforts and temporal issues, for
example, studying the effect of advertising accumulation. However, even if we do not use the right transformation at
the start, we may obtain a surprising effect that hints at the transformation required in the next
iteration. Thus, the KDD process reflects upon itself and leads to an understanding of the transformation required.
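Two of the attribute transformations mentioned above can be sketched in a few lines: equal-width discretization of a numerical attribute, and a derived attribute that is the quotient of two others. The function names, the attribute names `a` and `b`, and the sample values are hypothetical; equal-width binning is only one of several discretization schemes.

```python
def equal_width_bins(values, n_bins):
    """Map each number to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def add_ratio(rows, numerator, denominator, name):
    """Return copies of rows with a new attribute numerator / denominator."""
    out = []
    for r in rows:
        d = r[denominator]
        out.append(dict(r, **{name: r[numerator] / d if d else None}))
    return out


ages = [18, 25, 40, 62, 78]
age_bins = equal_width_bins(ages, n_bins=3)  # [0, 0, 1, 2, 2]

measurements = [{"a": 90.0, "b": 100.0}, {"a": 5.0, "b": 0.0}]
with_ratio = add_ratio(measurements, "a", "b", name="a_over_b")
# A zero denominator yields None rather than raising an error.
```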

5. Prediction and description

We are now prepared to decide which kind of Data Mining to use, for example, classification, regression, or
clustering. This mainly depends on the KDD objectives, and also on the previous steps. There are two significant
objectives in Data Mining: the first is prediction, and the second is description. Prediction is usually
referred to as supervised Data Mining, while descriptive Data Mining includes the unsupervised and visualization
aspects of Data Mining. Most Data Mining techniques depend on inductive learning, where a model is built explicitly
or implicitly by generalizing from a sufficient number of training examples. The fundamental assumption of the
inductive approach is that the trained model applies to future cases. The strategy also takes into account the
level of meta-learning for the specific set of available data.

6. Selecting the Data Mining algorithm

Having chosen the strategy, we now decide on the tactics. This stage involves choosing the particular method to
be used for searching for patterns, which may include multiple inducers. For example, when weighing precision against
understandability, the former is better with neural networks, while the latter is better with decision trees. For each
strategy of meta-learning, there are several possibilities for how it can be accomplished. Meta-learning focuses on
explaining what causes a Data Mining algorithm to be successful or not on a specific problem. Thus, this approach
attempts to understand the conditions under which a Data Mining algorithm is most suitable. Each algorithm has
parameters and tactics of learning, such as ten-fold cross-validation or another division into training and testing sets.
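To make the ten-fold cross-validation mentioned above concrete, here is a minimal sketch of how the record indices might be divided into training and testing parts. The helper name `k_fold_indices`, the fixed seed, and the sample size of 25 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    # A model would be trained on `train` and scored on `test` here;
    # the ten scores are then averaged to estimate performance.
    pass
```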

7. Utilizing the Data Mining algorithm

Finally, the implementation of the Data Mining algorithm is reached. In this stage, we may need to run the
algorithm several times until a satisfactory result is obtained, for example, by tuning the algorithm's control
parameters, such as the minimum number of instances in a single leaf of a decision tree.
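The repeated runs described above might look like the following sketch. It uses a deliberately tiny one-dimensional decision tree written for this example (not a production learner) and tries several values of the minimum leaf size, keeping the one that scores best on held-out data. The data points, the candidate values, and all function names are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def errors(labels):
    """Number of labels that disagree with the majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def build_tree(points, min_leaf):
    """Grow a tree over (x, label) pairs; each leaf keeps >= min_leaf points."""
    pts = sorted(points)
    labels = [label for _, label in pts]
    best = None
    if len(set(labels)) > 1:
        for i in range(min_leaf, len(pts) - min_leaf + 1):
            if pts[i - 1][0] == pts[i][0]:
                continue  # cannot split between identical x values
            cost = errors(labels[:i]) + errors(labels[i:])
            if best is None or cost < best[0]:
                best = (cost, i)
    if best is None:  # pure node, or no split satisfies min_leaf
        return ("leaf", majority(labels))
    i = best[1]
    threshold = (pts[i - 1][0] + pts[i][0]) / 2
    return ("node", threshold,
            build_tree(pts[:i], min_leaf), build_tree(pts[i:], min_leaf))


def predict(tree, x):
    while tree[0] == "node":
        tree = tree[2] if x < tree[1] else tree[3]
    return tree[1]


def accuracy(tree, points):
    return sum(predict(tree, x) == label for x, label in points) / len(points)


# Hypothetical training data with two noisy labels (x=3 and x=9).
train = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "a"),
         (6, "b"), (7, "b"), (8, "b"), (9, "a"), (10, "b")]
validation = [(1.5, "a"), (2.5, "a"), (3.5, "a"), (4.5, "a"),
              (6.5, "b"), (7.5, "b"), (8.5, "b"), (9.5, "b")]

# Run the algorithm once per candidate value and keep the best setting.
scores = {m: accuracy(build_tree(train, m), validation) for m in [1, 2, 3]}
best_min_leaf = max(scores, key=scores.get)
```

In practice the held-out score would usually come from cross-validation rather than a single validation set.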

8. Evaluation

In this step, we assess and interpret the mined patterns (rules, reliability, etc.) with respect to the objectives defined
in the first step. Here we consider the preprocessing steps with regard to their effect on the Data Mining algorithm
results, for example, adding a feature in step 4 and repeating from there. This step focuses on the comprehensibility
and usefulness of the induced model. In this step, the discovered knowledge is also documented for further use. The
last step is the use of, and overall feedback on, the discovery results obtained by Data Mining.
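One common way to quantify the reliability of a mined rule is through its support and confidence. The tutorial does not name these measures, so treat the sketch below as an assumed illustration; the transactions and the rule "bread implies milk" are made up.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions)  # how often the whole rule holds
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence = rule_metrics(transactions, {"bread"}, {"milk"})
# support is 0.5 (2 of 4 transactions); confidence is 2/3 (2 of 3 with bread).
```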

9. Using the discovered knowledge

We are now ready to incorporate the knowledge into another system for further action. The knowledge becomes
active in the sense that we may make changes to the system and measure the effects. The success of
this step determines the effectiveness of the whole KDD process. There are numerous challenges in this step, such as
losing the "laboratory conditions" under which we have worked. For example, the knowledge was discovered from a
certain static snapshot, usually a sample of the data, but now the data becomes dynamic. Data structures may change
(certain attributes become unavailable), and the data domain might be modified (for example, an attribute may
take a value that was not expected previously).
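A deployed system can guard against the two problems just described by comparing each incoming record with what was seen in the training snapshot. The sketch below handles categorical attributes only; the attribute names (`plan`, `region`) and the helper functions are hypothetical.

```python
def learn_schema(training_rows):
    """Record, per attribute, the set of values seen in the training snapshot."""
    schema = {}
    for row in training_rows:
        for attr, value in row.items():
            schema.setdefault(attr, set()).add(value)
    return schema


def check_record(record, schema):
    """List the ways a new record departs from the training snapshot."""
    problems = []
    for attr, seen in schema.items():
        if attr not in record:
            problems.append(f"missing attribute: {attr}")
        elif record[attr] not in seen:
            problems.append(f"unexpected value for {attr}: {record[attr]!r}")
    return problems


# Hypothetical static snapshot used during discovery.
snapshot = [
    {"plan": "basic", "region": "north"},
    {"plan": "premium", "region": "south"},
]
schema = learn_schema(snapshot)

# A live record with a new plan type and no region attribute.
issues = check_record({"plan": "family"}, schema)
```

Records that produce a non-empty list could be logged or routed for review, and a growing number of them is a signal that the KDD process should be run again.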
