Unit 1 Data Mining

Data mining is the process of extracting knowledge from large datasets using various techniques such as classification, clustering, regression, and association rule mining. It aims to discover hidden patterns and relationships to inform decision-making across various industries, including marketing and healthcare. Challenges in data mining include data quality, complexity, privacy concerns, and the need for scalable algorithms.

Uploaded by

animestudio0707

DATA MINING

What is “Science”?
The systematic, comprehensive investigation and exploration of natural causes and effects.
What is “Data”?
Data refers to a collection of facts, information, and statistics that can take various forms, such as numbers, text, sound, images, or any other format.
DATA MINING: Data mining is the process of extracting knowledge or insights from large amounts of data using various statistical and computational techniques.

The data can be structured, semi-structured or unstructured, and can be stored in various forms such as databases, data warehouses, and data lakes.

The primary goal of data mining is to discover hidden patterns and relationships in the data that can be used to make informed decisions or predictions.

This involves exploring the data using various techniques such as clustering, classification, regression analysis, association rule mining, and anomaly detection.

Data mining has a wide range of applications across various industries, including marketing, finance, healthcare, and telecommunications. For example, in marketing, data mining can be used to identify customer segments and target marketing campaigns, while in healthcare, it can be used to identify risk factors for diseases and develop personalized treatment plans.

KDD (KNOWLEDGE DISCOVERY IN DATABASES) Vs DATA MINING

Difference between KDD and Data Mining

Parameter | KDD | Data Mining
Definition | KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data. | Data mining refers to a process of extracting useful and valuable information or patterns from large data sets.
Objective | To find useful knowledge from data. | To extract useful information from data.
Techniques Used | Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization. | Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.
Output | Structured information, such as rules and models, that can be used to make decisions or predictions. | Patterns, associations, or insights that can be used to improve decision-making or understanding.
Focus | The focus is on the discovery of useful knowledge, rather than simply finding patterns in data. | The focus is on the discovery of patterns or relationships in data.
Role of domain expertise | Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results. | Domain expertise is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge.

DATABASE Vs DATAMINING :

DATAMINING TECHNIQUES :

1. Classification:
Classification assigns data items to different predefined classes. It is used to obtain important and relevant information about data and metadata.
Data mining frameworks can themselves be classified by different criteria, as follows:

• Classification of data mining frameworks as per the type of data sources mined:
This classification is as per the type of data handled, for example multimedia, spatial data, text data, time-series data, World Wide Web, and so on.
• Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved, for example object-oriented database, transactional database, relational database, and so on.
• Classification of data mining frameworks as per the kind of knowledge discovered:
This classification depends on the types of knowledge discovered or data mining functionalities, for example discrimination, classification, clustering, characterization, etc. Some frameworks tend to be extensive frameworks offering several data mining functionalities together.
• Classification of data mining frameworks according to data mining techniques used:
This classification is as per the data analysis approach utilized, such as neural networks, machine learning, genetic algorithms, visualization, statistics, data warehouse-oriented or database-oriented approaches, etc.
The classification can also take into account the level of user interaction involved in the data mining procedure, such as query-driven systems, autonomous systems, or interactive exploratory systems.
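To make the idea of assigning data to classes concrete, here is a minimal sketch of one possible classifier, a k-nearest-neighbours vote, written in plain Python. The notes do not prescribe any particular algorithm; the function name `knn_classify`, the toy measurements, and the class labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # train is a list of (feature_tuple, label) pairs.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (height_cm, weight_kg) -> class label.
training_data = [
    ((150, 45), "small"),
    ((155, 50), "small"),
    ((160, 52), "small"),
    ((180, 80), "large"),
    ((185, 90), "large"),
    ((190, 95), "large"),
]

print(knn_classify(training_data, (158, 51)))  # small
print(knn_classify(training_data, (183, 85)))  # large
```

The classes are known in advance and the labelled examples drive the decision, which is what separates classification from the clustering technique described next.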
2. Clustering:
Clustering is the division of information into groups of related objects. Representing the data by a few clusters inevitably loses some fine detail, but achieves simplification. It models data by its clusters.
Data modeling puts clustering in a historical perspective rooted in statistics, mathematics, and numerical analysis.
From a machine learning point of view, clusters relate to hidden patterns, the search for clusters is unsupervised learning, and the resulting framework represents a data concept.
From a practical point of view, clustering plays an outstanding role in data mining applications, for example scientific data exploration, text mining, information retrieval, spatial database applications, CRM, Web analysis, computational biology, medical diagnostics, and much more.
In other words, clustering analysis is a data mining technique for identifying similar data. It helps to recognize the differences and similarities between data items.
Clustering is very similar to classification, but it groups chunks of data together based on their similarities rather than on predefined class labels.
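As an illustration of grouping by similarity without labels, the following is a minimal sketch of k-means, one common clustering algorithm. The notes do not name a specific algorithm, so the choice of k-means, the function name `kmeans`, the starting centroids, and the sample points are assumptions made for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Fixed starting centroids keep the sketch deterministic; practical implementations usually pick them at random and run several restarts.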
3. Regression:
Regression analysis is the data mining process used to identify and analyze the relationship between variables, that is, how one variable changes in the presence of other factors.
It is used to estimate the likely value of a specific variable. Regression is primarily a form of planning and modeling. For example, we might use it to project certain costs, depending on other factors such as availability, consumer demand, and competition.
Primarily, it estimates the relationship between two or more variables in the given data set.
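The cost-projection example can be sketched with simple linear regression fitted by ordinary least squares. This is only one form of regression; the function name `fit_line` and the demand and cost figures below are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data in which cost happens to equal 2 * demand + 5.
demand = [10, 20, 30, 40, 50]
cost = [25, 45, 65, 85, 105]

slope, intercept = fit_line(demand, cost)
predicted = slope * 60 + intercept  # projected cost at demand = 60
print(slope, intercept, predicted)
```

With real data the points rarely fall on a perfect line, so the fitted line is an estimate of the relationship rather than an exact description of it.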
4. Association Rules:
This data mining technique helps to discover a link between two or more items. It finds hidden patterns in the data set.
Association rules are if-then statements that help to show the probability of relationships between data items within large data sets in different types of databases.
Association rule mining has several applications and is commonly used to help discover sales correlations in transactional data or medical data sets.
The algorithm works on a collection of records, for example a list of grocery items that you have been buying for the last six months. It calculates the percentage of times that items are purchased together.
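The "percentage of items being purchased together" is usually expressed as support and confidence. The sketch below computes both for a handful of hypothetical grocery baskets; the basket contents and the helper names `support` and `confidence` are assumptions for the example, not part of the notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery baskets, one set of items per shopping trip.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent` (if-then rule strength)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# How often each pair of items appears in the same basket.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

print(support({"bread", "milk"}, baskets))       # 3 of 5 baskets
print(confidence({"bread"}, {"milk"}, baskets))  # rule: if bread then milk
print(pair_counts.most_common(3))
```

Here the rule "if bread then milk" holds in three of the four baskets that contain bread, so its confidence is about 0.75.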
5. Outlier detection:
This type of data mining technique relates to the observation of data items in the data set that do not match an expected pattern or expected behavior.
This technique may be used in various domains like intrusion detection, fraud detection, etc. It is also known as Outlier Analysis or Outlier Mining.
An outlier is a data point that diverges too much from the rest of the dataset. Most real-world datasets contain outliers.
Outlier detection plays a significant role in the data mining field. It is valuable in numerous fields like network intrusion identification, credit or debit card fraud detection, detecting outlying readings in wireless sensor network data, etc.
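One simple way to flag a point that "diverges too much" is a z-score test: measure how many standard deviations each value lies from the mean. This is a sketch of that single approach, with an assumed threshold of 2 and made-up transaction amounts; other methods (interquartile range, density-based, and so on) are equally valid.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts with one suspicious value.
transactions = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(transactions))  # [95]
```

On very small samples a single extreme value inflates the standard deviation, so the threshold may need to be lower than the 3 often quoted for large datasets.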
6. Sequential Patterns:
Sequential pattern mining is a data mining technique specialized for evaluating sequential data to discover sequential patterns.
It consists of finding interesting subsequences in a set of sequences, where the value of a sequence can be measured in terms of different criteria like length, occurrence frequency, etc.

In other words, this technique of data mining helps to discover or recognize similar patterns in transaction data over a period of time.
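A very small sketch of the idea: count how many customer sequences contain item A followed (not necessarily immediately) by item B, and keep the ordered pairs that occur often enough. The function name `frequent_ordered_pairs`, the purchase histories, and the support threshold are hypothetical; real sequential pattern miners handle longer subsequences.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Count ordered pairs (a before b) by the number of sequences containing them."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so (a, b) means a occurred before b.
        # The set ensures each sequence contributes at most once per pair.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase histories, each ordered by time.
purchases = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

patterns = frequent_ordered_pairs(purchases, min_support=2)
print(patterns)
```

Here occurrence frequency is the interestingness criterion: only pairs seen in at least two histories are reported.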
7. Prediction:
Prediction uses a combination of other data mining techniques such as trend analysis, clustering, classification, etc. It analyzes past events or instances in the right sequence to predict a future event.

PROBLEMS, ISSUES AND CHALLENGES IN DATAMINING :
Data mining is not an easy task, as the algorithms used can get very complex and data is not always available in one place. It needs to be integrated from various heterogeneous data sources.
These factors also create some issues. The major issues concern −
• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues
The following diagram describes the major issues.
[Figure: Data Mining issues]
Mining Methodology and User Interaction Issues
It refers to the following kinds of issues −
• Mining different kinds of knowledge in databases − Different users may be interested in different kinds of knowledge. Therefore it is necessary for data mining to cover a broad range of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction − The data mining process needs to be interactive because interactivity allows users to focus the search for patterns, providing and refining data mining requests based on the returned results.
• Incorporation of background knowledge − Background knowledge can be used to guide the discovery process and to express the discovered patterns. It may be used to express the discovered patterns not only in concise terms but also at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining − A data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
• Presentation and visualization of data mining results − Once patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable.
• Handling noisy or incomplete data − Data cleaning methods are required to handle noise and incomplete objects while mining data regularities. Without data cleaning methods, the accuracy of the discovered patterns will be poor.
• Pattern evaluation − The patterns discovered should be interesting. Many discovered patterns are not, because they either represent common knowledge or lack novelty.
Performance Issues
There can be performance-related issues such as the following −
• Efficiency and scalability of data mining algorithms − In order to effectively extract information from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.
• Parallel, distributed, and incremental mining algorithms − Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, which are then processed in parallel. The results from the partitions are then merged. Incremental algorithms incorporate database updates without mining the data again from scratch.
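The partition, process, merge pattern and the incremental update can be sketched with item counting as the stand-in mining task. The function names, the use of a thread pool, and the sample transactions are all assumptions for illustration; a real system would distribute partitions across machines rather than threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Mine one partition: count item occurrences in its transactions."""
    return Counter(item for transaction in partition for item in transaction)


def mine_in_partitions(transactions, n_partitions=2):
    """Split the data, process the partitions concurrently, then merge the results."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def update_incrementally(total, new_transactions):
    """Fold new transactions into existing counts without re-mining the old data."""
    total.update(count_items(new_transactions))
    return total


# Hypothetical transaction database.
transactions = [["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
counts = mine_in_partitions(transactions)
counts = update_incrementally(counts, [["c", "d"]])
print(counts)
```

Counting merges cleanly because partial counts simply add up; mining tasks whose partial results cannot be combined this way need more elaborate merge steps.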
Diverse Data Types Issues
• Handling of relational and complex types of data − The database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc. It is not possible for one system to mine all these kinds of data.
• Mining information from heterogeneous databases and global information systems − The data is available at different data sources on a LAN or WAN. These data sources may be structured, semi-structured or unstructured. Therefore mining knowledge from them adds challenges to data mining.
PROBLEMS IN DATAMINING :
1. Poor data quality, such as noisy data, dirty data, missing values, inexact or incorrect values, inadequate data size, and poor representation in data sampling.
2. Integrating conflicting or redundant data from different sources and forms: multimedia files (audio, video and images), geo data, text, social, numeric, etc.
3. Proliferation of security and privacy concerns among individuals, organizations and governments.
4. Unavailability of data or difficult access to data.
5. Efficiency and scalability of data mining algorithms to effectively extract information from huge amounts of data in databases.
6. Dealing with huge datasets that require distributed approaches.
7. Dealing with non-static, unbalanced and cost-sensitive data.
8. Mining information from heterogeneous databases and global information systems.
9. Constant updating of models to handle data velocity or new incoming data.
10. High cost of buying and maintaining powerful software, servers and storage hardware that handle large amounts of data.
11. Processing of large, complex and unstructured data into a structured format.
12. Sheer quantity of output from many data mining methods.
CHALLENGES IN DATAMINING:
1. Data Quality
The quality of data used in data mining is one of the most
significant challenges. The accuracy, completeness, and
consistency of the data affect the accuracy of the results
obtained.
The data may contain errors, omissions, duplications, or
inconsistencies, which may lead to inaccurate results.
Moreover, the data may be incomplete, meaning that
some attributes or values are missing, making it
challenging to obtain a complete understanding of the
data.
Data quality issues can arise due to a variety of reasons,
including data entry errors, data storage issues, data
integration problems, and data transmission errors.
To address these challenges, data mining practitioners
must apply data cleaning and data preprocessing
techniques to improve the quality of the data.
Data cleaning involves detecting and correcting errors,
while data preprocessing involves transforming the data
to make it suitable for data mining.
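As a rough illustration of data cleaning, the sketch below removes exact duplicate records and fills missing values of one numeric field with the mean of the known values. Mean imputation is only one possible strategy, and the function name `clean_records`, the field names, and the sample records are hypothetical.

```python
from statistics import mean


def clean_records(records, field):
    """Drop exact duplicates, then fill missing `field` values with the mean of known ones."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))  # copy so the input is not modified
    known = [r[field] for r in unique if r[field] is not None]
    fill_value = mean(known)  # assumes at least one known value exists
    for r in unique:
        if r[field] is None:
            r[field] = fill_value
    return unique


# Hypothetical raw data with one duplicate and one missing age.
raw = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 1, "age": 30},
    {"id": 3, "age": 40},
]
cleaned = clean_records(raw, "age")
print(cleaned)
```

Whether to impute, drop, or flag missing values depends on the data and the mining task, so this choice should be revisited for each dataset.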
2. Data Complexity
Data complexity refers to the vast amounts of data
generated by various sources, such as sensors, social
media, and the internet of things (IoT).
The complexity of the data may make it challenging to
process, analyze, and understand. In addition, the data
may be in different formats, making it challenging to
integrate into a single dataset.
To address this challenge, data mining practitioners use
advanced techniques such as clustering, classification, and
association rule mining. These techniques help to identify
patterns and relationships in the data, which can then be
used to gain insights and make predictions.
3. Data Privacy and Security
Data privacy and security is another significant
challenge in data mining. As more data is collected,
stored, and analyzed, the risk of data breaches and cyber-
attacks increases.
The data may contain personal, sensitive, or confidential
information that must be protected.
Moreover, data privacy regulations such as GDPR,
CCPA, and HIPAA impose strict rules on how data can
be collected, used, and shared.
To address this challenge, data mining practitioners must
apply data anonymization and data encryption
techniques to protect the privacy and security of the data.
Data anonymization involves removing personally
identifiable information (PII) from the data, while data
encryption involves using algorithms to encode the data to
make it unreadable to unauthorized users.
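The sketch below shows one simple way to strip PII before mining: direct identifiers are dropped and replaced with a salted hash so that records of the same person can still be linked. Strictly speaking this is pseudonymization rather than full anonymization, and it does not cover encryption. The field names, the salt, and the function name `pseudonymize` are assumptions for the example.

```python
import hashlib

# Hypothetical set of fields treated as personally identifiable information.
PII_FIELDS = {"name", "email"}


def pseudonymize(record, salt):
    """Remove PII fields and add a salted-hash token that stands in for the person."""
    digest = hashlib.sha256((salt + record["email"]).encode("utf-8")).hexdigest()
    kept = {key: value for key, value in record.items() if key not in PII_FIELDS}
    kept["user_token"] = digest[:16]  # shortened token, enough for linking in this sketch
    return kept


patient = {"name": "Jane Doe", "email": "jane@example.com", "age": 42, "diagnosis": "asthma"}
safe = pseudonymize(patient, salt="hypothetical-secret-salt")
print(safe)
```

Remaining attributes such as age or diagnosis can still identify someone when combined, so removing the obvious identifiers alone may not satisfy regulations such as GDPR or HIPAA.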
4. Scalability
Data mining algorithms must be scalable to handle large
datasets efficiently. As the size of the dataset increases,
the time and computational resources required to perform
data mining operations also increase.
Moreover, the algorithms must be able to handle
streaming data, which is generated continuously and must
be processed in real-time.
To address this challenge, data mining practitioners use
distributed computing frameworks such as Hadoop and
Spark.
These frameworks distribute the data and processing
across multiple nodes, making it possible to process large
datasets quickly and efficiently.
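For streaming data, one common tactic is to keep running summaries that are updated per record instead of storing the whole stream. The sketch below uses Welford's online algorithm to maintain a mean and variance; it is a single-machine illustration and does not use Hadoop or Spark. The class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Running mean and population variance, updated one value at a time (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0


# Hypothetical sensor readings arriving one by one.
stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)

print(stats.mean, stats.variance)
```

Each update takes constant time and memory, which is what makes this style of computation suitable for data that never stops arriving.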
5. Interpretability
Data mining algorithms can produce complex models that
are difficult to interpret. This is because the algorithms
use a combination of statistical and mathematical
techniques to identify patterns and relationships in the
data.
Moreover, the models may not be intuitive, making
it challenging to understand how the model arrived
at a particular conclusion.
To address this challenge, data mining practitioners use
visualization techniques to represent the data and the
models visually.
Visualization makes it easier to understand the patterns
and relationships in the data and to identify the most
important variables.
6. Ethics
Data mining raises ethical concerns related to the
collection, use, and dissemination of data. The data may
be used to discriminate against certain groups, violate
privacy rights, or perpetuate existing biases.
Moreover, data mining algorithms may not be transparent,
making it challenging to detect biases or discrimination.
DATAMINING APPLICATIONS

• Scientific Analysis: Scientific simulations generate large volumes of data every day. This includes data collected from nuclear laboratories, data about human psychology, etc. Data mining techniques are capable of analyzing these data. We can now capture and store new data faster than we can analyze the old data already accumulated. Examples of scientific analysis:
  • Sequence analysis in bioinformatics
  • Classification of astronomical objects
  • Medical decision support

• Intrusion Detection: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources. Data mining techniques play a vital role in detecting intrusions, network attacks, and anomalies. These techniques help in selecting and refining useful and relevant information from large data sets, and in classifying relevant data for an Intrusion Detection System. An Intrusion Detection System generates alarms about foreign invasions seen in the network traffic. For example:
  • Detect security violations
  • Misuse Detection
  • Anomaly Detection

• Business Transactions: Every business transaction is recorded in perpetuity. Such transactions are usually time-related and can be inter-business deals or intra-business operations. The effective and timely use of this data for competitive decision-making is the most important problem to solve for businesses that struggle to survive in a highly competitive world. Data mining helps to analyze these business transactions and to identify marketing approaches and support decision-making. Example:
  • Direct mail targeting
  • Stock trading
  • Customer segmentation

• Market Basket Analysis: Market basket analysis is a technique for the careful study of purchases made by a customer in a supermarket. It identifies patterns of items that customers frequently purchase together. This analysis can help companies promote deals, offers, and sales, and data mining techniques help to achieve this analysis task. Example:
  • Data mining concepts are used in sales and marketing to provide better customer service, improve cross-selling opportunities, and increase direct mail response rates.
  • Customer retention, in the form of pattern identification and prediction of likely defections, is possible with data mining.
  • Risk assessment and fraud detection also use data mining concepts to identify inappropriate or unusual behavior.

• Education: For analyzing the education sector, data mining uses the Educational Data Mining (EDM) method. This method generates patterns that can be used by both learners and educators. Using EDM we can perform educational tasks such as:
  • Predicting student admission in higher education
  • Student profiling
  • Predicting student performance
  • Evaluating teachers' teaching performance
  • Curriculum development
  • Predicting student placement opportunities

• Research: Data mining techniques can perform prediction, classification, clustering, association, and grouping of data in the research area. Rules generated by data mining are used to derive results. In most technical research in data mining, we create a training model and a testing model. The training/testing model is a strategy to measure the accuracy of the proposed model. It is called Train/Test because we split the data set into two sets: a training data set and a testing data set. The training data set is used to build the model, whereas the testing data set is used to evaluate it. Example:
  • Classification of uncertain data.
  • Information-based clustering.
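The train/test split described above can be sketched in a few lines. The function name `train_test_split`, the 20% test fraction, and the fixed random seed are assumptions made for the example; libraries such as scikit-learn provide their own equivalents.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (training set, testing set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of ten records, identified here by number.
dataset = list(range(10))
train_set, test_set = train_test_split(dataset, test_fraction=0.2)
print(len(train_set), len(test_set))  # 8 2
```

Keeping the two sets disjoint matters: a model evaluated on records it was trained on will look more accurate than it really is.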

• Healthcare and Insurance: A pharmaceutical company can examine its sales force activity and outcomes to improve the targeting of high-value physicians and to figure out which promotional activities will have the best effect in the upcoming months. In the insurance sector, data mining can help to predict which customers will buy new policies, identify behavior patterns of risky customers, and identify fraudulent behavior of customers.

• Financial/Banking Sector: A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product.
  • Credit card fraud detection.
  • Identify ‘Loyal’ customers.
  • Extraction of information related to customers.
  • Determine credit card spending by customer groups.
