
Data Mining

Concepts and Techniques

Helly Sunil Shah, Prof. Mayank Dewani
1. Student, B.E. Computer Engineering, Sal College of Engineering, Ahmedabad, Gujarat, India

2. Assistant Professor, Department of Information Technology, Sal College of Engineering, Ahmedabad, Gujarat, India.

Introduction to Data Mining

Data mining is the process of extracting useful information from large sets of
data. It involves using various techniques from statistics, machine learning,
and database systems to identify patterns, relationships, and trends in the
data. This information can then be used to make data-driven decisions, solve
business problems, and uncover hidden insights. Applications of data mining
include customer profiling and segmentation, market basket analysis, anomaly
detection, and predictive modelling. Data mining tools and technologies are
widely used in various industries, including finance, healthcare, retail, and
telecommunications.

In general terms, “mining” is the process of extracting some valuable material from the earth, e.g. coal mining or diamond mining. In computer science, “data mining” is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. It is the process of extracting useful information from a large body of data, such as a database or a data warehouse. The term itself is somewhat misleading. In coal or diamond mining, the result of the extraction process is coal or diamonds. In data mining, however, the result of the extraction process is not data. Instead, the results are the patterns and knowledge gained at the end of the extraction process. In that sense, data mining can be viewed as one step in the larger process of Knowledge Discovery or Knowledge Extraction.

DATA MINING ALGORITHMS:

An algorithm in data mining (or machine learning) is a set of heuristics and calculations that creates a model from data. To create a model, the algorithm first analyses the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.
The mining model that an algorithm creates from your data can take various forms, including:

• A set of clusters that describe how the cases in a dataset are related.
• A decision tree that predicts an outcome and describes how different criteria affect that outcome.
• A mathematical model that forecasts sales.
• A set of rules that describe how products are grouped together in a transaction, and the probabilities that products are purchased together.

The algorithms provided in SQL Server Data Mining (the data mining feature of Microsoft SQL Server Analysis Services) are among the most popular, well-researched methods of deriving patterns from data. To take one example, K-means clustering is one of the oldest clustering algorithms and is widely available in many different tools, with many different implementations and options. However, the particular implementation of K-means clustering used in SQL Server Data Mining was developed by Microsoft Research and then optimized for performance with SQL Server Analysis Services. All of the Microsoft data mining algorithms can be extensively customized and are fully programmable using the provided APIs. You can also automate the creation, training, and retraining of models by using the data mining components in Integration Services.
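To illustrate the "analyse, iterate, refine parameters" loop described above, here is a minimal sketch of the textbook K-means procedure (Lloyd's algorithm) in plain Python. It is not Microsoft's SQL Server implementation; the function name `kmeans`, the random initialisation, and the Euclidean distance are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means on a list of numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k of the data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        # Converged: the centroids (the model's "parameters") stopped changing.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points should return one label per point, with each group sharing a label. Each pass over the data refines the centroids, which mirrors the iterative parameter search described in the paragraph above.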

Key Data Mining Algorithm Types and Examples:

• Classification: predicting the category or class of a data instance. Common algorithms include:
  • Decision Trees: use a tree-like structure of tests on attribute values to make decisions.
  • Naïve Bayes: applies Bayes' theorem to classify data based on probabilities.
  • Support Vector Machines (SVM): find a hyperplane that separates data into different classes.
  • k-Nearest Neighbours (KNN): classifies a data point based on its proximity to other data points.
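The k-Nearest Neighbours idea of classifying by proximity can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the data layout (a list of `(features, label)` pairs) are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Hypothetical helper: classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (feature_tuple, label) pairs.
    """
    # Sort the training examples by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote; on a tie, Counter.most_common keeps the label encountered
    # first, i.e. the one belonging to the nearer neighbour.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

KNN has no explicit training phase: the "model" is simply the stored training data, and all the work happens at prediction time.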
KNOWLEDGE DISCOVERY IN DATA MINING
The Knowledge Discovery in Databases (KDD) process comprises several steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:

Data cleaning: also known as data cleansing, a phase in which noisy data and irrelevant data are removed from the collection.
Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common source.
Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, a phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: the crucial step in which intelligent techniques are applied to extract potentially useful patterns.
Pattern evaluation: in this step, the truly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is presented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.
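The pattern evaluation step is the easiest one to make concrete. The text does not say which measures are used, so this sketch assumes three widely used interestingness measures for association rules: support, confidence, and lift. The helper names and the threshold values are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Hypothetical helper: (support, confidence, lift) of the rule antecedent -> consequent.

    `transactions` is a list of sets of items.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n                               # how often the whole rule occurs
    confidence = count_ac / count_a if count_a else 0.0  # P(consequent | antecedent)
    lift = confidence / (count_c / n) if count_c else 0.0  # > 1 means positive association
    return support, confidence, lift

def interesting_rules(transactions, candidates, min_support=0.3, min_confidence=0.6):
    """Pattern evaluation: keep only candidate rules that pass the thresholds and have lift > 1."""
    kept = []
    for antecedent, consequent in candidates:
        s, conf, lift = rule_measures(transactions, antecedent, consequent)
        if s >= min_support and conf >= min_confidence and lift > 1:
            kept.append((antecedent, consequent, s, conf, lift))
    return kept
```

Note that a rule can have high support and confidence yet still be uninteresting. If almost every customer buys milk, then "bread → milk" is expected by chance, and a lift at or below 1 filters it out.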
What Kinds of Data Can Be Mined?
Flat files: Flat files are the most common data source for data mining algorithms, especially at the research level. Flat files are simple data files in text or binary format with a structure known to the data mining algorithm to be applied. The data in these files can be transactions, time-series data, scientific measurements, etc.

Data Warehouses: A data warehouse is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema. A data warehouse makes it possible to analyze data from different sources under the same roof. Suppose, for example, that OurVideoStore becomes a franchise in North America. The many video stores belonging to the OurVideoStore company may have different databases with different structures. If the company's executives want to access the data from all stores for strategic decision-making, future direction, marketing, etc., it would be more appropriate to store all the data in one site with a homogeneous structure that allows interactive analysis. In other words, data from the different stores would be loaded, cleaned, transformed, and integrated together. To facilitate decision-making and multi-dimensional views, data warehouses are usually modeled by a multi-dimensional data structure.
Figure: A three-dimensional subset of a data cube structure used for the OurVideoStore data warehouse.
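A multi-dimensional view can be illustrated with a tiny fact table and a "roll-up" that aggregates away the dimensions we are not interested in. This is a minimal sketch: the dimension names (time, item, location), the sample rows, and the `roll_up` helper are hypothetical and are not taken from the OurVideoStore figure.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (time, item, location) cell with a measure.
DIMENSIONS = ("time", "item", "location")
SALES = [
    ("Q1", "Comedy", "Vancouver", 12),
    ("Q1", "Drama", "Vancouver", 5),
    ("Q1", "Comedy", "Toronto", 7),
    ("Q2", "Comedy", "Vancouver", 9),
    ("Q2", "Drama", "Toronto", 4),
]

def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(int)
    for *dims, measure in facts:
        cube[tuple(dims[i] for i in idx)] += measure
    return dict(cube)
```

For example, `roll_up(SALES, ("time",))` gives totals per quarter, `roll_up(SALES, ("item", "location"))` gives one face of the cube, and `roll_up(SALES, ())` collapses everything to a single grand total. A real warehouse precomputes many of these aggregates so that such views can be explored interactively.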
Multimedia Databases: Multimedia databases include video, image, audio, and text media. They can be stored in extended object-relational or object-oriented databases, or simply on a file system. Multimedia is characterized by its high dimensionality, which makes data mining even more challenging. Data mining from multimedia repositories may require computer vision, computer graphics, image interpretation, and natural language processing methodologies.

Spatial Databases: Spatial databases are databases that, in addition to usual data, store geographical information such as maps and global or regional positioning. Such spatial databases present new challenges to data mining algorithms.

CONCEPTS OF DATA MINING

Data mining is a technique for identifying patterns in large amounts of data and information. Examples of data sources include databases, data centers, the internet, and other data repositories, as well as data that streams dynamically into the system.
1. Data Preparation and Pre-processing:
• Data Cleaning: This involves handling missing values, removing duplicates, and correcting inconsistencies in the data.
• Data Integration: Combining data from multiple sources into a single dataset.
• Data Transformation: Converting data into a suitable format for analysis, for example by scaling or normalization.
• Data Reduction: Reducing the size of the dataset while preserving its essential information, often through techniques such as sampling or dimensionality reduction.
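The four preparation steps can be sketched on rows represented as tuples. This is a minimal illustration under stated assumptions: missing values are `None` and are filled with the column mean, sources are joined on a shared key, "transformation" is min-max scaling to [0, 1], and "reduction" is simple random sampling. All helper names are hypothetical.

```python
import random

def clean(records):
    """Drop exact duplicate rows, then fill missing (None) values with the column mean."""
    unique = list(dict.fromkeys(records))  # removes duplicates, keeps first-seen order
    means = []
    for col in zip(*unique):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return [tuple(means[j] if v is None else v for j, v in enumerate(row)) for row in unique]

def integrate(left, right):
    """Join two sources that map a shared key to a tuple of attributes (inner join)."""
    return [left[k] + right[k] for k in left if k in right]

def min_max_normalize(rows):
    """Rescale every column linearly to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def sample(rows, fraction, seed=0):
    """Data reduction by simple random sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, max(1, round(len(rows) * fraction)))
```

Mean imputation and min-max scaling are only two of many possible choices. Median imputation or z-score standardization are equally common, and the right choice depends on the data and on the mining algorithm that follows.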

2. Model Building and Evaluation:
• Model Design: Choosing and implementing appropriate algorithms for data analysis.
• Model Testing: Evaluating the performance of the model on a separate dataset to check its accuracy and reliability.
• Model Evaluation: Assessing the model's performance based on specific metrics.
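Model testing and evaluation can be sketched as a hold-out split followed by a few standard metrics. The text does not name the metrics, so accuracy, precision, and recall for a binary classifier are assumed here. The helpers `train_test_split` and `evaluate` are hypothetical and written in plain Python; they are not a specific library's API.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision and recall of binary predictions against true labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

The test set must not be used while building the model. Otherwise the metrics measure how well the model memorised the data rather than how well it generalises.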

3. Data Mining Techniques:

• Machine Learning: Using algorithms to learn from data and make predictions or decisions.
• Statistical Analysis: Employing statistical methods to analyze data and identify patterns.
• Database Management: Handling the storage and retrieval of data used in data mining.
• Artificial Intelligence: Utilizing AI techniques, such as neural networks, to analyze data.
• Data Visualization: Presenting data in a visual format to facilitate understanding and analysis.

4. Data Analysis and Pattern Discovery:

• Classification: Assigning data points to predefined categories based on their characteristics.
• Clustering: Grouping similar data points together based on their characteristics.
• Regression: Predicting a continuous value based on one or more input variables.
• Association Rule Mining: Discovering relationships between variables, often used in market basket analysis.
DATA MINING TECHNIQUES

Data mining techniques are methods used to discover patterns, relationships, or useful insights from large volumes of data. Here are some of the most commonly used data mining techniques:

1. Classification

• Purpose: Assign data to predefined categories or classes.
• Example Algorithms: Decision Trees, Random Forest, Support Vector Machines (SVM), Naive Bayes.
• Use Case: Email spam detection, credit risk evaluation.
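For the spam-detection use case, a Naive Bayes classifier is a short example. This sketch assumes the multinomial variant with add-one (Laplace) smoothing and documents given as lists of words. The helper names are hypothetical, and a production filter would also need tokenization and much more training data.

```python
import math
from collections import Counter

def train_nb(docs):
    """Train multinomial Naive Bayes. `docs` is a list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)  # log P(w | label)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together. The "naive" assumption is that words are conditionally independent given the class.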

2. Clustering

• Purpose: Group similar data points into clusters without predefined labels.
• Example Algorithms: K-Means, DBSCAN, Hierarchical Clustering.
• Use Case: Customer segmentation, image compression.
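Unlike K-means, DBSCAN does not need the number of clusters in advance. It grows clusters from "core" points that have enough neighbours within a radius and labels isolated points as noise. This is a minimal O(n²) sketch, assuming Euclidean distance and a neighbourhood that counts the point itself. The parameter names `eps` and `min_pts` follow common usage, and the function itself is hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is also a core point: expand through it
                queue.extend(j_neigh)
        cluster += 1
    return labels
```

The result depends heavily on `eps` and `min_pts`. In practice they are tuned to the density of the data, and a spatial index replaces the brute-force neighbour search.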

3. Regression

• Purpose: Predict a continuous numeric value based on input variables.
• Example Algorithms: Linear Regression, Polynomial Regression, Ridge Regression.
• Use Case: Predicting housing prices, stock market forecasting.
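Linear regression with a single input variable has a closed-form least-squares solution. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. This minimal sketch covers only that simple case (not polynomial or ridge regression), and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one input variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-deviations (proportional to the covariance of x and y).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

Once fitted, a prediction for a new input `x` is simply `intercept + slope * x`. For housing prices, for example, `x` might be floor area and `y` the sale price.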
4. Association Rule Learning

• Purpose: Find interesting relationships (associations) between variables in large databases.
• Example Algorithms: Apriori, Eclat.
• Use Case: Market basket analysis (e.g., “Customers who buy X also buy Y”).
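The core of Apriori is a level-wise search for frequent itemsets. It relies on the observation that every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset can be pruned before counting. This is a minimal sketch, assuming support is measured as the fraction of transactions containing an itemset; the function name and data format are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    counts = {s: support(s) for s in singles}
    current = {s for s, sup in counts.items() if sup >= min_support}
    frequent = {s: counts[s] for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        counts = {c: support(c) for c in candidates}
        current = {c for c, sup in counts.items() if sup >= min_support}
        frequent.update({c: counts[c] for c in current})
        k += 1
    return frequent
```

Rules such as "bread → butter" are then generated from the frequent itemsets and kept only if their confidence is high enough.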

5. Anomaly Detection (Outlier Detection)

• Purpose: Identify rare items, events, or observations that differ significantly from the majority of the data.
• Example Algorithms: Isolation Forest, One-Class SVM, k-NN based methods.
• Use Case: Fraud detection, network security.
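One of the simplest k-NN based methods scores each point by the distance to its k-th nearest neighbour. Points in dense regions get small scores, and isolated points get large ones. This is a minimal brute-force sketch, assuming Euclidean distance; the helper names and the choice of k are hypothetical.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour (larger = more anomalous)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_outliers(points, k=2, n=1):
    """Return the indices of the n points with the highest outlier scores."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]
```

In a fraud-detection setting, each point would be a vector of transaction features, and the highest-scoring transactions would be flagged for review rather than rejected automatically.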

6. Dimensionality Reduction

• Purpose: Reduce the number of input variables in a dataset.
• Example Techniques: Principal Component Analysis (PCA), t-SNE, Linear Discriminant Analysis (LDA).
• Use Case: Data visualization, improving performance in machine learning models.
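PCA finds the direction along which the data varies the most, namely the top eigenvector of the covariance matrix, and projects the data onto it. PCA is normally computed with an eigen-decomposition or SVD from a numerical library. To stay dependency-free, this sketch instead uses power iteration to approximate only the first component, which is a substitution made for illustration. The helper names are hypothetical.

```python
import math

def first_principal_component(rows, iters=200):
    """Approximate the first principal component by power iteration on the covariance matrix.

    Returns (column_means, unit_direction). Assumes at least two rows and non-constant data.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector, assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        # Multiply by the covariance matrix and renormalise to unit length.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Reduce each row to a single coordinate along the given component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r))) for r in rows]
```

Projecting two strongly correlated columns onto this single direction keeps most of their variance in one variable, which is the reduction the bullet list describes.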

7. Prediction

• Purpose: Estimate future outcomes based on historical data.
• Tools Used: A combination of classification and regression.
• Use Case: Sales forecasting, demand prediction.
Conclusion:

Data mining is a powerful tool for manufacturing companies, enabling them to discover hidden patterns and trends in their data, which leads to better decision-making and process optimization. By leveraging data mining techniques, companies can improve efficiency, reduce costs, and enhance their competitiveness in an increasingly data-driven market environment. This paper has thus explained the core concepts and techniques of data mining.

