Data Mining - Lecture 1

Data Mining and Business Intelligence

Overview

Introduction Technologies

Applications

By
Dr. Nora Shoaip
Lecture 1

Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems

2024 - 2025
Introduction
• Why Data Mining
• What is Data Mining
• Data Mining Applications
• Categories of Mining Techniques
Why Data Mining?
• The era of explosive growth of data: in the petabytes!
• Automated data collection and availability: tools, database systems, the Web, computerized society
• Major sources of abundant data
  o Business: Web, transactions, stocks, …
  o Science: remote sensing, bioinformatics, …
  o Society and everyone: news, digital cameras, social feeds
• The ability to economically store and manage petabytes of data online
• The Internet and computing grid that make all these archives universally accessible
• Linear growth of data management tasks with data volumes
• Massive data volumes, but still little insight!
• Solution: data mining, the automated analysis of massive data sets

What is Data Mining?
• Data mining (knowledge discovery from data)
  o Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
  o Data mining: a misnomer?
• Alternative names
  o Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, information harvesting, business intelligence, etc.
• Is everything “data mining”? No, the following are not:
  o Simple search and query processing
  o (Deductive) expert systems

Knowledge Discovery Process
Selection:
• Finding data relevant to the task
Preprocessing:
• Cleaning the data and putting it in a format suitable for mining
Transformation:
• Performing summaries, aggregations, or consolidation
Data Mining:
• Applying data mining algorithms to extract knowledge
Evaluation:
• Locating useful knowledge
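
The five steps above can be sketched as a chain of small functions. This is only an illustration: the records, the function names, and the "mining" step (a simple ranking by total quantity) are all hypothetical stand-ins, not part of the lecture.

```python
from collections import Counter

# Hypothetical raw records: (customer, item, quantity); None marks a missing value.
raw = [
    ("ann", "milk", 2), ("ann", "bread", 1), ("bob", "milk", None),
    ("bob", "beer", 6), ("cat", "milk", 1), ("cat", "bread", 3),
]

def select(records, items):
    """Selection: keep only the records relevant to the task."""
    return [r for r in records if r[1] in items]

def preprocess(records):
    """Preprocessing: drop records with a missing quantity."""
    return [r for r in records if r[2] is not None]

def transform(records):
    """Transformation: aggregate the quantity per item."""
    totals = Counter()
    for _, item, qty in records:
        totals[item] += qty
    return totals

def mine(totals):
    """Data mining (toy stand-in): rank items by total quantity."""
    return totals.most_common()

def evaluate(ranked, min_total):
    """Evaluation: keep only results that pass a usefulness threshold."""
    return [(item, total) for item, total in ranked if total >= min_total]

selected = select(raw, {"milk", "bread"})
cleaned = preprocess(selected)
totals = transform(cleaned)
result = evaluate(mine(totals), min_total=4)
print(result)
```

Each function consumes the output of the previous one, mirroring the order Selection, Preprocessing, Transformation, Data Mining, Evaluation.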

What Kinds of Data Can Be Mined?
• Database-oriented data sets and applications
  o Relational databases, data warehouses, transactional databases
• Advanced data sets and advanced applications
  o Data streams and sensor data
  o Time-series data, temporal data, sequence data (incl. bio-sequences)
  o Structured data, graphs, social networks and multi-linked data
  o Object-relational databases
  o Heterogeneous databases and legacy databases
  o Spatial data and spatiotemporal data
  o Multimedia databases
  o Text databases
  o The World-Wide Web

Data Mining Applications

• Understanding Customer Behavior

o Market basket analysis
o Market segmentation
o Targeted advertisement
o Market forecasting
o Recommender systems

Data Mining Applications (cont.)

• Social Networks Mining

o Community detection
o Friend recommendation
o Trend analysis
o Event detection
o Personality prediction

Data Mining Applications (cont.)

• Web and Text Mining

o Web usage mining
o Web structure mining
o Search engines
o Email categorization
o Fact checking

Categories of Mining Techniques

• Descriptive data mining

• Predictive data mining

Frequent Pattern Mining

• Descriptive data mining technique.
• Finds commonly occurring patterns in data.
  o e.g. What items are frequently purchased together in your supermarket basket?
• Applied on:
  o Transactional data (market basket analysis)
  o Sequential data
  o Graph data
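
The supermarket question above can be illustrated by counting how often each pair of items appears together. This is a minimal sketch: the baskets, the `frequent_pairs` name, and the `min_support` threshold are made up for illustration, and it brute-forces all pairs rather than using any particular algorithm from the course.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that occur in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(baskets, min_support=3)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

With these baskets, four pairs reach the threshold of three baskets each, for example bread with milk and beer with diapers.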

Clustering

• Descriptive data mining.
• Unsupervised learning.
• Divides data into groups.
• Applications:
  o Market segmentation
  o Community detection
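
As an illustration of dividing data into groups without labels, here is a small one-dimensional k-means sketch. The slide does not name a clustering algorithm, so k-means is an assumption chosen for brevity; the spending figures and the `kmeans_1d` name are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then
    move each center to the mean of its group, and repeat."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its old center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend per customer (market segmentation).
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, groups = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(centers)
print(groups)
```

No class labels are supplied; the two segments (low and high spenders) emerge from the data alone, which is what makes this unsupervised.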

Classification
• Predictive data mining.
• Supervised learning.
• Constructs models (functions) based on some training examples
• Describes and distinguishes classes or concepts for future prediction
  o e.g., classify countries based on climate,
  o or classify cars based on gas mileage
• Predicts some unknown class labels

Typical methods
• Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, …
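
To make "construct a model from training examples, then predict unknown labels" concrete, here is a sketch of a one-level decision tree (a decision stump) for the cars and gas mileage example. The training data, the class names "economy" and "guzzler", and the function names are hypothetical.

```python
def train_stump(examples):
    """Learn a threshold t from labeled (mpg, label) pairs such that
    predicting 'economy' when mpg >= t is most accurate on the training set."""
    values = sorted({mpg for mpg, _ in examples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct(t):
        return sum(("economy" if mpg >= t else "guzzler") == label
                   for mpg, label in examples)

    return max(candidates, key=correct)

def predict(threshold, mpg):
    """Apply the learned model to a car with an unknown label."""
    return "economy" if mpg >= threshold else "guzzler"

# Hypothetical training examples: (miles per gallon, class label).
training = [(12, "guzzler"), (15, "guzzler"), (18, "guzzler"),
            (30, "economy"), (35, "economy"), (42, "economy")]

threshold = train_stump(training)
print(threshold, predict(threshold, 40), predict(threshold, 10))
```

The labels in `training` are what makes this supervised: the model is fitted to known answers and then reused on unseen cars.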

Know Your Data
• Data Objects & Attribute Types
• Basic Statistical Descriptions of Data
Objects and Attributes

• A data object represents an entity
  o Also called a sample, example, instance, data point, or object (in a database: a data tuple)
  o e.g. customers, students, patients, books
• An attribute is a data field, representing a characteristic or feature of a data object
  o The nouns attribute, dimension, feature, and variable are used interchangeably: attribute (databases and data mining), dimension (data warehouses), feature (machine learning), variable (statistics)
  o e.g. name, age, salary, gender, grade, …
• Attribute (feature) vector: a set of attributes that describe an object
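
A small sketch of these terms in code, assuming a hypothetical `Customer` object with four of the example attributes listed above:

```python
from dataclasses import dataclass, astuple, fields

@dataclass
class Customer:
    """One data object; each field is an attribute."""
    name: str
    age: int
    salary: float
    gender: str

# A hypothetical data object (one tuple in a database table).
c = Customer("Mona", 31, 5200.0, "F")

attribute_names = [f.name for f in fields(Customer)]
vector = astuple(c)  # the attribute (feature) vector describing this object
print(attribute_names)
print(vector)
```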

Attribute Types: Nominal Attributes

• Symbols or names of things
  o Each value represents a category, code, or state
  o Also referred to as categorical
  o e.g. hair color, marital status, customer ID
• Can be represented as numbers (coding)
• Qualitative
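
Coding a nominal attribute as numbers might look like the sketch below. The `encode_nominal` helper and the hair-color values are hypothetical; the point is that the resulting integers are only labels.

```python
def encode_nominal(values):
    """Assign each distinct category an integer code, in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes

hair = ["black", "brown", "black", "blond", "brown"]
encoded, codes = encode_nominal(hair)
print(encoded)
print(codes)
```

The codes carry no order or magnitude: "blond" having code 2 does not make it greater than "black" with code 0, so the mean or median of such codes is meaningless, while the mode is still valid.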

Attribute Types: Binary Attributes
• Nominal with only two values representing two states or categories: 0 or 1 (absent or present)
  o Also called Boolean (true or false)
• Qualitative
• Symmetric: both states are equally valuable and have the same weight
  o e.g. gender
• Asymmetric: states are not equally important
  o e.g. medical test outcomes

Attribute Types: Ordinal Attributes

• Qualitative
• Values have a meaningful order or ranking, but the magnitude between successive values is not known
  o e.g. professional rank, grade, customer satisfaction
• Useful for data reduction of numerical attributes (discretizing a numeric attribute into ordered categories)
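
Data reduction of a numeric attribute can be sketched as mapping each number to an ordered category. The cut points (30 and 60) and the level names below are assumptions chosen only for illustration.

```python
from bisect import bisect_right

LEVELS = ["youth", "adult", "senior"]  # ordered categories (hypothetical)
CUTS = [30, 60]                        # hypothetical cut points, in years

def to_ordinal(age):
    """Below 30 is 'youth', 30 to 59 is 'adult', 60 and above is 'senior'."""
    return LEVELS[bisect_right(CUTS, age)]

ages = [23, 39, 47, 58, 61]
labels = [to_ordinal(a) for a in ages]
print(labels)
```

The result keeps the ranking (youth < adult < senior) but discards the exact distances between ages, which is exactly what distinguishes an ordinal attribute from a numeric one.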

Attribute Types: Numeric Attributes

• Quantitative
• Interval-scaled: measured on a scale of equal-size units
  o e.g. temperature (in °C or °F), calendar year
  o Do not have a true zero point
  o Values cannot be expressed as multiples of one another
• Ratio-scaled: have a true zero point
  o A value can be expressed as a multiple of another
  o e.g. years of experience, weight, salary
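
A quick numeric check of why interval-scaled values cannot be expressed as multiples, using temperature (interval-scaled) against weight (ratio-scaled). The conversion formulas are standard; the sample values are arbitrary.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: both scales have an arbitrary zero point."""
    return celsius * 9 / 5 + 32

def kg_to_lb(kg):
    """Kilograms to pounds: both scales share the same true zero."""
    return kg * 2.20462

# Interval-scaled: the "ratio" of 20 °C to 10 °C changes with the unit.
c_ratio = 20 / 10                   # 2.0
f_ratio = c_to_f(20) / c_to_f(10)   # 68 / 50 = 1.36

# Ratio-scaled: 20 kg is twice 10 kg in any unit.
kg_ratio = 20 / 10
lb_ratio = kg_to_lb(20) / kg_to_lb(10)

print(c_ratio, f_ratio, kg_ratio, lb_ratio)
```

Differences on an interval scale are still meaningful (a 10 °C gap is always an 18 °F gap), but "twice as hot" is not.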

Discrete vs. Continuous Attributes

• Discrete attribute: has a finite or countably infinite set of values, integers or otherwise
  o e.g. hair color, smoker, medical test, Customer_ID
  o Customer_ID is countably infinite: infinitely many possible values, but in one-to-one correspondence with the natural numbers
• If an attribute is not discrete, it is continuous
  o e.g. height, weight, age

Outline
• Data Objects & Attribute Types
  o What is an object?
  o What is an attribute?
  o Attribute types
  o Continuous vs. discrete
• Basic Statistical Descriptions of Data
  o Measuring central tendency
  o Measuring data dispersion
  o Basic graphic displays
• Measuring Data Similarity & Dissimilarity
  o Data matrix & dissimilarity matrix
  o Proximity measures for nominal and binary attributes
  o Dissimilarity of numerical data

Measuring Central Tendency

• Median: middle value in a set of ordered values
  o N is odd: the median is the middle value of the ordered set
  o N is even: the median is not unique; by convention, take the average of the two middlemost values
  o Expensive to compute for a large number of observations
• Mode: value that occurs most frequently in the attribute values
  o Works for both qualitative and quantitative attributes
  o Data can be unimodal, bimodal, or trimodal; if every value occurs only once, there is no mode
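
For reference, the mean and the median rule above can be written compactly. The notation is an assumption introduced here: x_1, …, x_N are the observed values and x_(k) denotes the k-th smallest of them.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\operatorname{median} =
\begin{cases}
x_{\left(\frac{N+1}{2}\right)} & \text{if } N \text{ is odd} \\[6pt]
\dfrac{1}{2}\left( x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)} \right) & \text{if } N \text{ is even}
\end{cases}
```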

Measuring Central Tendency
Example
• Salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110
  o Mean = ?
  o Median = ?
  o Mode = ?

Measuring Central Tendency
Example
Salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110

• Mean = 58,000 (sum of 696 thousand over 12 values)
• Median = 54,000 (average of the two middle values, 52 and 56)
• Mode = 52,000 and 70,000 (bimodal)
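
The three answers can be checked with Python's standard `statistics` module. This is just one way to verify the arithmetic; values are in thousands of dollars, as in the example.

```python
from statistics import mean, median, multimode

# Salaries in thousands of dollars, as listed in the example.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

avg = mean(salaries)         # 696 / 12
mid = median(salaries)       # N = 12 is even: average of 52 and 56
modes = multimode(salaries)  # every value tied for the highest frequency

print(avg, mid, modes)
```

`multimode` returns all of the most frequent values, which is why both 52 and 70 are reported for this bimodal data.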

Measuring Dispersion of Data
• Five-number summary:
  o Median (Q2), quartiles Q1 and Q3, and the smallest and largest individual observations, written in the order: minimum, Q1, median, Q3, maximum
• Boxplots: a visualization technique for the five-number summary
  o Whiskers terminate at the min and max, OR at the most extreme observations within 1.5 × IQR of the quartiles, with the remaining points (outliers) plotted individually
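
The whisker rule above can be stated as formulas. The names "lower fence" and "upper fence" are introduced here for convenience and do not appear on the slide.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower fence} = Q_1 - 1.5 \times \mathrm{IQR}, \qquad
\text{upper fence} = Q_3 + 1.5 \times \mathrm{IQR}
```

An observation x is plotted individually as an outlier when x is below the lower fence or above the upper fence; each whisker ends at the most extreme observation that still lies within the fences.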
Ex:
Suppose that a hospital recorded the age and body fat percentage of 18 randomly selected adults, with the following results:
• Calculate the mean, median, and standard deviation of age and %fat.
• Draw the boxplots for age and %fat.
• Calculate the correlation coefficient. Are these two attributes positively or negatively correlated? Compute their covariance.
Age 23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7

Solution

Age 23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
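
One way to work the first and third parts of the exercise is sketched below. It assumes the population convention (dividing by N) for the standard deviation and covariance; with the sample convention (dividing by N - 1) those two numbers come out slightly larger, while the correlation coefficient is unchanged. The helper names are hypothetical.

```python
from statistics import mean, median, pstdev

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
       42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation coefficient: covariance over the product of std devs."""
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))

for name, data in (("age", age), ("%fat", fat)):
    print(name, round(mean(data), 2), round(median(data), 2), round(pstdev(data), 2))

cov = covariance(age, fat)
r = correlation(age, fat)
print(round(cov, 2), round(r, 2))
```

Under these assumptions the correlation coefficient is roughly 0.82 and the covariance is positive, so age and %fat are positively correlated.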

Solution
Draw the boxplots for age and %fat.

%fat in increasing order: 7.8 9.5 17.8 25.9 26.5 27.2 27.4 28.8 30.2 31.2 31.4 32.9 33.4 34.1 34.6 35.7 41.2 42.5

For Age
• Q1 = 39, median = 51, Q3 = 57, min = 23, max = 61
• IQR = 57 - 39 = 18, 1.5 × IQR = 27
• newMin = 39 - 27 = 12, newMax = 57 + 27 = 84
For Fat
• Q1 = 26.5, median = 30.7, Q3 = 34.1, min = 7.8, max = 42.5
• IQR = 34.1 - 26.5 = 7.6, 1.5 × IQR = 11.4
• newMin = 26.5 - 11.4 = 15.1
• newMax = 34.1 + 11.4 = 45.5

Age 23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
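
The quartiles in this solution are consistent with taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data. The sketch below assumes that convention (other quartile conventions give slightly different values) and then applies the 1.5 × IQR rule to find the whisker ends and outliers. Function names are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with Q1 and Q3 taken as the medians
    of the lower and upper halves (the middle value is excluded when N is odd)."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]

def whiskers_and_outliers(values):
    """Apply the 1.5 x IQR rule: whiskers end at the most extreme values
    inside the fences; everything outside is an outlier."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return (min(inside), max(inside)), outliers

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
       42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

age_summary = five_number_summary(age)
age_whiskers, age_outliers = whiskers_and_outliers(age)
fat_whiskers, fat_outliers = whiskers_and_outliers(fat)
print(age_summary, age_whiskers, age_outliers)
print(five_number_summary(fat), fat_whiskers, fat_outliers)
```

For age, no value falls outside the limits 12 and 84, so the whiskers reach the min and max. For %fat, the values 7.8 and 9.5 lie below 15.1 and would be plotted individually, with the lower whisker ending at 17.8.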
Visual Representations of Data Distributions
• Histograms
• Scatter plots: each pair of values is treated as a pair of coordinates and plotted as a point in the plane
  o X and Y are correlated if one attribute implies the other
  o Correlation can be positive, negative, or null (uncorrelated)
  o For more attributes, we use a scatter plot matrix
Visual Representations of Data Distributions

[Figure: Uncorrelated data]
