Data Mining Notes

The document covers the fundamentals of data mining and knowledge discovery, differentiating between the two processes and outlining key functionalities such as classification and clustering. It also discusses data preprocessing techniques, various data mining models, clustering methods, and neural networks, highlighting their applications, advantages, and challenges. Additionally, it emphasizes the importance of model selection, evaluation metrics, and the role of machine learning in enhancing data analysis.

Uploaded by

vtu21910

Unit 1 - Data Mining and Knowledge Discovery

1. Differentiate Data Mining and Knowledge Discovery


Data Mining: The step of discovering patterns in large data sets.
Knowledge Discovery (KDD): The overall process, which includes data cleaning, integration, transformation, mining, and pattern evaluation.

2. Functionalities of Data Mining (2 Examples)


- Classification: Predict categories.
- Clustering: Group similar items.

3. Interesting Pattern: A pattern that is valid, novel, useful, and understandable.

4. Predictive vs Descriptive
Predictive: Future prediction (e.g., classification).
Descriptive: Pattern discovery (e.g., clustering).

5. 10 Applications: Marketing, Fraud Detection, Stock Market, Health Care, Web Mining, Telecom,
Retail, Education, Manufacturing, Banking.

6. Machine Learning: AI technique enabling systems to learn from data. Types: Supervised, Unsupervised, Reinforcement.

7. Model Selection: Choosing the best model based on accuracy and performance.

8. Overfitting: Model performs well on training data but poorly on unseen data. Evaluation Metrics:
Accuracy, F1-Score.
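As an illustration of these metrics, a minimal sketch (the labels below are made-up example data, not from any real model):

```python
# Minimal sketch: computing Accuracy and F1-Score from predicted vs. true labels.
# The label lists below are made-up illustration data.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Count true/false positives and negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, f1)  # → 0.75 0.75
```

A model that overfits shows high accuracy on the training labels but low accuracy when this calculation is repeated on unseen data.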

9. Concept Learning Goal: Learn a general concept from examples. E.g., Learning "fruit" concept
from apples, bananas.

Unit 2 - Data Preprocessing


1. Issues in Raw Data: Missing values, noise, outliers, inconsistencies.

2. Outlier Removal: Z-Score Method, IQR Method.
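Both methods can be sketched on a toy data set (illustrative values, with one obvious outlier):

```python
import statistics

# Toy data with one obvious outlier (illustrative values).
data = [10, 12, 11, 13, 12, 11, 95]

# Z-Score method: flag points more than 2 standard deviations from the mean.
mean = statistics.mean(data)
stdev = statistics.stdev(data)
z_outliers = [x for x in data if abs(x - mean) / stdev > 2]

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q = statistics.quantiles(data, n=4)   # returns [Q1, Q2, Q3]
q1, q3 = q[0], q[2]
iqr = q3 - q1
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_outliers, iqr_outliers)  # → [95] [95]
```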

3. Concept Hierarchy: Organizing data into levels of abstraction. E.g., Country > State > City.

4. Dimensionality Reduction: Reduce features. Important for efficiency and avoiding overfitting.

5. Feature Extraction Examples: Image Processing, Speech Recognition.

6. Variable Selection: Filter, Wrapper, Embedded Methods.

7. Variable Ranking: Ordering features based on relevance.

8. Objectives of LDA: Maximize class separation, reduce dimensions.

9. PCA: Projects data onto principal components to reduce dimensions.
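A minimal PCA sketch via the eigendecomposition of the covariance matrix (the 2-D points below are made-up, strongly correlated data):

```python
import numpy as np

# Minimal PCA sketch: project 2-D points onto their first principal component.
# X is made-up data with strong correlation between the two features.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
              [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending eigenvalues)
pc1 = eigvecs[:, -1]                    # eigenvector with the largest eigenvalue
projected = Xc @ pc1                    # 1-D representation of each point

# Fraction of total variance explained by the first component.
explained = eigvals[-1] / eigvals.sum()
print(round(explained, 3))
```

Keeping only components with large eigenvalues is what reduces the dimension while retaining most of the variance.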

10. Factor Analysis: Identify underlying relationships among variables.

11. Cross-Validation: Evaluates model's performance.

12. Resampling Methods: Improve accuracy by sampling data (e.g., bootstrapping).
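The two resampling ideas can be sketched on index lists alone, with no model training (a minimal sketch; fold assignment by striding is one simple choice):

```python
import random

# Sketch: k-fold cross-validation splits and a bootstrap resample,
# shown on index lists only (no model is trained here).
random.seed(0)
indices = list(range(10))

# 5-fold split: each fold serves as the validation set exactly once.
k = 5
folds = [indices[i::k] for i in range(k)]
for fold in folds:
    train = [i for i in indices if i not in fold]
    # ... fit on `train`, evaluate on `fold`, average the scores ...

# Bootstrap resample: draw n items with replacement.
boot = [random.choice(indices) for _ in indices]
print(folds, boot)
```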

Unit 3 - Data Mining Models

1. Regression Models Pros & Cons
Pros: Predict continuous values. Cons: Sensitive to outliers.

2. Types of Association Rule Mining: Single-dimensional, Multi-dimensional, Quantitative.

3. Decision Tree Induction: Build tree based on attribute selection (e.g., ID3, C4.5).

4. Bayes Theorem: P(A|B) = P(B|A) * P(A) / P(B).
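A worked example of the theorem (all probabilities below are made-up illustration values):

```python
# Worked Bayes-theorem example with made-up numbers:
# P(Disease) = 0.01, P(Positive | Disease) = 0.9, P(Positive | No Disease) = 0.05.
p_d = 0.01
p_pos_given_d = 0.9
p_pos_given_not_d = 0.05

# Total probability of a positive test (law of total probability).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# P(Disease | Positive) = P(Positive | Disease) * P(Disease) / P(Positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 4))  # → 0.1538
```

This is the same reasoning a Naive Bayes classifier applies per feature when predicting a class.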

5. Constraints in ARM: Knowledge, Data, Rule constraints.

6. Support Vector Machine: Classifier that maximizes margin.

7. Decision Tree Parameters: Entropy, Information Gain, Gini Index.
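Entropy and Information Gain for a candidate split can be computed directly (the 9-yes/5-no label counts below are a made-up textbook-style example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Made-up parent node with 9 'yes' and 5 'no' labels.
parent = ['yes'] * 9 + ['no'] * 5
left = ['yes'] * 6 + ['no'] * 2   # one child after a candidate split
right = ['yes'] * 3 + ['no'] * 3  # the other child

# Information Gain = parent entropy - weighted average child entropy.
n = len(parent)
gain = entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
print(round(entropy(parent), 3), round(gain, 3))
```

ID3 picks the attribute with the highest gain at each node; C4.5 normalizes it by the split's own entropy (gain ratio).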

8. Gaussian Mixture Steps: Initialization, E-Step, M-Step, Repeat.

9. K-NN Phases: Feature selection, Distance calculation, Voting.

10. K Value in K-NN: Balances bias-variance trade-off.
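The distance-calculation and voting phases fit in a few lines (a minimal sketch on made-up 2-D points; squared Euclidean distance is assumed):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance calculation: squared Euclidean distance to every training point.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in train
    )
    # Voting: majority label among the k closest points.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D training data: two loose clusters labelled 'A' and 'B'.
train = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
         ((6, 6), 'B'), ((7, 6), 'B'), ((6, 7), 'B')]
print(knn_predict(train, (2, 2), k=3))  # → A
print(knn_predict(train, (6, 5), k=3))  # → B
```

A small k tracks local structure (low bias, high variance); a large k smooths the decision (high bias, low variance).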

Unit 4 - Clustering

1. Partitioning Clustering: Divides dataset into exclusive clusters (e.g., K-Means).

2. K-Means vs K-Medoid
K-Means: Uses mean, sensitive to outliers.
K-Medoid: Uses medoid, robust.
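The K-Means assign-then-update loop can be sketched in one dimension (a toy example with made-up points and hand-picked initial centers):

```python
import statistics

def kmeans_1d(points, centers, iters=10):
    """Tiny 1-D K-Means sketch: assign each point to its nearest center,
    then recompute each center as the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [statistics.mean(v) if v else c for c, v in clusters.items()]
    return sorted(centers)

# Made-up data: two obvious 1-D groups.
points = [1, 2, 3, 10, 11, 12]
print(kmeans_1d(points, centers=[1.0, 12.0]))  # → [2, 11]
```

K-Medoid differs only in the update step: instead of the mean, it picks the actual data point that minimizes total distance within the cluster, which is why outliers pull it around less.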

3. Density-Based Clustering: Groups dense regions.

4. DBSCAN: Clusters arbitrary shapes, handles noise.

5. EM Steps: E-Step, M-Step, Repeat.

6. Hierarchical Clustering: Builds tree (e.g., agglomerative clustering).

7. Agglomerative vs Divisive
Agglomerative: Bottom-up.
Divisive: Top-down.

8. Fuzzy C-Means: Allows soft clustering.

9. Matching Methods
K-Means: Partitioning.
DBSCAN: Density-Based.
Hierarchical: Hierarchical.

10. Features of BIRCH, ROCK, Chameleon
BIRCH: Incremental clustering.
ROCK: Link-based.
Chameleon: Interconnectivity-based.

Unit 5 - Neural Networks

1. ANN: Computational model inspired by brain.

2. Backpropagation: Updates weights by propagating error.

3. Input Layer: Receives raw data.

4. Hyperparameters: Settings like learning rate, batch size.

5. Optimizers: SGD, Adam.

6. Learning Rate: Controls step size in gradient descent.

7. AND Gate with Perceptron: Weights = [1, 1], Bias = -1.5 (output fires only when both inputs are 1).
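These values can be checked directly with a step-activation perceptron:

```python
# Perceptron for the AND gate: weights w1 = w2 = 1, bias = -1.5.
# The output fires (1) only when the weighted sum exceeds 0,
# which happens only for inputs (1, 1): 1*1 + 1*1 - 1.5 = 0.5 > 0.
def perceptron_and(x1, x2, w1=1, w2=1, bias=-1.5):
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_and(x1, x2))
# → 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1
```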

8. Loss Functions: MSE, Cross-Entropy, Hinge.

9. Training vs Validation
Training: Model learns.
Validation: Model is evaluated.

10. Forward Propagation in MLP: Pass input through layers, apply weights, activations.
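The forward pass above can be sketched for a tiny 2-3-1 MLP (all weights and biases below are arbitrary illustrative values, and sigmoid is assumed as the activation):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: for each layer, multiply activations by the
    layer's weights, add the biases, and apply the sigmoid activation."""
    a = x
    for W, b in zip(weights, biases):
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

# Made-up 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
weights = [
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],  # hidden layer: 3 neurons x 2 inputs
    [[0.7, -0.5, 0.2]],                      # output layer: 1 neuron x 3 inputs
]
biases = [[0.1, 0.0, -0.1], [0.05]]
out = forward([1.0, 0.5], weights, biases)
print(out)
```

Backpropagation (item 2 above) then runs this computation in reverse, propagating the error from the output back through the same weights.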
