CS 8031 DATA MINING & DATA WAREHOUSING
Module - I
Data Mining : Introduction to Data Mining
What is data mining?
Related technologies - Machine Learning, DBMS, OLAP, Statistics
Data Mining Goals
Stages of the Data Mining Process
Data Mining Techniques
Knowledge Representation Methods
Applications
Example: weather data
Data Warehouse and OLAP
Data Warehouse and DBMS
Multidimensional data model
OLAP operations
Example: loan data set
Data preprocessing
Data cleaning
Data transformation
Data reduction
Discretization and generating concept hierarchies
Installing Weka 3 Data Mining System
Experiments with Weka - filters, discretization
Data mining knowledge representation
Task relevant data
Background knowledge
Interestingness measures
Representing input data and output knowledge
Visualization techniques
Experiments with Weka - visualization
Attribute-oriented analysis
Attribute generalization
Attribute relevance
Class comparison
Statistical measures
Experiments with Weka - using filters and statistics
Data mining algorithms: Association rules
Motivation and terminology
Example: mining weather data
Basic idea: item sets
Generating item sets and rules efficiently
Correlation analysis
Experiments with Weka - mining association rules
Data mining algorithms: Classification
Basic learning/mining tasks
Inferring rudimentary rules: 1R algorithm
Decision trees
Covering rules
Experiments with Weka - decision trees, rules
Data mining algorithms: Prediction
The prediction task
Statistical (Bayesian) classification
Bayesian networks
Instance-based methods (nearest neighbor)
Linear models
Experiments with Weka - Prediction
Evaluating what's been learned
Basic issues
Training and testing
Estimating classifier accuracy (holdout, cross-validation, leave-one-out)
Combining multiple models (bagging, boosting, stacking)
Minimum Description Length Principle (MDL)
Experiments with Weka - training and testing
Mining real data
Preprocessing data from a real medical domain (310 patients with Hepatitis C).
Applying various data mining techniques to create a comprehensive and accurate model of the data.
Clustering
Basic issues in clustering
First conceptual clustering system: Cluster/2
Partitioning methods: k-means, expectation maximization (EM)
Hierarchical methods: distance-based agglomerative and divisive clustering
Conceptual clustering: Cobweb
Experiments with Weka - k-means, EM, Cobweb
Advanced techniques, Data Mining software and applications
Text mining: extracting attributes (keywords), structural approaches (parsing, soft parsing).
Bayesian approach to classifying text
Web mining: classifying web pages, extracting knowledge from the web
Data Mining software and applications
Introduction, Relational Databases, Data Warehouses, Transactional Databases, Advanced Database Systems and Applications, Data Mining Functionalities, Classification of Data Mining Systems, Major Issues in Data Mining.
Module - II
Data Warehouse : Introduction, A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Data Cube Technology, From Data Warehousing to Data Mining.
Module - III
Data Preprocessing : Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation.
Data Mining Primitives, Languages and System Architecture : Data Mining Primitives, DMQL, Architectures of Data Mining Systems.
Module - IV
Concept Description : Data Generalization & Summarization-Based Characterization, Analytical Characterization, Mining Class Comparisons, Mining Descriptive Statistical Measures in Large Databases.
Module - V
Mining Association Rules in Large Databases : Association Rule Mining, Single-Dimensional Boolean Association Rules, Multilevel Association Rules from Transaction Databases, Multidimensional Association Rules from Relational Databases, From Association Mining to Correlation Analysis, Constraint-Based Association Mining.
Module - VI
Classification and Prediction : Classification & Prediction, Issues Regarding Classification & Prediction, Classification by Decision Tree Induction, Bayesian Classification, Classification by Backpropagation, Classification Based on Concepts from Association Rule Mining, Other Classification Methods, Prediction, Classification Accuracy.
Module - VII
Cluster Analysis : Types of Data in Cluster Analysis, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Model-Based Clustering Methods, Outlier Analysis.
Mining Complex Types of Data.
Text Books :
1. Jiawei Han & Micheline Kamber, Data Mining: Concepts & Techniques, Harcourt India Private Limited.
Reference Books :
1. G.K. Gupta, Introduction to Data Mining with Case Studies, PHI, New Delhi, 2006.
2. A. Berson & S.J. Smith, Data Warehousing, Data Mining & OLAP, TMH, New Delhi, 2004.
3. M.H. Dunham & S. Sridhar, Data Mining, Pearson Education, New Delhi, 2006.