Lecture 1 Data Mining

This document provides an introduction to a course on data warehousing and data mining. It outlines the administrative details of the course, including the lecturers, times, materials, and assessment. It then discusses why data mining is important due to the abundance of data and need for knowledge discovery. It describes data mining as the process of discovering interesting patterns or knowledge from large amounts of data through integration of techniques from machine learning, statistics, pattern recognition and databases.

Uploaded by

Ahmed Mahmoud Saad


Data Warehousing and Data Mining

Lecture 1 Introduction

CITS3401
CITS5504

Wei Liu

School of Computer Science and Software Engineering

Faculty of Engineering, Computing and Mathematics

Acknowledgement: The lecture slides are adapted from the original slides for Han's textbook.

Administrative

Unit Coordinator & Lecturer

Dr. Wei Liu
Email: wei.liu@uwa.edu.au
Office: CSSE Room 2.18
Phone: 64883095

The Unit Materials are for both CITS3401 and CITS5504

CITS3401 Bachelor of Science (Data Science Major)
CITS5504 Master of Information Technology

Common Lecture Hours:

Tuesdays 10:00-11:45am

CITS3401 and CITS5504

Common Consultation Hour:

Tuesdays 2:00-3:00pm (Walk in - No appointment)
Find me either in CSSE Room 2.18 or Lab 2.01

Common Teaching Material

Lecture slides, lab sheets and projects

Different websites
http://teaching.csse.uwa.edu.au/units/CITS3401
http://teaching.csse.uwa.edu.au/units/CITS5504

Different Lab Sessions (from Week 2 onward):

CITS3401: Tuesdays 2:00-4:00pm, Dr. Syed Mohammed Shamsul Islam (Shams)
CITS5504: Mondays 9:00-11:00am, Dr. Wei Liu

Common Assessment Structures

Two projects: 20% each

An analysis of a business scenario through an OLAP tool.
We will be using Jedox, an Excel plug-in, for the Data Warehousing Project.
http://www.jedox.com/en/services/downloads
An analysis of a data mining and exploration problem using Weka.
Weka is a collection of machine learning algorithms for data mining tasks.
The algorithms can either be applied directly to a dataset or called from your
own Java code.
http://www.cs.waikato.ac.nz/ml/weka/

Mid-semester Test: 10%

at the lecture venue after the study break

Final Examination: 50%

Project specifications and instructions will be available on the course website.

Textbook and Recommended Readings

Course Textbook:

Data Mining: Concepts and Techniques
2nd ed., Jiawei Han and Micheline Kamber, 2006
3rd ed., Jiawei Han, Micheline Kamber and Jian Pei, 2011
Jiawei Han's web page:
http://web.engr.illinois.edu/~hanj/

References:
Data Mining: Methods and Techniques, by A. Shawkat Ali and
Saleh Wasimi, Thomson, 2007
Data Mining: The Textbook, by Charu C. Aggarwal, Springer,
May 2015

Introduction to Data Mining

Why Data Mining?

What Is Data Mining? A Knowledge Discovery (KDD) Process

A Multi-Dimensional View of Data Mining/ classification

What Kinds of Data Can Be Mined?
What Kinds of Patterns Can Be Mined?

What Kinds of Technologies Are Used?

What Kinds of Applications Are Targeted?

Are all the patterns interesting?

Integration of Data Mining System with Data Warehousing System

Major Issues in Data Mining

Why Data Mining?

The Explosive Growth of Data: from terabytes to petabytes

Data Explosion
Our capability of generating, collecting, storing and managing data has
grown tremendously in the last 50 years.

Data collection and data availability

Automated data collection tools, database systems, Web, computerized
society

Major sources of abundant data

Business: Web, e-commerce, transactions, stocks, ...
Science: Remote sensing, bioinformatics, scientific simulation, ...
Society and everyone: news, digital cameras, YouTube

We are drowning in data, but starving for knowledge!

"Necessity is the mother of invention": data mining
Automated and scalable analysis of massive data sets

Potential Applications

Data analysis and decision support

Market analysis and management
Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
Risk analysis and management
Forecasting, customer retention, improved underwriting,
quality control, competitive analysis
Fraud detection and detection of unusual patterns (outliers)

Other Applications
Text mining (news group, email, documents) and Web mining
Stream data mining

Example 1: Market Analysis

Where does the data come from?

Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus
(public) lifestyle studies, ...

Target marketing
Find clusters of model customers who share the same characteristics:
interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
Cross-market analysis: find associations/correlations between product
sales, and predict based on such associations
Customer profiling: what types of customers buy what products
(clustering or classification)
Customer requirement analysis
Identify the best products for different groups of customers
Predict what factors will attract new customers
Provision of summary Information:
Multidimensional summary reports
Statistical summary information (data central tendency and variation)

Example 2: Corporate Analysis and Risk Management

Finance planning and asset evaluation

cash flow analysis and prediction

contingent claim analysis to evaluate assets

cross-sectional and time series analysis (financial ratio, trend analysis, etc.)

Resource planning
summarize and compare the resources and spending

Competition
monitor competitors and market directions

group customers into classes and set up a class-based pricing procedure

set pricing strategy in a highly competitive market

Example 3: Fraud Detection and Mining Unusual Patterns
Approaches: Clustering & model construction for frauds,
outlier analysis
Applications: Health care, retail, credit card service, telecomm.
Money laundering: suspicious monetary transactions
Medical insurance:
Professional patients, ring of doctors, and ring of references
Unnecessary or correlated screening tests
Telecommunications: phone-call fraud
Phone call model: destination of the call, duration, time of day
or week. Analyze patterns that deviate from an expected norm
Retail industry:
Analysts estimate that 38% of retail shrink is due to dishonest
employees

Anti-terrorism:

Evolution of Sciences

Before 1600, empirical science

1600-1950s, theoretical science

Each discipline has grown a theoretical component. Theoretical models often motivate
experiments and generalize our understanding.

1950s-1990s, computational science

Over the last 50 years, most disciplines have grown a third, computational branch (e.g.
empirical, theoretical, and computational ecology, or physics, or linguistics.)
Computational Science traditionally meant simulation. It grew out of our inability to find
closed-form solutions for complex mathematical models.

1990-now, data science (data-driven science)

The flood of data from new scientific instruments and simulations
The ability to economically store and manage petabytes of data online
The Internet and computing Grid that makes all these archives universally accessible
Scientific info. management, acquisition, organization, query, and visualization tasks
scale almost linearly with data volumes. Data mining is a major new challenge!

Evolution of Database Technology

1960s:
Data collection, database creation, IMS and network DBMS

1970s:
Relational data model, relational DBMS implementation

1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s:
Data mining, data warehousing, multimedia databases, and Web databases

2000s:
Stream data management and mining
Data mining and its applications
Web technology (XML, data integration) and global information systems

Why Data Mining

Summary:
Data is abundant, yet data archives are seldom visited.
The volume of data far exceeds the human ability to comprehend it.
Intuitive decisions are prone to biases and errors, and making them is
extremely time-consuming and costly.
Data mining tools perform data analysis and uncover important
data patterns, contributing greatly to business strategies,
knowledge bases, and scientific and medical research.

[Figure labels: data tombs; nuggets of knowledge]

What is Data Mining?

Data mining (knowledge discovery from data)

Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
Data mining: a misnomer? (Knowledge Mining from data)
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Watch out: Is everything data mining?
Simple search and query processing
(Deductive) expert systems

What is Data Mining?

Tremendous amount of data (terabyte-petabyte)

High-dimensionality and high complexity of data
Structured, un-structured, heterogeneous data

Scalable
Data mining involves integration of multiple disciplines:

Machine learning
Pattern recognition
Statistics
Databases
Business Intelligence
Big data
Efficient: Derived knowledge is new, interesting, informative and
can be used for sophisticated applications (decision making,
process control, information management, ...)

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the centre, surrounded by Database Technology, Machine Learning, Pattern Recognition, Statistics, Algorithm, Visualization and Other Disciplines]

Steps of Knowledge Discovery (KDD) Process

This is a view from typical database systems and data warehousing communities.

Data mining plays an essential role in the knowledge discovery process.

[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Data Warehousing and Mining Framework

KDD Process: Several Key Steps

Learning the application domain
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation
Find useful features, dimensionality/variable reduction, invariant
representation
Choosing functions of data mining
summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
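
The KDD steps listed above can be pictured as a chain of small functions. The sketch below is only an illustration under stated assumptions: the customer records, the attribute names and the "high spender" rule are all hypothetical, and the mining step is deliberately trivial.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, yearly_spend); None marks a missing value.
raw = [
    (1, 25, 1200.0),
    (2, None, 300.0),
    (3, 41, None),
    (4, 37, 2500.0),
    (5, 52, 150.0),
]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if None not in r]

def select(records):
    """Data selection: keep only the task-relevant attribute (yearly spend)."""
    return [spend for _, _, spend in records]

def transform(values):
    """Transformation: rescale values to the range [0, 1] (min-max normalisation)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mine(values):
    """Data mining (toy stand-in): flag values above the mean as 'high spenders'."""
    threshold = mean(values)
    return [v > threshold for v in values]

def evaluate(flags):
    """Pattern evaluation: share of records that match the pattern."""
    return sum(flags) / len(flags)

cleaned = clean(raw)
flags = mine(transform(select(cleaned)))
share_high = evaluate(flags)
print(f"{len(cleaned)} clean records, share of high spenders: {share_high:.2f}")
```

In a real project each stage would be far larger, and cleaning and preprocessing would typically dominate the effort, as noted in the list above.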

Multi-Dimensional View of Data Mining

Data to be mined
Database data (extended-relational, object-oriented,
heterogeneous, legacy), data warehouse, transactional data,
stream, spatiotemporal, time-series, sequence, text and web, multimedia, graphs & social and information networks
Knowledge to be mined (or: Data mining functions)
Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
Descriptive vs. predictive data mining
Multiple/integrated functions and mining at multiple levels
Techniques utilized (methodologies)
Data-intensive, data warehouse (OLAP), machine learning,
statistics, pattern recognition, visualization, high-performance, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data mining,
stock market analysis, text mining, Web mining, etc.

Data Mining: On What Kinds of Data?

Structured and semi-structured data

Relational database/ Object-relational data
Data Warehouse,
Transactional Database

Unstructured data
Data streams and sensor data
Text data and web data
Time-series data, temporal data, sequence data (incl. biosequences)
Graphs, social networks and information networks
Spatial data, spatiotemporal data and multimedia data

Relational Database

A relational database is a collection of tables, each of which is
assigned a unique name.
Each table consists of a set of attributes (columns or fields)
and usually stores a large set of tuples (records or rows).
Each tuple in a relational table represents an object identified
by a unique key and described by a set of attribute values.
A semantic data model, such as the entity relationship data
model, is often constructed for relational databases.
An ER data model represents the database as a set of entities
and their relationships.

Relational Database

Relational data can be accessed by database queries
written in a relational language such as SQL.
A given query is transformed into a set of relational
operations such as join, selection and projection,
and is then optimized for efficient processing.
Efficiency of retrieval, efficiency of update and
integrity are the key requirements of a good
relational database.

An Example - AllElectronics

Four relational tables: customer, item, employee and branch.
Each relation consists of a set of attributes.

Example of Queries

Show me a list of all items that were sold in the last
quarter
Show me the total sales of the last month, grouped
by branch
Which sales person has the highest amount of
sales?
How many sales transactions occurred in the month
of September?
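
As a rough sketch of how such questions become SQL, the example below uses Python's built-in `sqlite3` module with a single hypothetical `sales` table. The table layout, column names and rows are assumptions made for illustration; they are not the AllElectronics schema from the textbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    trans_id INTEGER PRIMARY KEY,
    branch   TEXT,
    employee TEXT,
    item     TEXT,
    amount   REAL,
    sold_on  TEXT   -- ISO date string, e.g. '2015-09-14'
);
INSERT INTO sales (branch, employee, item, amount, sold_on) VALUES
    ('Perth',  'Ann', 'camera',  450.0, '2015-09-03'),
    ('Perth',  'Bob', 'printer', 120.0, '2015-09-14'),
    ('Sydney', 'Cai', 'laptop', 1300.0, '2015-09-20'),
    ('Sydney', 'Dee', 'camera',  480.0, '2015-08-28');
""")

# Total sales of one month (here September 2015), grouped by branch.
by_branch = conn.execute("""
    SELECT branch, SUM(amount)
    FROM sales
    WHERE sold_on BETWEEN '2015-09-01' AND '2015-09-30'
    GROUP BY branch
    ORDER BY branch
""").fetchall()

# Which salesperson has the highest amount of sales?
top_seller = conn.execute("""
    SELECT employee, SUM(amount) AS total
    FROM sales
    GROUP BY employee
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

# How many sales transactions occurred in the month of September?
september_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE sold_on LIKE '2015-09-%'"
).fetchone()[0]

print(by_branch, top_seller, september_count)
```

Each query only reports what is stored; none of them says anything about why sales look the way they do, which is the gap data mining is meant to fill.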

Purpose of relational databases

The main purpose of a relational database is to store
data correctly and retrieve data on demand.
This type of data processing is sometimes called
Online Transaction Processing (OLTP).
Relational databases are passive data repositories in
the sense that a query only shows you what is
stored in the database, but cannot tell you much
about the meaning or trend of the data.

Data Warehouse of AllElectronics

A data warehouse is a repository of information collected
from multiple sources, stored under a unified schema,
and usually residing at a single site.
The need is to provide an analysis of the company's sales per
item type per branch for a specified period.

Data Warehouse

The data warehouse may store a summary of the transactions per
item type for each store or, summarized to a higher level, for
each sales region.
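
A minimal sketch of the two summary levels mentioned above, store and sales region, using plain Python dictionaries. The fact rows, store names and regions are hypothetical; a real warehouse would precompute such summaries in a data cube.

```python
from collections import defaultdict

# Hypothetical fact rows: (item_type, store, region, sales_amount)
facts = [
    ("computer", "Perth-1",  "West", 1000.0),
    ("computer", "Perth-2",  "West",  500.0),
    ("printer",  "Perth-1",  "West",  200.0),
    ("computer", "Sydney-1", "East",  800.0),
    ("printer",  "Sydney-1", "East",  300.0),
]

def summarize(rows, level):
    """Sum sales per (item_type, location), where location is the store or the region."""
    index = {"store": 1, "region": 2}[level]
    totals = defaultdict(float)
    for row in rows:
        totals[(row[0], row[index])] += row[3]
    return dict(totals)

per_store = summarize(facts, "store")    # summary per item type for each store
per_region = summarize(facts, "region")  # rolled up to the sales-region level

print(per_store[("computer", "Perth-1")], per_region[("computer", "West")])
```

Moving from `per_store` to `per_region` is a roll-up; going the other way is a drill-down.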

Transactional Database

A transactional database consists of a file where each
record represents a transaction.

Supports nested relation

Transaction id: Items, Customer name, date
Sample Queries:
Show me all the items purchased by X
How many transactions include item number Y?
market basket data analysis: Which items sold well
together? (Frequent item set)
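
The sample queries above can be sketched over a small in-memory set of transactions. The transaction ids, items and the minimum count of 2 are assumptions chosen for illustration, and counting every pair by brute force is only workable for tiny data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional data: transaction id -> items bought together.
transactions = {
    "T100": ["bread", "milk", "eggs"],
    "T200": ["bread", "milk"],
    "T300": ["milk", "eggs"],
    "T400": ["bread", "milk", "butter"],
}

def transactions_with(item):
    """How many (and which) transactions include a given item?"""
    return sorted(t for t, items in transactions.items() if item in items)

# Count how often each pair of items appears in the same transaction.
pair_counts = Counter()
for items in transactions.values():
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

# Keep pairs that occur in at least `min_count` transactions (frequent 2-itemsets).
min_count = 2
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}

print(transactions_with("eggs"))
print(frequent_pairs)
```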

Knowledge View: What Knowledge Is to Be Mined?
Data summary in multidimensional space
Data cube and OLAP (On-Line Analytical Processing)
Pattern discovery
Mining frequent patterns, association and correlation
Applying pattern mining in many other tasks
Classification and predictive modelling
Model construction based on some training examples
Prediction of new data based on constructed models
Cluster analysis: How to group data to form new categories?
Outlier analysis: Discovery of anomalies and rare events
Trend and evolution analysis

Data Mining Function: (1) Characterization and Discrimination
Data can be associated with classes or concepts (e.g.,
classes of items: computers, printers; concepts of
customers: bigSpenders, budgetSpenders).
Multidimensional concept description:
Characterization: summarizing the class in general (e.g., a general
specification of products whose sales increased by 10%, or a
profile of customers who spend more than $1000 a year).
Discrimination: comparison of a target class with a contrast class
(e.g., compare two groups of customers, such as those who shop for
computer products regularly versus those who rarely shop for such
products), drilling down on dimensions such as occupation, age, etc.

Data Mining Function: (2) Association and Correlation Analysis
Frequent patterns (or frequent itemsets)
What items are frequently purchased together?

Association, correlation vs. causality

A typical association rule
Milk => Bread [0.5%, 75%] (support, confidence)
Are strongly associated items also strongly correlated?

How to mine such patterns and/or rules efficiently in
large datasets? (single- or multi-dimensional
association, minimum support threshold)
How to use such patterns for classification, clustering,
and other applications?
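
To make the (support, confidence) pair concrete, here is a small sketch that computes both for a rule, plus lift as one common correlation measure. The five baskets are hypothetical and far smaller than any real dataset; lift is added as an illustration and is not named on the slide.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's own support.
    Values above 1 suggest positive correlation, values below 1 negative."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical market baskets.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk"],
    ["bread"],
    ["milk", "bread", "butter"],
]

s = support(baskets, ["milk", "bread"])
c = confidence(baskets, ["milk"], ["bread"])
l = lift(baskets, ["milk"], ["bread"])
print(f"milk => bread: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

In this toy data the rule has fairly high confidence but a lift below 1, which illustrates the question above: a strongly associated pair of items is not necessarily positively correlated.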

Data Mining Function: (3) Classification
Classification and label prediction
Construct models (functions) based on some training examples or
rules (example: kind of response (good, mild, no) in a sales
campaign, based on price, brand, category, place_made)
Describe and distinguish classes or concepts for future prediction
E.g., classify countries based on climate, or classify cars
based on gas mileage
Predict some unknown class labels

Typical methods
Decision trees, naive Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-based
classification, logistic regression, ...

Typical applications:
Credit card fraud detection, direct marketing, classifying stars,
diseases, web pages, ...
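
As a minimal sketch of model construction from training examples, the code below learns a one-level decision tree (a "decision stump") that classifies cars by gas mileage. The mileage values and the labels "low" and "high" are hypothetical, and a real decision tree learner would consider many attributes and split recursively.

```python
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(values, labels):
    """Learn a threshold on one numeric attribute that minimises training errors.
    Assumes at least two distinct attribute values."""
    best = None
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label

def predict(model, value):
    """Predict the class label of a new, unseen value."""
    threshold, left_label, right_label = model
    return left_label if value <= threshold else right_label

# Training examples: miles per gallon with a known class label.
mpg = [12, 15, 18, 30, 34, 40]
labels = ["low", "low", "low", "high", "high", "high"]

model = fit_stump(mpg, labels)
print(model, predict(model, 20), predict(model, 36))
```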

Data Mining Function: (4) Cluster Analysis
Unsupervised learning (i.e., class label is unknown)
Group data to form new categories (i.e., clusters),
e.g., cluster houses to find distribution patterns
Principle: maximizing intra-class similarity and
minimizing inter-class similarity

Example: homogeneous sub-populations of
AllElectronics customers (customer attributes: city,
age, income, ...)
Many methods and applications
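
One of the many clustering methods is k-means, sketched below on hypothetical customers described by (age, income in thousands). The data, the choice of two clusters and the starting centroids are assumptions; in practice the attributes would usually be rescaled first so that no single attribute dominates the distance.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical customers: (age, yearly income in thousands).
customers = [(25, 30), (27, 35), (30, 32), (52, 90), (55, 95), (60, 88)]

centroids, clusters = kmeans(customers, [customers[0], customers[-1]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No class labels are supplied anywhere: the two groups emerge purely from the similarity between customers.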

Data Mining Function: (5) Outlier Analysis
Outlier analysis
Outlier: a data object that does not comply with the general
behavior of the data
Most data mining methods discard outliers as noise or
exceptions.
Noise or exception? One person's garbage could be
another person's treasure
Methods: by-product of clustering or regression analysis,
distance analysis, statistical or probability model, ...
Useful in fraud detection, where rare events are more interesting
Example: detecting a purchase of an extremely large
amount for a given account number.
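
A simple statistical model for the purchase example is a z-score test: flag amounts that lie unusually many standard deviations from the mean. The purchase amounts and the threshold of 2.0 are assumptions for illustration; with so few values a larger threshold could never be exceeded, and robust measures are often preferred in practice.

```python
from statistics import mean, stdev

def outliers(amounts, z_threshold=2.0):
    """Return the amounts whose z-score exceeds `z_threshold` in absolute value."""
    mu, sigma = mean(amounts), stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]

# Hypothetical purchase amounts for one account number.
purchases = [42.0, 55.5, 38.0, 61.0, 47.5, 50.0, 44.0, 5200.0]

suspicious = outliers(purchases)
print(suspicious)
```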

Time and Ordering: Sequential Pattern, Trend and Evolution Analysis
Sequence, trend and evolution analysis
Trend, time-series, and deviation analysis: e.g., regression
and value prediction
Sequential pattern mining
e.g., first buy digital camera, then buy large SD
memory cards
Periodicity analysis (e.g., overall stock market evolution
regularities or for particular companies)
Motifs and biological sequence analysis
Approximate and consecutive motifs
Similarity-based analysis
Mining data streams
Ordered, time-varying, potentially infinite, data streams
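
The camera-then-memory-card example can be sketched as a check for an ordered subsequence in each customer's purchase history. The histories below are hypothetical, and this only measures the support of one given pattern; a sequential pattern miner would also have to search for candidate patterns.

```python
def follows_pattern(history, pattern):
    """True if the items of `pattern` occur in `history` in the same order (gaps allowed)."""
    remaining = iter(history)
    # `item in remaining` consumes the iterator up to the match,
    # so later pattern items must appear later in the history.
    return all(item in remaining for item in pattern)

def sequence_support(histories, pattern):
    """Fraction of customers whose history contains the pattern."""
    return sum(follows_pattern(h, pattern) for h in histories.values()) / len(histories)

# Hypothetical purchase histories, oldest purchase first.
histories = {
    "c1": ["camera", "tripod", "sd_card"],
    "c2": ["sd_card", "camera"],
    "c3": ["camera", "sd_card"],
    "c4": ["laptop"],
}

seq_support = sequence_support(histories, ["camera", "sd_card"])
print(seq_support)
```

Customer c2 bought both items but in the opposite order, so the history does not count towards the pattern; that ordering is what separates sequential patterns from plain frequent itemsets.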

Structure and Network Analysis

Graph mining
Finding frequent subgraphs (e.g., chemical compounds), trees
(XML), substructures (web fragments)
Information network analysis
Social networks: actors (objects, nodes) and relationships (edges)
e.g., author networks in CS, terrorist networks
Multiple heterogeneous networks
A person could be in multiple information networks: friends, family,
classmates, ...
Links carry a lot of semantic information: link mining
Web mining
Web is a big information network: from PageRank to Google
Analysis of Web information networks
Web community discovery, opinion mining, usage mining, ...

Methodology View: Confluence of Multiple Disciplines

[Figure: Data Mining at the centre, surrounded by Machine Learning, Applications, Algorithm, Pattern Recognition, Database Technology, Statistics, Visualization and Distributed / cloud computing]

Why Confluence of Multiple Disciplines?
Tremendous amount of data
Algorithms must be scalable to handle big data
High-dimensionality of data
Micro-array data may have tens of thousands of dimensions
High complexity of data
Data streams and sensor data
Time-series data, temporal data, sequence data
Structured data, graphs, social and information networks
Spatial, spatiotemporal, multimedia, text and Web data
Software programs, scientific simulations
New and sophisticated applications

Application View: Diverse Applications

Mining text data and mining the Web
Web page classification and ranking, Weblog analysis,
recommender systems, ...
Mining business data
Transaction data, market basket analysis, fraud detection, ...
Data mining and software/system engineering, e.g.,
mining software bugs, optimizing system performance,
helping in computer vision
Mining biological and medical data
Gene, protein, microarray data, biological networks
Mining social and information networks
Community discovery, information propagation, ...
Invisible data mining: web search, stock market analysis

Classification of Data Mining Systems

According to the kinds of databases mined:

relational, transactional, spatial, text, stream data, or World Wide Web

According to the kinds of knowledge mined:

Based on mining functionalities, e.g., characterization, discrimination,
association, etc. A system can provide multiple and/or integrated data mining
functions, and can be distinguished by the granularity of the knowledge mined
or by whether it mines regular or irregular patterns (outliers).

According to the techniques utilized:

degree of user interaction involved (autonomous, interactive, query-driven),
method of analysis (machine learning, pattern recognition, statistics, neural
networks, etc.), or a combination of the merits of individual approaches

According to the applications adapted:

Finance, telecommunication, DNA, stock market, etc. An all-purpose data mining
system may not fit domain-specific mining.

Summary (so far)

Data mining: discovering interesting patterns and knowledge
from massive amounts of data
A natural evolution of science and information technology, in
great demand, with wide applications
A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
Mining can be performed on a variety of data
Data mining functionalities: characterization, discrimination,
association, classification, clustering, trend and outlier
analysis, etc.
Data mining technologies and applications

Evaluation of Knowledge

Is all mined knowledge interesting?

One can mine a tremendous number of patterns
Some may fit only a certain dimension space
(time, location, ...)
Some may not be representative, may be transient, ...
Evaluation of mined knowledge: directly mine only
interesting knowledge?
Descriptive vs. predictive
Coverage
Typicality vs. novelty
Accuracy
Timeliness

Are All the Discovered Patterns Interesting?

Data mining may generate thousands of patterns: not all of them
are interesting
Suggested approach: Human-centered, query-based, focused mining

Interestingness measures
A pattern is interesting if it is easily understood by humans, valid on new or
test data with some degree of certainty, potentially useful, novel, or validates
some hypothesis that a user seeks to confirm

Objective vs. subjective interestingness measures

Objective: based on statistics and structures of patterns, e.g., support,
confidence, etc.

Subjective: based on the user's belief in the data, e.g., unexpectedness,
novelty, actionability, etc.

Find All and Only Interesting Patterns?

Find all the interesting patterns: Completeness

Can a data mining system find all the interesting patterns? Do we
need to find all of the interesting patterns?

Heuristic vs. exhaustive search

Association vs. classification vs. clustering

Search for only interesting patterns: An optimization problem

Can a data mining system find only the interesting patterns?

Approaches
First generate all the patterns and then filter out the uninteresting
ones
Generate only the interesting patterns: mining query
optimization

Integration of Data Mining and Data Warehousing

Data mining systems, DBMS, Data warehouse systems coupling

No coupling, loose-coupling, semi-tight-coupling, tight-coupling

On-line analytical mining of data

integration of mining and OLAP technologies

Interactive mining of multi-level knowledge

Necessity of mining knowledge and patterns at different levels of
abstraction by drilling/rolling, pivoting, slicing/dicing, etc.

Integration of multiple mining functions

Characterized classification, first clustering and then association

Coupling Data Mining with DB/DW Systems

No coupling: flat-file processing, used for developing efficient and effective
algorithms. A poor design, as much time may be spent on preprocessing.

Loose coupling: fetching data from a DB/DW. Mining does not exploit the
data structures and optimization methods provided by the DB/DW, so high
scalability is difficult to achieve.

Semi-tight coupling: enhanced DM performance

Provide efficient implementations of a few data mining primitives in a DB/DW
system, e.g., sorting, indexing, aggregation, histogram analysis, multiway
join, precomputation of some statistical functions

Tight coupling: a uniform processing environment

DM is smoothly integrated into a DB/DW system; mining queries are optimized
based on mining query analysis, indexing, query processing methods, etc.

Major Issues in Data Mining (1)

Mining Methodology
Mining various and new kinds of knowledge
Mining knowledge in multi-dimensional space at multiple levels of
abstraction
Data mining: An interdisciplinary effort
Boosting the power of discovery in a networked environment

Handling noise, uncertainty, and incompleteness of data

Pattern evaluation and pattern- or constraint-guided mining

User Interaction
Interactive mining
Background knowledge (integrity constraints & deduction rules)
Presentation and visualization of data mining results

Major Issues in Data Mining (2)

Efficiency and Scalability

Efficiency and scalability of data mining algorithms
Parallel, distributed, stream, and incremental mining methods
Diversity of data types
Handling complex types of data

Mining dynamic, networked, and global data repositories

Data mining and society
Social impacts of data mining

Privacy-preserving data mining

Invisible data mining

A Brief History of Data Mining Society

1989 IJCAI Workshop on Knowledge Discovery in Databases

Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley,
1991)
1991-1994 Workshops on Knowledge Discovery in Databases
Advances in Knowledge Discovery and Data Mining (U. Fayyad, G.
Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998 International Conferences on Knowledge Discovery in
Databases and Data Mining (KDD95-98)
Journal of Data Mining and Knowledge Discovery (1997)
ACM SIGKDD conferences since 1998 and SIGKDD Explorations
More conferences on data mining
PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM
(2001), WSDM (2008), etc.
ACM Transactions on KDD (2007)
