DWDM Lecture PPT Unit3 Part2

The document provides an overview of data warehousing and data mining techniques used by various industries, including e-commerce, banking, healthcare, and telecommunications. It highlights specific data warehousing solutions like Amazon Redshift and SBI's NextGen Data Warehouse, as well as data mining techniques such as clustering, prediction, and anomaly detection. Additionally, it discusses the infrastructure and locations of data warehouses in India for companies like Amazon, SBI, Max Hospitals, Instagram, and Jio.
Data Warehousing and Data Mining

By Arushi Singh
Unit 3
Data Mining: Overview, Motivation, Definition & Functionalities, Data Processing,
Form of Data Pre-processing, Data Cleaning: Missing Values, Noisy Data (Binning,
Clustering, Regression, Computer and Human Inspection), Inconsistent Data, Data
Integration and Transformation. Data Reduction: Data Cube Aggregation,
Dimensionality Reduction, Data Compression, Numerosity Reduction, Discretization and
Concept Hierarchy Generation, Decision Tree
Data Warehousing in E-Commerce
Amazon Data Warehouse

• Amazon uses Amazon Redshift as its primary data warehouse for analytics.
• Redshift is a cloud-based data warehouse that uses SQL to analyze data.
• How Amazon Redshift works:
• Analyzes structured and semi-structured data
• Uses columnar storage to speed up queries
• Uses massively parallel processing to complete queries in parallel
• Integrates with data lakes, operational databases, and business intelligence tools
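
The columnar storage and parallel processing points above can be illustrated with a toy sketch. The table, its column names, and the helper functions are invented for illustration; this is not how Redshift is implemented internally, only the general idea of reading one column and combining partial aggregates.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All data and names here are hypothetical.

# Row-oriented: each record is stored together.
rows = [
    {"order_id": 1, "region": "Mumbai", "amount": 250.0},
    {"order_id": 2, "region": "Delhi", "amount": 120.0},
    {"order_id": 3, "region": "Mumbai", "amount": 75.5},
    {"order_id": 4, "region": "Kolkata", "amount": 310.0},
]

# Column-oriented: each column is stored as its own list.
columns = {
    "order_id": [1, 2, 3, 4],
    "region": ["Mumbai", "Delhi", "Mumbai", "Kolkata"],
    "amount": [250.0, 120.0, 75.5, 310.0],
}


def total_amount_row_store(table):
    # Visits every full record even though only "amount" is needed.
    return sum(record["amount"] for record in table)


def total_amount_column_store(table):
    # Reads only the one column the query asks for.
    return sum(table["amount"])


def parallel_total(values, workers=2):
    # Sketch of the MPP idea: split the column into slices, aggregate each
    # slice on its own, then combine the partial results. The slices are
    # processed one after another here; a real MPP system would hand each
    # slice to a separate node.
    size = (len(values) + workers - 1) // workers
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


print(total_amount_row_store(rows))
print(total_amount_column_store(columns))
print(parallel_total(columns["amount"]))
```

All three calls compute the same total; the difference is how much data each one has to touch to get there.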
Data warehousing services
• Amazon Redshift —> A cloud data warehouse service that's optimized for
online analytical processing (OLAP).

• Amazon DynamoDB —> A NoSQL database service that can be used as an
OLTP store.

• Amazon Aurora —> A relational database built for the cloud that's
compatible with MySQL and PostgreSQL.

• Amazon RDS —> A service that helps set up, operate, and scale relational
databases on the cloud.
Amazon's e-commerce data warehouses
Located in India

• Mumbai
• Bangalore
• Delhi
• Hyderabad
• Kolkata
• Vijayawada
Data mining techniques used by Amazon
• Clustering —> Groups data elements with similar properties. This helps businesses
make decisions about consumer habits.

• Market basket analysis —> Predicts what products customers will buy together
based on their past purchase patterns.

• Outlier analysis —> Identifies data points that are different from the rest of the
data. This helps businesses identify and fix issues.

• Prediction —> Forecasts future values of data to help businesses make decisions. For
example, Amazon uses prediction to protect customers from online fraud.

• Text mining —> Extracts information from documents, emails, and social media
posts. This helps businesses with content categorization, sentiment analysis, and
information retrieval.
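
As a minimal sketch of market basket analysis from the list above, the following counts how often item pairs appear together (support) and how often one item accompanies another (confidence). The baskets and function names are made up for illustration and do not reflect Amazon's actual data or method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "case"},
]


def pair_support(transactions):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(baskets)
print(support[("case", "phone")])
print(confidence(baskets, "phone", "case"))
print(confidence(baskets, "laptop", "mouse"))
```

A rule such as "laptop, therefore mouse" with high confidence is the kind of pattern that could drive a "frequently bought together" suggestion.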
Data Warehousing in Banking
SBI Data Warehouse

• State Bank of India (SBI) uses a NextGen Data Warehouse.

• NextGen Data Warehouse
• A modern data warehouse architecture that integrates both traditional
structured data and large volumes of "big data" from various sources into a
single platform

• Utilizes cloud-based technologies to handle massive data sets with high
performance and scalability
SBI data warehouses
Located in:

• Global IT Centre in Belapur, Navi Mumbai
Data mining techniques for fraud detection in financial transactions
Used By SBI

• Decision trees: Provide a structured way to classify data by making decisions based
on specific features, making them easy to interpret.

• Logistic regression: Predicts the probability of a transaction being fraudulent
based on various factors, allowing for risk assessment.

• Support Vector Machines (SVMs): Effective at identifying complex patterns and
separating data points into different classes, especially when the data is not
linearly separable.

• Neural Networks: Can learn complex relationships within large datasets, making
them powerful for identifying subtle anomalies.

• Anomaly detection: Identifies transactions that significantly deviate from typical
patterns, which could indicate fraudulent activity.

• Clustering algorithms: Group similar transactions together to identify potential
outliers or clusters of fraudulent behavior.

• Association rule mining: Discovers relationships between different variables to
find patterns that might be associated with fraudulent transactions.
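
Two of the techniques listed above can be sketched in a few lines: logistic-regression-style scoring (a sigmoid applied to a weighted sum of features) and a simple z-score rule for anomaly detection. The features, weights, bias, threshold, and transaction amounts below are all hypothetical; a real model would be fitted to labelled transactions, and nothing here describes SBI's actual system.

```python
import math
from statistics import mean, pstdev


def fraud_probability(features, weights, bias):
    """Logistic regression scoring: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def zscore_anomalies(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical features: [amount in lakh rupees, foreign location (0/1), night-time (0/1)]
weights = [0.8, 1.5, 0.7]  # assumed values, not fitted to any data
bias = -4.0

low_risk = fraud_probability([0.2, 0, 0], weights, bias)
high_risk = fraud_probability([3.0, 1, 1], weights, bias)
print(round(low_risk, 3), round(high_risk, 3))

# Hypothetical transaction amounts in rupees for one account.
amounts = [1200, 900, 1500, 1100, 1300, 1000, 1400, 950, 1250, 48000]
flagged = zscore_anomalies(amounts)
print(flagged)
```

The score gives a probability that can be compared against a risk cutoff, while the z-score rule needs no labels at all, which is why the two are often used side by side.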
Data Warehousing in Healthcare

• Hospitals use a Healthcare Data Warehouse (HDW).

• A centralized repository that stores and manages vast amounts of health-related
data collected from various sources like Electronic Health Records (EHRs),
laboratory databases, insurance claims, and other healthcare systems.

• A healthcare data warehouse is typically cloud-based or hybrid.
Max Hospitals data warehouses
Located in India:

• New Delhi, India
Data mining techniques used by Healthcare for Disease prediction

• Classification algorithms: To predict the likelihood of a specific disease based
on patient data.

• K-means clustering: To group patients with similar data, which can support
predicting the likelihood of a specific disease.

• Association rule mining: To discover relationships between different factors
that might contribute to disease development.
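
A minimal sketch of K-means clustering on patient data follows. The patient records (age, fasting glucose), the starting centroids, and the function names are assumptions made for illustration. K-means itself only groups similar records; relating a group to disease likelihood would be a further step not shown here.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for x, y in points:
        distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append((
                sum(p[0] for p in members) / len(members),
                sum(p[1] for p in members) / len(members),
            ))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Alternate the update and assign steps until the labels stop changing."""
    centroids = list(centroids)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        new_labels = assign(points, new_centroids)
        converged = new_labels == labels
        centroids, labels = new_centroids, new_labels
        if converged:
            break
    return centroids, labels


# Hypothetical patients: (age in years, fasting glucose in mg/dL)
patients = [(25, 85), (30, 90), (28, 88), (60, 150), (65, 160), (62, 155)]
centroids, labels = kmeans(patients, centroids=[(25, 85), (65, 160)])
print(labels)
print(centroids)
```

In practice the features would be scaled first so that no single measurement dominates the distance calculation.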
Sentiment Analysis on Social Media
Instagram Data Warehouse

• Primary database: PostgreSQL
• For high scalability: Cassandra
• In-memory data caching: Redis
Instagram Data Warehouse Located in India:

• Chennai, Tamil Nadu, at the Reliance Industries campus
Data mining techniques used by Instagram for Sentiment Analysis

• Natural Language Processing (NLP)
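
One of the simplest NLP approaches to sentiment analysis is lexicon-based scoring, sketched below. The word lists and function names are hand-made assumptions for illustration; production systems typically rely on trained language models, and nothing here reflects Instagram's actual pipeline.

```python
import re

# Tiny hand-made lexicon, purely illustrative.
POSITIVE = {"love", "great", "amazing", "happy", "good"}
NEGATIVE = {"hate", "bad", "terrible", "sad", "awful"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Add +1 per positive word and -1 per negative word.

    A negation word flips the polarity of the token that immediately follows it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this reel, amazing edit!"))
print(label("This is not good"))
print(label("Posted a photo today"))
```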
Customer Churn prediction in Telecom Industry
Jio Data Warehouse

• Jio primarily utilizes Microsoft Azure's data warehouse services.

• A cloud-based data warehouse

Jio Data Warehouse Located in India:

• Mumbai, Chennai, Hyderabad, and Bangalore

Data Mining Techniques used for Customer Churn prediction:

• Classification Algorithms
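
As one example of a classification algorithm for churn prediction, the sketch below fits a one-level decision tree (a decision stump) by choosing the split with the lowest Gini impurity. The customer features, labels, and function names are invented for illustration; a real tree would have more levels and be trained on far more data, and this does not describe Jio's actual model.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split(rows, labels, feature, threshold):
    """Partition the labels by whether the feature value is <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    return left, right


def best_split(rows, labels):
    """Find the (impurity, feature, threshold) with the lowest weighted Gini impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            left, right = split(rows, labels, feature, threshold)
            if not left or not right:
                continue  # a split must send rows to both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


def majority(labels):
    """Return the most common label (ties go to 1)."""
    return int(sum(labels) * 2 >= len(labels))


def fit_stump(rows, labels):
    """Fit a one-level decision tree."""
    found = best_split(rows, labels)
    if found is None:
        raise ValueError("no valid split found")
    _, feature, threshold = found
    left, right = split(rows, labels, feature, threshold)
    return {"feature": feature, "threshold": threshold,
            "left": majority(left), "right": majority(right)}


def predict(stump, row):
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


# Hypothetical customers: [monthly recharge in rupees, complaints in last 3 months]
customers = [[399, 0], [299, 1], [599, 0], [199, 4],
             [449, 5], [149, 3], [499, 1], [179, 6]]
churned = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = customer left

stump = fit_stump(customers, churned)
print(stump)
print(predict(stump, [349, 5]))
```

On this toy data the stump splits on the number of complaints, which shows why tree-based models are considered easy to interpret: the learned rule can be read directly.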
Thank you