Some key events in the evolution of the data warehouse:
1960s: Dartmouth and General Mills, in a joint research project, develop the
terms dimensions and facts.
1970s: ACNielsen and IRI introduce dimensional data marts for retail
sales.
1983: Teradata Corporation introduces a database management system
specifically designed for decision support.
# The idea of data warehousing dates to the late 1980s, when IBM
researchers Barry Devlin and Paul Murphy developed the "Business
Data Warehouse".
However, the concept was developed in full by Bill Inmon, who is regarded as
the father of the data warehouse. He has written on a variety of topics
covering the building, usage, and maintenance of the warehouse and the
Corporate Information Factory.
# The term "Data Warehouse" was first coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data. This data helps analysts make
informed decisions in an organization.
In short, data warehousing was conceived to provide an architectural model
for the flow of information from operational systems to decision support
environments. The concept attempted to address the various problems associated
with this flow, mainly the high costs associated with it.
In the absence of a data warehousing architecture, a vast amount of space was
required to support multiple decision support environments. In large
corporations, it was common for various decision support environments to operate
independently.
Goals of Data Warehousing
o Support reporting as well as analysis
o Maintain the organization's historical information
o Be the foundation for decision making.
A data warehouse is needed for the following reasons:
1. Business users: Business users require a data warehouse to view summarized
data from the past. Since these users are often non-technical, the data may be
presented to them in a simple form.
2. Storing historical data: A data warehouse is required to store time-variant
data from the past. This data can then be used for various purposes.
3. Making strategic decisions: Some strategies depend on the data
in the data warehouse, so the data warehouse contributes to strategic
decision making.
4. Data consistency and quality: By bringing data from different sources into
one common place, the organization can enforce uniformity and
consistency in the data.
5. Fast response time: A data warehouse has to be ready for somewhat
unexpected loads and types of queries, which demands a significant degree of
flexibility and a quick response time.
Data warehousing is a method of organizing and compiling data into one
database, whereas data mining deals with extracting important information from
databases. Data mining attempts to uncover meaningful patterns, and it depends
on the data that is compiled in the data warehouse.
DATA WAREHOUSE:
Data warehousing is a technique for collecting and managing data from
various sources. A data warehouse is where data can be collected for mining
purposes, usually with large storage capacity. Data from an organization's various
systems is held in the data warehouse, from which it can be fetched as needed.
Source -> Extract -> Transform -> Load -> Target
(Data warehouse process)
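The process above (source, extract, transform, load, target) can be sketched in a few lines of Python. This is only an illustrative sketch: the source rows, the field names, the `sales` table, and the use of an in-memory SQLite database as the target are all assumptions made for the example, not part of any particular warehouse product.

```python
import sqlite3

# Hypothetical rows standing in for data pulled from an operational system.
SOURCE_ROWS = [
    {"order_id": "1", "customer": " alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "BOB", "amount": "80"},
    {"order_id": "3", "customer": "Carol", "amount": "not-a-number"},
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: tidy names, convert types, and skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the target connection."""
    conn = sqlite3.connect(":memory:")  # assumed target: an in-memory database
    load(transform(extract()), conn)
    return conn


conn = run_etl()
print(conn.execute("SELECT customer, amount FROM sales ORDER BY order_id").fetchall())
```

The third source row is dropped during the transform step because its amount is not numeric, so only two cleaned rows reach the target.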
Data warehouses consolidate data from several sources and ensure data accuracy,
quality, and consistency. In a data warehouse, data is organized into a structured
format by type and as needed. The data is then examined with query tools.
Data warehouses store historical data and answer analytical requests faster, which
supports online analytical processing (OLAP), whereas an operational database
stores the current transactions of a business process, which is called online
transaction processing (OLTP).
# A Data Warehouse provides integrated, enterprise-wide, historical data and
focuses on providing support for decision-makers for data modelling and analysis.
FEATURES OF DATA WAREHOUSES:
Time-variant:
The data collected in a data warehouse is identified with a specific period. This means
a data warehouse has to contain historical data, not just current values. It:
1.) Allows analysis of the past
2.) Relates information to the present
3.) Enables forecasting of the future
Integrated:
Different sources, such as flat files or relational databases, are put together to
build a data warehouse. Data in a data warehouse comes from several operational
systems. To integrate the data, the following steps are carried out:
1.) Remove inconsistency
2.) Transformation
3.) Integration of source data
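The three steps above can be illustrated with a small Python sketch. Everything here is hypothetical: the two source systems (`CRM_ROWS`, `BILLING_ROWS`), their field names, and the code mappings are assumptions chosen only to show how differing conventions might be reconciled before the data is combined.

```python
from datetime import datetime

# Hypothetical records from two operational systems with different conventions.
CRM_ROWS = [{"id": 1, "gender": "M", "joined": "2021-03-05"}]
BILLING_ROWS = [{"cust_no": 2, "sex": "female", "signup": "05/04/2021"}]

# Step 1, remove inconsistency: map every gender spelling to one code.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Step 2, transformation: convert a CRM row to the common layout."""
    return {
        "customer_id": row["id"],
        "gender": GENDER_MAP[row["gender"].lower()],
        "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),
    }


def from_billing(row):
    """Step 2, transformation: convert a billing row (day/month/year dates)."""
    return {
        "customer_id": row["cust_no"],
        "gender": GENDER_MAP[row["sex"].lower()],
        "joined": datetime.strptime(row["signup"], "%d/%m/%Y").date(),
    }


def integrate():
    """Step 3, integration: combine both sources into one uniform list."""
    unified = [from_crm(r) for r in CRM_ROWS] + [from_billing(r) for r in BILLING_ROWS]
    return sorted(unified, key=lambda r: r["customer_id"])


for record in integrate():
    print(record)
```

After integration, both records share the same field names, the same gender codes, and the same date type, regardless of which system they came from.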
Non-volatile:
This means that earlier data is not deleted when new data is added to the data
warehouse. The operational database and the data warehouse are kept
separate, so the continuous changes made in the operational database are not
reflected immediately in the data warehouse.
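A minimal Python sketch can contrast the two behaviours. The class names `OperationalStore` and `Warehouse`, the product, and the prices are hypothetical and exist only for illustration: the operational store overwrites the current value, while the warehouse appends a dated snapshot on each load and never changes earlier rows.

```python
from datetime import date


class OperationalStore:
    """Keeps only the current value; an update overwrites the old one."""

    def __init__(self):
        self.prices = {}

    def set_price(self, product, price):
        self.prices[product] = price


class Warehouse:
    """Append-only: each load adds dated rows and leaves earlier rows untouched."""

    def __init__(self):
        self._rows = []

    def load_snapshot(self, store, as_of):
        for product, price in store.prices.items():
            self._rows.append((as_of, product, price))

    def history(self, product):
        return [(day, price) for day, name, price in self._rows if name == product]


store = OperationalStore()
warehouse = Warehouse()

store.set_price("widget", 10.0)
warehouse.load_snapshot(store, date(2024, 1, 1))

store.set_price("widget", 12.0)  # the operational value is overwritten
warehouse.load_snapshot(store, date(2024, 2, 1))

print(store.prices["widget"])       # only the current price remains here
print(warehouse.history("widget"))  # both dated prices remain here
```

Because each warehouse row carries a date, the same structure also supports the time-variant property: past values stay available for comparison with the present.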
Subject-oriented:
A data warehouse provides important data about a specific subject, such as suppliers,
products, promotions, or customers. Data warehousing usually handles the analysis
and modelling of data that helps an organization make data-driven decisions.
Applications of Data Warehouses:
Banking Services
Consumer Goods
Manufacturing
Financial Services
Retail Sectors
Benefits of Data Warehousing
The most compelling advantages of data warehousing are:
Improved performance and productivity
Cost-effective
Consistent and accurate data access
What is Data Mining?
In this process, data is extracted and analysed to obtain useful information. In data
mining, hidden patterns are sought in the dataset in order to predict future behaviour.
Data mining is used to discover relationships within the data.
Data mining uses statistics, artificial intelligence, machine learning, and
database systems to find hidden patterns in the data. It helps answer
business-related queries that would otherwise be time-consuming to resolve.
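As a small illustration of discovering relationships in data, the following Python sketch computes support and confidence for item pairs in a handful of shopping baskets. The transactions, the function name `pair_rules`, and the 0.5 support threshold are assumptions made for the example; it is a simplified pair-counting approach, not a full association-rule algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (lhs, rhs, support, confidence) for item pairs above min_support."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the baskets containing lhs, the share that also contain rhs
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in pair_rules(TRANSACTIONS):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket that contains butter also contains bread, so the rule from butter to bread has a confidence of 1.0, which is the kind of pattern that market basket analysis looks for.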
Features of Data Mining
Some of the unique features of data mining are:
It is capable of predicting future results
It can efficiently handle large datasets and databases
It can seamlessly utilise the automated discovery of patterns
It has the potential to create actionable insights, etc.
Applications of Data Mining
Research
Education Sector
Transportation
Market Basket Analysis
Business Transactions
Intrusion Detection
Scientific Analysis
Finance and Banking Sector
Insurance and Healthcare
Benefits of Data Mining
Analysing trends within the existing marketplace.
Detecting fraud in phone calls, insurance claims, debit or credit
card purchases, etc.
Making predictions about the market before business decisions are
taken.
Difference Between Data Mining and Data Warehousing
Data Mining | Data Warehousing
This procedure involves analysing data patterns | It is designed specifically for analysis
Regular data analysis | Periodical data storage
Uses pattern recognition logic to identify patterns | Extracts and stores data to enable easy reporting
It is carried out by business users with the help of engineers | It is carried out by engineers
It helps in extracting data from large data sets | It pools all the relevant data together
Statistics, AI, machine learning, and databases are the technologies used in data mining | A data warehouse is integrated, subject-oriented, non-volatile, and time-variant
Pattern recognition logic is used for determining patterns | It involves extracting and storing data in an orderly way to make reporting efficient
Employs pattern recognition tools to help identify the access patterns | Extracts and stores data in an orderly format, thereby making reporting faster and easier
It helps in creating suggestive patterns of key parameters | When connected with operational business systems like CRM, it adds value to them
Common Tools and Software Used in Data Warehousing and Data Mining
Let’s look at the common tools and software used in data warehousing and data
mining.
Some of the popular data warehouse tools are:
Amazon Redshift
Microsoft Azure
Google BigQuery
Snowflake
Micro Focus Vertica
Amazon DynamoDB
Some of the popular data mining tools are:
RapidMiner
MonkeyLearn
IBM SPSS Modeler
Oracle Data Mining
Knime
Weka
Orange
H2O
Apache Mahout
SAS Enterprise Miner
Common Data Mining and Data Warehousing Techniques
Let’s look at the common techniques used in data warehousing and data mining.
The most common techniques of data mining are:
Association
Clustering
Data Visualisation
Data Cleaning
Machine Learning
Classification
Neural Networks
Prediction
Data Warehousing
Outlier Detection
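To make one of the techniques listed above concrete, here is a minimal sketch of outlier detection using z-scores from Python's standard `statistics` module. The sample values, the function name `zscore_outliers`, and the threshold of 2.0 are assumptions chosen for illustration; real data mining tools offer more robust methods.

```python
from statistics import mean, pstdev

# Hypothetical daily order counts with one unusual value.
VALUES = [10, 11, 9, 10, 12, 50]


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]


print(zscore_outliers(VALUES))
```

A single extreme value also inflates the mean and the standard deviation, which is why this simple approach can miss outliers in small samples.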
Common Data Warehousing Techniques
Some of the most common data warehousing techniques are:
Database Compression
Columnar Data Storage
In-Memory Processing
Massive Parallel Processing (MPP)
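Two of the techniques above, columnar storage and compression, can be sketched together in Python. The rows, the column names, and the helper functions `to_columns` and `run_length_encode` are hypothetical, and run-length encoding is only one simple compression scheme chosen for illustration; actual warehouse engines use their own storage formats.

```python
from itertools import groupby

# Hypothetical row-oriented data, as an operational system might hold it.
ROWS = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "north", "amount": 50.0},
    {"order_id": 3, "region": "south", "amount": 80.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(ROWS)

# An aggregate only needs to read the single column it uses.
print(sum(columns["amount"]))

# Repeated values stored next to each other compress well.
print(run_length_encode(columns["region"]))
```

Storing each column contiguously means an analytical query touches only the columns it needs, and values of the same kind sit side by side, which is what makes compression effective.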
Scope of Data Mining & Data Warehouse
Data mining and data warehousing differ in scope. Data
mining involves sifting through enormous data sets to identify relationships and
patterns that can help solve business problems through data analysis. The scope and
techniques of data mining enable enterprises to predict future trends and make
informed business decisions.
On the other hand, the scope of data warehousing lies within any domain that has
something to do with analytics. Now, let us discuss the challenges faced in data
mining and data warehousing.
Challenges of Data Mining & Data Warehousing
Data mining and data warehousing each face their own set of challenges.
Some of the most common challenges experienced by data mining are:
Incomplete and noisy data
Social and security challenges
Complex data
Distributed data
Efficiency and scalability of algorithms
Performance
Incorporating background knowledge
Improving mining algorithms, etc.
Some of the most common challenges experienced by data warehousing are:
Manual data processing
Data quality
Data Accuracy
Testing
Performance, etc.