3rd module
4.3 DATA WAREHOUDING PROCESS
Data Warehousing Process: Key Components
• Data sources. Data are sourced from multiple independent operational “legacy” systems
and possibly from external data providers (such as the U.S. Census). Data may also come
from an OLTP or enterprise resource planning (ERP) system. Web data in the form of Web
logs may also feed to a data warehouse.
• Data extraction and transformation. Data are extracted and properly trans formed using
custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and
cleansed. The data are then ready to load into the data warehouse and/or DMs.
Comprehensive Database (EDW - Enterprise Data Warehouse)
1. A large central database that stores all the important business data.
2. It collects data from many sources and organizes it into detailed and summarized information.
3. This helps businesses make better decisions using all available data.
Metadata (Data About Data)
1. Metadata is extra information about the data that helps IT teams and users understand and organize it.
2. It includes rules and descriptions for how data is structured.
3. Metadata makes it easier to find and use data, especially with web-based tools.
Middleware Tools (Connecting Users to Data)
1. Middleware tools help users access the data warehouse.
2. Advanced users (like data analysts) can write SQL queries to retrieve data.
3. Regular business users can use easy-to-use applications like:
1. Reporting tools (for creating reports)
2. Data mining tools (for finding patterns in data)
3. OLAP tools (for analyzing data in different ways)
4. Data visualization tools (for displaying data in charts and graphs)
These components work together to make sure that data is collected, organized, and easily accessible for
better decision-making.
4.3 data warehouse architecture
Three-Tier Architecture
How it works:
o The first layer (tier) contains operational systems (the original data and software for collecting it).
o The second layer is the data warehouse, where data is stored and organized.
o The third layer is the application server, which includes decision support systems (DSS), business intelligence (BI),
and business analytics (BA) tools.
Advantages:
o Better organization: Each layer has a separate role, making the system more efficient.
o Faster analysis: Data is processed into a special multidimensional database for quick analysis.
o More flexibility: Allows creating smaller databases (Data Marts) for specific needs.
2. Two-Tier Architectur
How it works:
o The DSS engine (used for analysis) runs on the same hardware as the data warehouse.
o There is no separate application server.
Advantages:
o Cheaper than the three-tier model because it uses fewer resources.
Disadvantage
o Slower performance for large data warehouses that require heavy data processing.
3. Web-Based Data Warehousing
4.
How it works:
o Uses the Internet to access and analyze data from a data warehouse.
o It has three layers:
Client side: The user accesses the system via a web browser.
Web server: Manages communication between the user and the data warehouse
Application server: Processes data and delivers insiGHT
10 factor's that effects the data warehouse architecture
Several issues must be considered when considered which architecture to use
Model q1 3rd mod 5b
ETL PROCESS
ETL (Extract, Transform, Load) in Simple Words
ETL stands for Extract, Transform, and Load, and it is a key process in managing data for a data warehouse.
This process helps collect data from different sources, clean it, and store it properly for easy access and analysis.
1. Extraction (E)
Extracting (or pulling) data from different sources such as databases, spreadsheets, CRM systems, or even web logs.
Example: Collecting sales data from different branches of a company.
2. Transformation (T)
Cleaning, converting, and organizing the extracted data so that it is consistent and useful.
This includes fixing errors, removing duplicates, and applying business rules.
Example: Changing all date formats to a standard format like YYYY-MM-DD.
3. Load (L)
Storing the transformed data into a data warehouse for analysis and reporting.
Example: After cleaning, the sales data is stored in a database so that managers can analyze sales trends.
Why is ETL Important?
It integrates data from multiple sources.
Ensures data quality by removing errors.
Makes data ready for reporting and analysis.
Saves time by automating data processing.
ETL Tools and Technologies
Instead of writing custom programs, companies use ETL tools to automate the process. Some popular ETL
tools are:
Informatica
Microsoft SSIS (SQL Server Integration Services)
IBM DataStage
Oracle Data Integrator
Talend
Challenges in ETL
Costly: ETL tools can be expensive.
Complexity: Requires skilled IT professionals.
Time-Consuming: ETL takes up to 70% of the time in a data warehouse project.
MODEL Q2 MOD 3 5A
CHARECTERSTICS OF DATA WARE HOUSE
Key Characteristics of a Data Warehouse (In Simple Words)
A data warehouse is a special type of database designed to help businesses analyze data and make better
decisions. Here are its main characteristics:
1. Subject-Orientedtransactions
Data is grouped by topics like sales, customers, or products, making it easier to analyze specific areas of the business.
Unlike regular databases that focus on (e.g., recording a sale), a data warehouse helps understand trends and patterns in
the data.
2. Integrated
Data from different sources (e.g., spreadsheets, databases, and external files) is combined into a single, consistent format.
It resolves issues like naming conflicts (e.g., "customer ID" vs. "client ID") and measurement differences (e.g., "kilograms"
vs. "pounds").
3. Time-Variant (Time-Series Data)
A data warehouse stores historical data to track changes over time.
It helps businesses detect trends, predict future outcomes, and compare data from different time periods (e.g., monthly
sales trends).
4. Non-Volatile (Data Does Not Change)
Once data is added, it cannot be modified or deleted.
If new information is available, it is added as a new record, rather than updating old data.
This helps maintain data integrity and allows users to track changes over time.
Additional Features of a Data Warehouse
✅ Web-Based: Designed to work efficiently with web applications.
✅ Relational/Multidimensional: Uses a relational database (tables with rows and columns) or a
multidimensional database (optimized for fast analysis).
✅ Client/Server Architecture: Supports access from multiple devices (clients) connected to a central server.
✅ Real-Time Data: Some modern warehouses provide instant updates instead of waiting for scheduled
updates.
✅ Includes Metadata: Stores "data about data", explaining how the data is organized and how to use it
effectively.
These characteristics make data warehouses powerful tools for businesses that need accurate, fast, and
organized data analysis.
Differnrce beteween inmon and kimball dataware house process
Representation of data in a data warehouse
Easy Explanation of Data Warehouse Structure
A data warehouse stores large amounts of data for analysis. It is designed to handle complex queries quickly
and efficiently using a method called dimensional modeling. This helps organize data in a way that makes it
easy to analyze.
Two common ways to organize data in a warehouse are:
Star Schema ⭐
1. The simplest and most common way to structure a data warehouse.
2. It has a fact table in the center, which stores key business data like sales, profits, or costs.
3. Surrounding the fact table are dimension tables, which contain details about the data, such as product names,
customer details, or locations.
4. Helps in quickly retrieving and analyzing data.
Snowflake Schema
1. Similar to the star schema but more detailed.
2. The dimension tables are broken down into smaller tables to remove duplicate data (this is called normalization).
3. Helps save storage space but can be more complex to manage.
Both schemas help businesses analyze data and make decisions efficiently.
Business reporting
Easy Explanation of Business Report
Metric Management Reports
1. Businesses use specific numbers (metrics) to measure their performance.
2. For customers, these are called service-level agreements (SLAs), which define the quality of service they should
get.
3. For internal teams, they use key performance indicators (KPIs) to track progress.
4. These reports help companies stay on track with their goals and can be part of improvement methods like Six
Sigma or Total Quality Management (TQM)
Dashboard Reports
1. A dashboard displays different performance indicators on a single page, just like a car dashboard shows speed,
fuel, and other details.
2. Companies use dashboards to quickly see how their business is doing.
3. Dashboards often use color-coded indicators ( red = bad, orange = needs attention, green = good) to highlight
important areas.
Balanced Scorecard Reports
1. This method, created by Kaplan and Norton, gives a complete view of a company’s success.
2. Instead of only looking at financial data, it also considers:
1. Customer satisfaction
2. Business processes (how efficiently things are done)
3. Learning & growth (employee development and innovation)
These reports help businesses track performance and make better decisions. ✅