Chapter 1

Data is a collection of raw facts and figures that can be qualitative or quantitative, while information is processed data that provides context for decision-making. Data warehouses store large amounts of historical data from various sources for analysis, improving business intelligence and decision-making capabilities. Key features of data warehousing include centralized data repositories, data integration, and robust query performance, although challenges such as cost and complexity exist.

Uploaded by

Rasila Walhekar

What is Data?

Data is a collection of raw, unorganized facts and details, such as text,
observations, figures, symbols, and descriptions of things.

OR

“Data is a collection of facts and figures that can be recorded; it can be in the
form of text, numbers, speech, video, or images. A database is a large amount of
inter-related data that is collected, stored, and retrieved in one place; in short,
it is a collection of inter-related data. The management system is a collection of
programs that secures, manages, retrieves, and stores the data.”

For example, the number 89 on its own is data.

What are the different types of data?

Data can be of two types:

● Qualitative data: It is non-numerical data. For example, the texture of the
skin, the colour of the eyes, etc.
● Quantitative data: Quantitative data is given in numbers. Answers to
questions such as “how much” and “how many” give quantitative data.

What is Information?

Information is processed, organised and structured data. It provides context for
data and enables decision making. For example, a single customer’s sale at a
restaurant is data – this becomes information when the business is able to
identify the most popular or least popular dish.

Difference between Data and Information

Data | Information

Data is unorganised and unrefined facts. | Information comprises processed, organised data presented in a meaningful context.

Data doesn’t depend on information. | Information depends on data.

Raw data alone is insufficient for decision making. | Information is sufficient for decision making.

An example of data is a student’s test score. | The average score of a class is the information derived from the given data.

A Database Management System (DBMS) stores data in the form of tables, is
typically designed using an ER model, and aims to guarantee the ACID properties
for transactions. For example, a DBMS of a college has tables for students,
faculty, etc.
A Data Warehouse is separate from the DBMS. It stores a huge amount of data,
which is typically collected from multiple heterogeneous sources like files,
DBMS, etc. The goal is to produce statistical results that may help in
decision-making. For example, a college might want to quickly see a range of
results, such as how the placement of CS students has improved over the last
10 years, in terms of salaries, counts, etc.
Issues That Occur While Building the Warehouse
● When and how to gather data: In a source-driven architecture for
gathering data, the data sources transmit new information, either continually
(as transaction processing takes place), or periodically (nightly, for
example). In a destination-driven architecture, the data warehouse
periodically sends requests for new data to the sources. Unless updates at the
sources are replicated at the warehouse via two-phase commit, the warehouse
will never be quite up-to-date with the sources. Two-phase commit is usually
far too expensive to be an option, so data warehouses typically have slightly
out-of-date data. That, however, is usually not a problem for
decision-support systems.
● What schema to use: Data sources that have been constructed
independently are likely to have different schemas. In fact, they may even
use different data models. Part of the task of a warehouse is to perform
schema integration, and to convert data to the integrated schema before they
are stored. As a result, the data stored in the warehouse are not just a copy of
the data at the sources. Instead, they can be thought of as a materialized view
of the data at the sources.
● Data transformation and cleansing: The task of correcting and
preprocessing data is called data cleansing. Data sources often deliver data
with numerous minor inconsistencies, which can be corrected. For example,
names are often misspelled, and addresses may have street, area, or city
names misspelled, or postal codes entered incorrectly. These can be
corrected to a reasonable extent by consulting a database of street names and
postal codes in each city. The approximate matching of data required for this
task is referred to as fuzzy lookup.
● How to propagate updates: Updates on relations at the data sources must be
propagated to the data warehouse. If the relations at the data warehouse are
exactly the same as those at the data source, the propagation is
straightforward. If they are not, the problem of propagating updates is
basically the view-maintenance problem.
● What data to summarize: The raw data generated by a transaction-
processing system may be too large to store online. However, we can answer
many queries by maintaining just summary data obtained by aggregation on
a relation, rather than maintaining the entire relation. For example, instead of
storing data about every sale of clothing, we can store total sales of clothing
by item name and category.
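
Two of the issues above, fuzzy lookup during cleansing and storing summary data instead of every sale, can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical reference list of city names (`KNOWN_CITIES`) in place of a real database of street names and postal codes, and it uses the standard library's `difflib.get_close_matches` as one possible approximate-matching technique; the function names, the sample rows, and the `cutoff` value are illustrative assumptions, not part of the original notes.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical reference list of valid city names (stands in for the
# "database of street names and postal codes" mentioned above).
KNOWN_CITIES = ["Mumbai", "Pune", "Nagpur", "Nashik"]


def fuzzy_lookup(value, reference=KNOWN_CITIES, cutoff=0.8):
    """Return the closest reference entry, or the input unchanged if none is close enough."""
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def summarize_sales(sales):
    """Aggregate raw sale rows into a total amount per (item_name, category)."""
    totals = defaultdict(float)
    for item_name, category, amount in sales:
        totals[(item_name, category)] += amount
    return dict(totals)


# Cleansing: misspelled names are corrected, unknown names are left as they are.
cleaned = [fuzzy_lookup(city) for city in ["Mumbay", "Pune", "Nasik", "Delhi"]]

# Summarization: three raw sale rows collapse into two summary rows.
raw_sales = [
    ("shirt", "menswear", 20.0),
    ("shirt", "menswear", 25.0),
    ("scarf", "accessories", 10.0),
]
summary = summarize_sales(raw_sales)
```

The similarity cutoff is a trade-off: a lower value corrects more misspellings but also risks mapping a valid but unlisted name onto the wrong entry.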
Need for Data Warehouse
1. An ordinary database is typically sized for megabytes to gigabytes of data
and serves a specific purpose. For data of terabyte size, storage shifts to the
Data Warehouse.
2. A transactional database does not lend itself to analytics.
3. To effectively perform analytics, an organization keeps a central Data
Warehouse to closely study its business by organizing, understanding, and using
its historical data for making strategic decisions and analyzing trends.
Benefits of Data Warehouse
● Better business analytics: A data warehouse stores all the past data and
records of the company and makes them available for analysis, which
deepens the company's understanding of its own data.
● Faster Queries: The data warehouse is designed to handle large analytical
queries, which is why it typically runs them faster than an operational
database.
● Improved data Quality: The data warehouse stores and analyzes the data
gathered from different sources without altering or adding to it, so the
quality of the data is maintained. Any data-quality issues that do arise are
handled by the data warehouse team.
● Historical Insight: The warehouse stores all your historical data which
contains details about the business so that one can analyze it at any time and
extract insights from it.
Data Warehouse vs DBMS
Database | Data Warehouse

A common Database is based on operational or transactional processing. Each operation is an indivisible transaction. | A Data Warehouse is based on analytical processing.

Generally, a Database stores current and up-to-date data which is used for daily operations. | A Data Warehouse maintains historical data over time. Historical data is the data kept over years and can be used for trend analysis, future predictions and decision support.

A database is generally application specific. Example – A database stores related data, such as the student details in a school. | A Data Warehouse is integrated generally at the organization level, by combining data from different databases. Example – A data warehouse integrates the data from one or more databases, so that analysis can be done to get results, such as the best performing school in a city.

Constructing a Database is not so expensive. | Constructing a Data Warehouse can be expensive.
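
The contrast between transactional and analytical processing can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`students`, `payments`, `placements`), the sample figures, and the `record_payment` helper are all hypothetical, and a single in-memory SQLite database stands in for both the operational database and the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, fees_due REAL)")
conn.execute("CREATE TABLE payments (student_id INTEGER, amount REAL)")
conn.execute("INSERT INTO students VALUES (1, 'Asha', 500.0)")
conn.commit()


def record_payment(conn, student_id, amount):
    """Operational workload: one small, indivisible transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO payments VALUES (?, ?)", (student_id, amount))
            cur = conn.execute(
                "UPDATE students SET fees_due = fees_due - ? WHERE id = ?",
                (amount, student_id),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown student")
        return True
    except ValueError:
        return False


ok = record_payment(conn, 1, 200.0)       # both statements are applied
failed = record_payment(conn, 99, 50.0)   # both statements are undone
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]

# Analytical workload: a read-only aggregate over historical rows
# (hypothetical placement records, salaries in arbitrary units).
conn.execute("CREATE TABLE placements (year INTEGER, branch TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO placements VALUES (?, ?, ?)",
    [(2022, "CS", 6.0), (2022, "CS", 8.0), (2023, "CS", 9.0), (2023, "CS", 11.0)],
)
conn.commit()
trend = conn.execute(
    "SELECT year, COUNT(*), AVG(salary) FROM placements "
    "WHERE branch = 'CS' GROUP BY year ORDER BY year"
).fetchall()
```

The first half touches one or two rows and must succeed or fail as a unit; the second half scans many historical rows and only reads them, which is the access pattern a warehouse is organized around.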
Example Applications of Data Warehousing
Data Warehousing can be applied anywhere where we have a huge amount of
data and we want to see statistical results that help in decision making.
● Social Media Websites: Social networking websites like Facebook,
Twitter, Linkedin, etc. are based on analyzing large data sets. These sites
gather data related to members, groups, locations, etc., and store it in a single
central repository. Because the amount of data is so large, a Data Warehouse
is needed to implement this.
● Banking: Most of the banks these days use warehouses to see the spending
patterns of account/cardholders. They use this to provide them with special
offers, deals, etc.
● Government: Governments use a data warehouse to store and analyze tax
payments, which helps in detecting tax evasion.
Features of Data Warehousing
Data warehousing is essential for modern data management, providing a strong
foundation for organizations to consolidate and analyze data strategically. Its
distinguishing features empower businesses with the tools to make informed
decisions and extract valuable insights from their data.
● Centralized Data Repository: Data warehousing provides a centralized
repository for all enterprise data from various sources, such as transactional
databases, operational systems, and external sources. This enables
organizations to have a comprehensive view of their data, which can help in
making informed business decisions.
● Data Integration: Data warehousing integrates data from different sources
into a single, unified view, which can help in eliminating data silos and
reducing data inconsistencies.
● Historical Data Storage: Data warehousing stores historical data, which
enables organizations to analyze data trends over time. This can help in
identifying patterns and anomalies in the data, which can be used to improve
business performance.
● Query and Analysis: Data warehousing provides powerful query and
analysis capabilities that enable users to explore and analyze data in different
ways. This can help in identifying patterns and trends, and can also help in
making informed business decisions.
● Data Transformation: Data warehousing includes a process of data
transformation, which involves cleaning, filtering, and formatting data from
various sources to make it consistent and usable. This can help in improving
data quality and reducing data inconsistencies.
● Data Mining: Data warehousing provides data mining capabilities, which
enable organizations to discover hidden patterns and relationships in their
data. This can help in identifying new opportunities, predicting future trends,
and mitigating risks.
● Data Security: Data warehousing provides robust data security features,
such as access controls, data encryption, and data backups, which ensure that
the data is secure and protected from unauthorized access.
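
The data integration and data transformation features above can be sketched in a few lines of Python. The two sources, their field names (`CustName`, `SignupDate`, `customer`, `joined`, and so on), their date formats, and the target schema are all hypothetical assumptions chosen for illustration; a real warehouse would use its own mappings and cleansing rules.

```python
from datetime import date

# Hypothetical source A: a CRM export with its own field names and ISO dates.
crm_rows = [
    {"CustName": "  asha rao ", "SignupDate": "2023-01-15", "City": "pune"},
    {"CustName": "", "SignupDate": "2023-02-01", "City": "Mumbai"},  # missing name
]
# Hypothetical source B: a billing system with a different schema and DD/MM/YYYY dates.
billing_rows = [
    {"customer": "Ravi Kumar", "joined": "15/03/2023", "town": "NAGPUR"},
]


def from_crm(row):
    """Map a CRM row onto the unified schema (name, joined, city)."""
    return {
        "name": row["CustName"],
        "joined": date.fromisoformat(row["SignupDate"]),
        "city": row["City"],
    }


def from_billing(row):
    """Map a billing row onto the same unified schema."""
    day, month, year = (int(part) for part in row["joined"].split("/"))
    return {"name": row["customer"], "joined": date(year, month, day), "city": row["town"]}


def clean(record):
    """Formatting step: trim whitespace and normalise capitalisation."""
    return {
        **record,
        "name": record["name"].strip().title(),
        "city": record["city"].strip().title(),
    }


def integrate(crm, billing):
    """Integrate both sources, clean every record, and filter out unusable ones."""
    unified = [from_crm(r) for r in crm] + [from_billing(r) for r in billing]
    cleaned = [clean(r) for r in unified]
    return [r for r in cleaned if r["name"]]  # filtering: drop rows with no name


warehouse_rows = integrate(crm_rows, billing_rows)
```

After this step every row has the same field names, the same date type, and the same capitalisation, regardless of which source it came from.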
Advantages of Data Warehousing
● Intelligent Decision-Making: With centralized data in warehouses,
decisions may be made more quickly and intelligently.
● Business Intelligence: Provides strong operational insights through business
intelligence.
● Historical Analysis: Predictions and trend analysis are made easier by
storing past data.
● Data Quality: Guarantees data quality and consistency for trustworthy
reporting.
● Scalability: Capable of managing massive data volumes and expanding to
meet changing requirements.
● Effective Queries: Fast and effective data retrieval is made possible by an
optimized structure.
● Cost reductions: Data warehousing can result in cost savings over time by
streamlining data management procedures and increasing overall efficiency,
even though there are setup costs initially.
● Data security: Data warehouses employ security protocols to safeguard
confidential information, guaranteeing that only authorized personnel are
granted access to certain data.
Disadvantages of Data Warehousing
● Cost: Building a data warehouse can be expensive, requiring significant
investments in hardware, software, and personnel.
● Complexity: Data warehousing can be complex, and businesses may need to
hire specialized personnel to manage the system.
● Time-consuming: Building a data warehouse can take a significant amount
of time, requiring businesses to be patient and committed to the process.
● Data integration challenges: Data from different sources can be
challenging to integrate, requiring significant effort to ensure consistency
and accuracy.
● Data security: Data warehousing can pose data security risks, and
businesses must take measures to protect sensitive data from unauthorized
access or breaches.

Characteristics of Data Warehouse

Subject-Oriented

A data warehouse focuses on the modeling and analysis of data for
decision-makers. Therefore, data warehouses typically provide a concise and
straightforward view around a particular subject, such as customer, product, or
sales, instead of the organization's ongoing operations as a whole. This is done
by excluding data that are not useful concerning the subject and including all
data needed by the users to understand the subject.

Integrated

A data warehouse integrates various heterogeneous data sources like RDBMS,
flat files, and online transaction records. Data cleaning and integration must be
performed during data warehousing to ensure consistency in naming
conventions, attribute types, etc., among different data sources.

Time-Variant

Historical information is kept in a data warehouse. For example, one can
retrieve data from 3 months, 6 months, 12 months, or even earlier from a
data warehouse. This contrasts with a transaction system, where often only
the most current data is kept.

Non-Volatile

The data warehouse is a physically separate data store, populated with data
transformed from the source operational RDBMS. Operational updates of data do
not occur in the data warehouse, i.e., update, insert, and delete operations on
individual records are not performed. It usually requires only two procedures
for data access: initial loading of data and access to data. Therefore, the DW
does not require transaction processing, recovery, and concurrency capabilities,
which allows for substantial speedup of data retrieval. Non-volatile means that
once data has entered the warehouse, it should not change.
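
The time-variant and non-volatile characteristics can be pictured together as an append-only store in which every load is stamped with a date and nothing is overwritten. The following Python sketch is an illustration under assumptions, not a description of any particular product: the `SnapshotStore` class, its method names, and the sample customer data are all hypothetical.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are loaded and read, never updated or deleted in place."""

    def __init__(self):
        self._rows = []  # each entry is (load_date, key, value)

    def load(self, load_date, records):
        """Loading step: append one dated snapshot of the given records."""
        for key, value in records.items():
            self._rows.append((load_date, key, value))

    def as_of(self, key, when):
        """Access step: most recent value for key loaded on or before `when`, else None."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]


store = SnapshotStore()
store.load(date(2024, 1, 31), {"cust-1": "Pune"})
store.load(date(2024, 4, 30), {"cust-1": "Mumbai"})  # the customer moved; old row is kept

city_in_february = store.as_of("cust-1", date(2024, 2, 15))
city_in_june = store.as_of("cust-1", date(2024, 6, 1))
city_before_first_load = store.as_of("cust-1", date(2023, 12, 1))
```

Because the older row is never modified, a question about February still returns the February answer after later loads, which is what makes trend analysis over time possible.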

Three-Tier Data Warehouse Architecture


Data Warehouses usually have a three-level (tier) architecture that includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front end Tools).

The bottom tier consists of the Data Warehouse server, which is almost
always an RDBMS. It may include several specialized data marts and a
metadata repository.

Data from operational databases and external sources (such as user profile data
provided by external consultants) are extracted using application program
interfaces called gateways. A gateway is supported by the underlying DBMS
and allows client programs to generate SQL code to be executed at a server.

Examples of gateways include ODBC (Open Database Connectivity) and OLE DB
(Object Linking and Embedding, Database), by Microsoft, and JDBC (Java
Database Connectivity).

A middle-tier which consists of an OLAP server for fast querying of the data
warehouse.

The OLAP server is implemented using either

(1) A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS
that maps operations on multidimensional data to standard relational operations, or

(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and operations.
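
To make the ROLAP idea concrete, the sketch below shows how two common multidimensional operations could be translated into ordinary SQL: a roll-up becomes a `GROUP BY`, and a slice becomes a `WHERE` clause. It uses Python's built-in `sqlite3` module as a stand-in relational engine; the `sales` table, its three dimensions, the sample rows, and the helper names `roll_up` and `slice_cube` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "East", "shirt", 100.0),
        (2023, "West", "shirt", 150.0),
        (2024, "East", "shirt", 120.0),
        (2024, "East", "scarf", 30.0),
    ],
)
conn.commit()

# Column names cannot be bound as SQL parameters, so they are checked
# against a fixed whitelist before being placed into the query text.
ALLOWED_DIMENSIONS = {"year", "region", "product"}


def roll_up(conn, dimensions):
    """Keep the listed dimensions and sum over all the others (GROUP BY)."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown dimension")
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


def slice_cube(conn, dimension, value):
    """Fix one dimension to a single value (WHERE)."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError("unknown dimension")
    sql = (
        "SELECT year, region, product, amount FROM sales "
        f"WHERE {dimension} = ? ORDER BY year, region, product"
    )
    return conn.execute(sql, (value,)).fetchall()


by_year = roll_up(conn, ["year"])
by_year_region = roll_up(conn, ["year", "region"])
west_only = slice_cube(conn, "region", "West")
```

A MOLAP server would instead keep the data in a dedicated multidimensional structure and answer the same questions without generating SQL.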

The top tier contains front-end tools for displaying results provided by
OLAP, as well as additional tools for data mining of the OLAP-generated data.

The overall Data Warehouse Architecture is shown in fig:


The metadata repository stores information that defines DW objects. It
includes the following parameters and information for the middle-tier and
top-tier applications:

1. A description of the DW structure, including the warehouse schema,
dimensions, hierarchies, data mart locations and contents, etc.
2. Operational metadata, which usually describes the currency level of the
stored data, i.e., active, archived or purged, and warehouse monitoring
information, i.e., usage statistics, error reports, audit trails, etc.
3. System performance data, which includes indices used to improve data
access and retrieval performance.
4. Information about the mapping from operational databases, which
covers source RDBMSs and their contents, cleaning and transformation
rules, etc.
5. Summarization algorithms, predefined queries, and reports, along with
business data, which includes business terms and definitions, ownership
information, etc.

Principles of Data Warehousing


Load Performance

Data warehouses require incremental loading of new data on a periodic basis
within narrow time windows; performance of the load process should be measured
in hundreds of millions of rows and gigabytes per hour and must not artificially
constrain the volume of data the business requires.

Load Processing

Many steps must be taken to load new or updated data into the data warehouse,
including data conversion, filtering, reformatting, indexing, and metadata
update.
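
The load steps listed above can be pictured as one small function that processes a batch row by row. This is a minimal sketch under assumptions: the row fields (`sku`, `qty`), the filtering rule, the in-memory list used as the warehouse, the dictionary used as an index, and the metadata keys are all hypothetical and chosen only to show each step once.

```python
from datetime import datetime, timezone


def load_batch(raw_rows, warehouse, index, metadata):
    """Run one load: convert, filter, reformat, index, then update metadata."""
    loaded = 0
    for raw in raw_rows:
        # Conversion: parse text fields into typed values.
        try:
            row = {"sku": raw["sku"], "qty": int(raw["qty"])}
        except (KeyError, ValueError):
            continue  # Filtering: skip rows that cannot be converted.
        if row["qty"] <= 0:
            continue  # Filtering: skip rows that fail a simple validity rule.
        # Reformatting: bring the key into one canonical form.
        row["sku"] = row["sku"].strip().upper()
        # Indexing: remember where each key's rows sit in the warehouse.
        position = len(warehouse)
        warehouse.append(row)
        index.setdefault(row["sku"], []).append(position)
        loaded += 1
    # Metadata update: record when the load ran and how much it added.
    metadata["last_load_utc"] = datetime.now(timezone.utc).isoformat()
    metadata["rows_loaded"] = metadata.get("rows_loaded", 0) + loaded
    return loaded


warehouse, index, metadata = [], {}, {}
batch = [
    {"sku": " ab-1 ", "qty": "3"},
    {"sku": "ab-1", "qty": "x"},   # fails conversion
    {"sku": "cd-2", "qty": "0"},   # fails the validity rule
    {"sku": "cd-2", "qty": "5"},
]
loaded_count = load_batch(batch, warehouse, index, metadata)
```

In a real warehouse each step would usually be a separate, bulk-oriented stage rather than a per-row loop, since load performance is measured in very large row counts per hour.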

Data Quality Management

Fact-based management demands the highest data quality. The warehouse
ensures local consistency, global consistency, and referential integrity despite
"dirty" sources and massive database size.

Query Performance

Fact-based management must not be slowed by the performance of the data
warehouse RDBMS; large, complex queries must complete in seconds, not
days.

Terabyte Scalability
Data warehouse sizes are growing at astonishing rates. Today they range from a
few gigabytes to hundreds of gigabytes, up to terabyte-sized data warehouses.
