DW Questions

A data warehouse is a repository that stores historical data from multiple sources to support analysis and decision making. It typically involves extracting, transforming and loading data from operational systems on a regular basis. The basic components of a data warehouse are fact tables, which contain measures, and dimension tables, which provide context about the facts. A star schema organizes these tables with each dimension connected directly to the fact table. A snowflake schema extends the star schema by introducing hierarchies between dimension tables. OLAP cubes allow analyzing data across multiple dimensions through operations like slicing, dicing, roll-up and drill-down.

Uploaded by

Souvik Banerjee
Copyright
© All Rights Reserved

1. Data scrubbing is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option D

2. The active data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option D

3. A goal of data mining includes which of the following?
A. To explain some observed event or condition
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
Answer: Option A

4. An operational system is which of the following?
A. A system that is used to run the business in real time and is based on historical data.
B. A system that is used to run the business in real time and is based on current data.
C. A system that is used to support decision making and is based on current data.
D. A system that is used to support decision making and is based on historical data.
Answer: Option B

5. A data warehouse is which of the following?
A. Can be updated by end users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
Answer: Option C

6. A snowflake schema is which of the following types of tables?
A. Fact
B. Dimension
C. Helper
D. All of the above
Answer: Option D

7. The generic two-level data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option B

8. Fact tables are which of the following?
A. Completely denormalized
B. Partially denormalized
C. Completely normalized
D. Partially normalized
Answer: Option C

9. Data transformation includes which of the following?
A. A process to change data from a detailed level to a summary level
B. A process to change data from a summary level to a detailed level
C. Joining data from one source into various sources of data
D. Separating data from one source into various sources of data
Answer: Option A

10. Reconciled data is which of the following?
A. Data stored in the various operational systems throughout the organization.
B. Current data intended to be the single source for all decision support systems.
C. Data stored in one operational system in the organization.
D. Data that has been selected and formatted for end-user support applications.
Answer: Option B

11. The load and index is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option B

12. The extract process is which of the following?
A. Capturing all of the data contained in various operational systems
B. Capturing a subset of the data contained in various operational systems
C. Capturing all of the data contained in various decision support systems
D. Capturing a subset of the data contained in various decision support systems
Answer: Option B

13. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above
Answer: Option C

14. Transient data is which of the following?
A. Data in which changes to existing records cause the previous version of the records to be eliminated
B. Data in which changes to existing records do not cause the previous version of the records to be eliminated
C. Data that are never altered or deleted once they have been added
D. Data that are never deleted once they have been added
Answer: Option A

15. A multifield transformation does which of the following?
A. Converts data from one field into multiple fields
B. Converts data from multiple fields into one field
C. Converts data from multiple fields into multiple fields
D. All of the above
Answer: Option D
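
As a rough illustration of question 15, the sketch below shows the three multifield cases on a made-up customer record. The field names (`full_name`, `street`, `city`) and the helper functions are hypothetical and are not taken from any particular ETL tool.

```python
def split_name(record):
    """One field -> many fields: split full_name into first and last name."""
    first, _, last = record["full_name"].partition(" ")
    return {"first_name": first, "last_name": last}


def merge_address(record):
    """Many fields -> one field: combine street and city into one address."""
    return {"address": f'{record["street"]}, {record["city"]}'}


def transform(record):
    """Many fields -> many fields: apply both conversions to one source row."""
    return {**split_name(record), **merge_address(record)}


# Hypothetical source row used only for this example.
source_row = {"full_name": "Ada Lovelace", "street": "12 Mill Lane", "city": "London"}
print(transform(source_row))
```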

1. A data mart is designed to optimize the performance for well-defined and predictable uses.
A. True
B. False
Answer: Option A

2. Successful data warehousing requires that a formal program in total quality management (TQM) be implemented.
A. True
B. False
Answer: Option A

3. Data in operational systems are typically fragmented and inconsistent.
A. True
B. False
Answer: Option A

4. Most operational systems are based on the use of transient data.
A. True
B. False
Answer: Option A

5. Independent data marts are often created because an organization focuses on a series of short-term business objectives.
A. True
B. False
Answer: Option A
6. Joining is the process of partitioning data according to predefined criteria.
A. True
B. False
Answer: Option B

7. The role of the ETL process is to identify erroneous data and to fix them.
A. True
B. False
Answer: Option B

8. Data in the data warehouse are loaded and refreshed from operational systems.
A. True
B. False
Answer: Option A

9. Star schema is suited to online transaction processing, and therefore is generally used in operational systems, operational data stores, or an EDW.
A. True
B. False
Answer: Option B

10. Periodic data are data that are physically altered once added to the store.
A. True
B. False
Answer: Option B

11. Both status data and event data can be stored in a database.
A. True
B. False
Answer: Option A

12. Static extract is used for ongoing warehouse maintenance.
A. True
B. False
Answer: Option B

13. Data scrubbing can help upgrade data quality; it is not a long-term solution to the data quality problem.
A. True
B. False
Answer: Option A

14. Every key used to join the fact table with a dimensional table should be a surrogate key.
A. True
B. False
Answer: Option A

15. Derived data are detailed, current data intended to be the single, authoritative source for all decision support applications.
A. True
B. False
Answer: Option B

What is a data warehouse, and why is it used?
A data warehouse is a repository of data. The pieces of information it stores are related to one another and support the decision-making process of a corporation or other entity. It can combine multiple data sources to hold all the data connected to a subject. It typically consists of archived or historical data that can be analyzed. A data warehouse runs on top of a database system.
What are the basic stages of a data warehouse?
The first stage in building a data warehouse is the initial data load, which is typically achieved by copying an operational database. This is called an offline operational database.
New sets of data then have to be fed into the newly created data warehouse, so the database is updated with large sets of data at regular intervals (weekly or monthly). With this step, we have built an offline data warehouse.
To achieve a real-time data warehouse, the operational data has to be inserted in real time. When this is integrated with the applications that report on the data, it is called an integrated data warehouse.
What are OLAP and OLTP, and what are their main differences?
OLAP performs analysis on the data and reports the information. The focus of this kind of system is reading data, so it relies mainly on the SELECT statement. OLTP manages the transaction system that collects the data. Statements like INSERT, UPDATE and DELETE are the focus here.
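
The contrast can be sketched with Python's built-in `sqlite3` module. The `orders` table and its columns are invented for the example; a real OLTP system and a real warehouse would normally be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes that record individual transactions.
conn.execute("INSERT INTO orders (region, amount) VALUES ('north', 100.0)")
conn.execute("INSERT INTO orders (region, amount) VALUES ('north', 50.0)")
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 70.0)")
conn.execute("UPDATE orders SET amount = 60.0 WHERE id = 2")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.commit()

# OLAP-style work: a read-only SELECT that summarizes many rows at once.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```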
What is a fact table?
A fact table holds concrete measures, typically stored as numeric values, that carry the core business information.
In detail, the fact table contains two kinds of columns: the foreign keys to the related dimension tables, which provide the join relationships, and the measure columns, which hold the measured (typically additive) values.
And a dimension table?
Dimension tables describe the quantified data in the fact tables, giving context to its fields. They contain descriptive attributes that provide more information about the facts.
Fact tables have foreign keys to the dimension tables, and the relationship is one-to-many: one dimension row can be referenced by many fact rows.
Describe the star schema.
In the star schema we have a central fact table and multiple dimensions linked to it. These dimensions are related only to the fact table, so the only link they have is to that specific table.
The fact table relates to the dimensions by holding their primary keys as foreign keys, along with other attributes relevant to the data warehouse.
This kind of schema is therefore denormalized and better suited to simple queries, which are usually faster.
[Diagram: a simple star schema implementation]
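
A minimal sketch of such a schema, using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_store`) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds a foreign key to each dimension plus the measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    quantity   INTEGER
);
INSERT INTO dim_product VALUES (1, 'shirt'), (2, 'hat');
INSERT INTO dim_store   VALUES (1, 'Lisbon'), (2, 'Porto');
INSERT INTO fact_sales  VALUES (1, 1, 3), (1, 2, 4), (2, 1, 5);
""")

# Each dimension joins directly to the fact table; dimensions never join each other.
rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(rows)
```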

Describe the snowflake schema.
The snowflake schema has links and relationships between dimensions, giving a normalized organization of the fact and dimension tables. This type of schema is usually more complex because each dimension can be composed of several other dimension tables.
[Diagram: snowflake schema]

What is an OLAP cube?


An OLAP data cube is a representation of data in multiple dimensions, using facts and dimensions. It is characterized by the combination of information according to its relationships.
It can consist of a collection of zero to many dimensions, representing specific data. There are five basic operations that can be performed on this kind of data cube:

Slicing

Dicing

Roll-Up

Drill-Up and Drill-Down

Pivoting
Explain the slicing operation.

The slicing operation on an OLAP cube fixes a single value for one of the dimensions of the cube and selects all the data that corresponds to that value.
So, by executing a slice on the cube, we get all the dimension and fact information for the specific value assigned.
Explain the dicing operation
Dicing an OLAP cube consists of choosing an interval of values for some of the dimensions represented in the cube and selecting the data that corresponds to those intervals.
This operation creates a subset of the cube (a sub-cube) that contains the data within those intervals.
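
The slice and dice operations described above can be sketched on a tiny cube held as a list of rows. The data, the dimension names and the two helper functions are hypothetical and exist only for this illustration.

```python
# A tiny cube stored as rows: three dimensions (year, region, product) and one measure.
cube = [
    {"year": 2021, "region": "north", "product": "hat", "sales": 10},
    {"year": 2021, "region": "south", "product": "shirt", "sales": 20},
    {"year": 2022, "region": "north", "product": "shirt", "sales": 30},
    {"year": 2023, "region": "south", "product": "hat", "sales": 40},
]


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_cube(rows, ranges):
    """Dice: keep rows whose value is allowed for every listed dimension."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in ranges.items())]


north_only = slice_cube(cube, "region", "north")
# Years 2021 to 2022 (range excludes 2023) and only the product 'shirt'.
sub_cube = dice_cube(cube, {"year": range(2021, 2023), "product": {"shirt"}})
```

A slice fixes exactly one dimension, while a dice restricts several dimensions at once and returns a smaller cube.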
Explain the roll up operation.
The roll-up operation applies computation rules to the data of a specific dimension of an OLAP cube and returns the computed information to the end user.
These rules can be user-defined and summarize the information on that specific dimension.
Explain the drill-up/drill-down operation
These operations allow the exploration of information between the levels of data presented in the dimensions and facts of the data warehouse.
They can select summarized information or the details that make up that aggregation.
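
A small sketch of moving between levels of a time dimension, assuming a hypothetical two-level hierarchy (month within year) and summation as the aggregation rule.

```python
from collections import defaultdict

# Detail rows at the finest level: (year, month, sales).
detail = [
    (2023, 1, 100),
    (2023, 1, 50),
    (2023, 2, 70),
    (2024, 1, 30),
]


def roll_up(rows, level):
    """Aggregate sales to the 'month' level (year, month) or the 'year' level (year,)."""
    key_len = {"month": 2, "year": 1}[level]
    totals = defaultdict(int)
    for row in rows:
        totals[row[:key_len]] += row[-1]
    return dict(totals)


def drill_down(rows, year):
    """Return the monthly totals that make up one yearly total."""
    return {k: v for k, v in roll_up(rows, "month").items() if k[0] == year}


by_year = roll_up(detail, "year")
months_2023 = drill_down(detail, 2023)
```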

Explain the pivoting operation


Pivoting allows the rotation of the cube on its dimensions providing the user a different point of
view of the explored data.
The cube can be rotated on every face.
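
For a two-dimensional view, pivoting amounts to swapping the row and column axes of a cross-tab. The sketch below assumes made-up sales rows and a hypothetical `cross_tab` helper.

```python
def cross_tab(rows, row_dim, col_dim, measure):
    """Build a nested dict: table[row_value][col_value] = summed measure."""
    table = {}
    for r in rows:
        cell = table.setdefault(r[row_dim], {})
        cell[r[col_dim]] = cell.get(r[col_dim], 0) + r[measure]
    return table


sales = [
    {"region": "north", "year": 2022, "amount": 5},
    {"region": "north", "year": 2023, "amount": 7},
    {"region": "south", "year": 2022, "amount": 3},
]

by_region = cross_tab(sales, "region", "year", "amount")
# Pivoting swaps the axes: the same cells, viewed with year as the rows.
by_year = cross_tab(sales, "year", "region", "amount")
```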
Explain the concept of data mart.
A data mart is a specific group of data linked to a subject and is part of a specific data warehouse. A data warehouse can therefore have multiple data marts.
Basically, a data mart is a small data warehouse with condensed information about a specific subject and its relationships. Usually each data mart is related to a department, business unit or something else that can function individually within a data warehouse.
What are the reasons to create a data mart?
There are various reasons that lead to the creation of a data mart. The most important ones are:

Create a data specific environment, providing easy access to it

Easy to create

Data is more relevant to users having only the essential information

Lower cost than creating a whole data warehouse


What does Normalization mean?

Normalization is the process in which tables and fields are organized in a database in order to
reduce the redundancy of stored data. Therefore many relationships between tables are defined,
providing a better organized database system.
The key benefits of normalization are:

Low database data redundancy

Searching and indexing is faster

Fewer null values since data is well distributed

Cleaner and easier to maintain
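
As a small illustration of reducing redundancy, the sketch below splits a flat order list into a customers table and an orders table. The data and the `normalize` helper are hypothetical.

```python
# Denormalized rows repeat the customer's city on every order.
flat_orders = [
    {"order_id": 1, "customer": "Ana", "city": "Lisbon", "amount": 10},
    {"order_id": 2, "customer": "Ana", "city": "Lisbon", "amount": 25},
    {"order_id": 3, "customer": "Rui", "city": "Porto", "amount": 40},
]


def normalize(rows):
    """Split the flat rows into a customers table and an orders table."""
    customers = {}  # customer name -> city, stored once per customer
    orders = []     # each order keeps only a reference to the customer
    for r in rows:
        customers[r["customer"]] = r["city"]
        orders.append(
            {"order_id": r["order_id"], "customer": r["customer"], "amount": r["amount"]}
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

After the split, a customer's city is stored once, so changing it touches a single entry instead of every order row.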


What is an ETL process?

An ETL (extract, transform, load) process consists of getting data from different sources and converting it so that it can be loaded into a specific data warehouse.
These processes transform and normalize the data, providing a common base for all sources to integrate with a data warehouse.
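
A minimal sketch of the three steps, assuming two invented sources (a CSV export and a list of records) whose column names and units differ. The names `CRM_CSV`, `ERP_ROWS` and `customer_revenue` are hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different column names and formats.
CRM_CSV = "name,revenue\nacme, 1200\nglobex,300\n"
ERP_ROWS = [{"customer": "Initech", "revenue_cents": 45000}]


def extract():
    """Extract: read raw records from each source."""
    return list(csv.DictReader(io.StringIO(CRM_CSV))), ERP_ROWS


def transform(crm_rows, erp_rows):
    """Transform: bring both sources to one (customer, revenue) shape."""
    unified = [(r["name"].strip().title(), float(r["revenue"])) for r in crm_rows]
    unified += [(r["customer"].strip().title(), r["revenue_cents"] / 100) for r in erp_rows]
    return unified


def load(rows):
    """Load: write the unified rows into the warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_revenue (customer TEXT, revenue REAL)")
    conn.executemany("INSERT INTO customer_revenue VALUES (?, ?)", rows)
    return conn


warehouse = load(transform(*extract()))
loaded = warehouse.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
```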
What is aggregation?
Aggregation is the representation of a set of data, combined by some aggregation function.
These functions may be simple or complex, depending on the purpose of the aggregated data. A simple example is the sum of every value.
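
A few aggregation functions applied to one made-up set of values. The weighted average stands in for a "more complex" function and is only one possible example.

```python
from statistics import mean

# Hypothetical measure values for one group of fact rows.
quantities = [4, 8, 15, 16, 23, 42]

# Simple aggregation functions collapse the whole set into a single value.
simple = {
    "sum": sum(quantities),
    "count": len(quantities),
    "min": min(quantities),
    "max": max(quantities),
    "avg": mean(quantities),
}


def weighted_average(values, weights):
    """A slightly more involved aggregate: the mean with per-value weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```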
Explain what partitioning is.
Partitioning is the process of dividing all data warehouse elements into smaller and distinct sets
of data, keeping the relationships between the elements.
The benefits of partitioning are:

Easy management

Better performance

Availability

Easier backup and recovery
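
One common way to partition is by time period. The sketch below assumes monthly partitions and hypothetical sales rows, and shows why a query for a single month only needs to read one partition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale_date, amount).
sales = [
    (date(2024, 1, 5), 10),
    (date(2024, 1, 20), 15),
    (date(2024, 2, 3), 7),
]


def partition_by_month(rows):
    """Place each row in exactly one partition keyed by (year, month)."""
    partitions = defaultdict(list)
    for sale_date, amount in rows:
        partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))
    return dict(partitions)


partitions = partition_by_month(sales)
# A query for February only needs to scan the February partition.
february_total = sum(amount for _, amount in partitions[(2024, 2)])
```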


What types of dimensions do you know?

There are four common kinds of dimensions in a data warehouse:

Conformed Dimension

Degenerated Dimension

Role-Playing Dimension

Junk Dimension
Describe a conformed dimension.

A conformed dimension is shared between various subjects in the data warehouse. Therefore it is
widely used in different contexts, meaning the same thing in each one of them.

Explain what a degenerated dimension is.


The degenerated dimension is derived from a fact table and doesn't have its own dimension table.
What is a role-playing dimension?
A role-playing dimension has multiple applications within the same data warehouse and is reused for different purposes. A common example is a date dimension, which the same fact table can reference once as the order date and again as the ship date.
What are junk dimensions?
Junk dimensions are composed of attributes that don't fit in any other table and are usually used with rapidly changing dimensions.
What is the difference between metadata and data dictionary?
A data dictionary holds all the definitions of a database: the tables and fields, the rows, the number of rows, and that kind of information.
Metadata is data that describes other data, supplying additional, complementary information about it.
1. The full form of OLAP is
A) Online Analytical Processing

2. ......................... is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.
B) Data Warehousing

3. The data is stored, retrieved and updated in ....................
B) OLTP

4. An .................. system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
A) OLAP

5. ........................ is a good alternative to the star schema.
C) Fact constellation

6. The ............................ exposes the information being captured, stored, and managed by operational systems.
C) data source view

7. The type of relationship in star schema is ...............
C) one to many

8. The .................. allows the selection of the relevant information necessary for the data warehouse.
A) top-down view

9. Which of the following is not a component of a data warehouse?
D) Component Key

10. Which of the following is not a kind of data warehouse application?
D) Transaction processing

1. Data warehouse architecture is based on .........................
B) RDBMS

2. .......................... supports basic OLAP operations, including slice and dice, drill-down, roll-up and pivoting.
B) Analytical processing

3. The core of the multidimensional model is the ....................... , which consists of a large set of facts and a number of dimensions.
C) Data cube

4. The data from the operational environment enter ........................ of data warehouse.
A) Current detail data

5. A data warehouse is ......................
C) organized around important subject areas

6. Business Intelligence and data warehousing is used for ..............
D) All of the above

7. Data warehouse contains ................ data that is never found in the operational environment.
C) summary

8. ................... are responsible for running queries and reports against data warehouse tables.
C) End users

9. The biggest drawback of the level indicator in the classic star schema is that it limits ............
A) flexibility

10. ............................. are designed to overcome any limitations placed on the warehouse by the nature of the relational data model.
C) Multidimensional database

2. Data that can be modeled as dimension attributes and measure attributes are called _______
data.

a) Multidimensional
b) Singledimensional
c) Measured
d) Dimensional
Answer:a
Explanation: Given a relation used for data analysis, we can identify some of its attributes as measure attributes, since they measure some value and can be aggregated upon. Dimension attributes define the dimensions on which measure attributes, and summaries of measure attributes, are viewed.
3. The generalization of cross-tab which is represented visually is ____________ which is also
called as data cube.
a) Two dimensional cube
b) Multidimensional cube
c) N-dimensional cube
d) Cuboid
Answer:a
Explanation:Each cell in the cube is identified for the values for the three dimensional attributes.
4. The process of viewing the cross-tab (Single dimensional) with a fixed value of one attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both a and b
Answer:d
Explanation:For eg., The item name and colour is viewed for a fixed size.
5. The operation of moving from finer-granularity data to a coarser granularity (by means of
aggregation) is called a ________.
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
Answer:a
Explanation: The opposite operation, that of moving from coarser-granularity data to finer-granularity data, is called a drill down.

6. In SQL the cross-tabs are created using
a) Slice
b) Dice
c) Pivot
d) All of the mentioned
Answer:c
Explanation:pivot (sum(quantity) for color in (dark,pastel,white)) .
7. { (item_name, color, clothes_size), (item_name, color), (item_name, clothes_size), (color, clothes_size), (item_name), (color), (clothes_size), () }
This can be achieved by using which of the following?
a) group by rollup
b) group by cubic
c) group by
d) None of the mentioned
Answer:d
Explanation:Group by cube is used
8. What do data warehouses support?
a) OLAP
b) OLTP
c) OLAP and OLTP
d) Operational databases
Answer:a
Explanation:None .
9. Select item_name, color, clothes_size, sum(quantity)
from sales
group by rollup(item_name, color, clothes_size);
How many groupings are possible in this rollup?
a) 8
b) 4
c) 2
d) 1
Answer:b
Explanation:{ (item_name, color, clothes_size), (item_name, color), (item_name), () } .

10. Which one of the following is the right syntax for DECODE ?
a) DECODE (search, expression, result [, search, result] [, default])
b) DECODE (expression, result [, search, result] [, default], search)
c) DECODE (search, result [, search, result] [, default], expression)
d) DECODE (expression, search, result [, search, result] [, default])
Answer:d
Explanation:None
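
The grouping sets named in questions 7 and 9 above can be enumerated with a short sketch. The two helper functions are hypothetical. They only list which column combinations `group by cube` and `group by rollup` would aggregate over and do not run any SQL.

```python
from itertools import combinations


def cube_groupings(columns):
    """GROUP BY CUBE: every subset of the columns, largest first."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def rollup_groupings(columns):
    """GROUP BY ROLLUP: every prefix of the column list, down to the empty set."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


cols = ("item_name", "color", "clothes_size")
print(len(cube_groupings(cols)), len(rollup_groupings(cols)))  # prints "8 4"
```

For n columns, cube produces 2^n groupings and rollup produces n + 1, which matches the eight sets listed in question 7 and the four in question 9.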

Three-Tier Data Warehouse Architecture


Generally, a data warehouse adopts a three-tier architecture. The following are the three tiers of the data warehouse architecture.

Bottom Tier - The bottom tier of the architecture is the data warehouse database server.
It is the relational database system. We use the back end tools and utilities to feed data
into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load,
and refresh functions.

Middle Tier - In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
o By Relational OLAP (ROLAP), which is an extended relational database
management system. The ROLAP maps the operations on multidimensional data
to standard relational operations.
o By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.

Top-Tier - This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.

[Diagram: three-tier architecture of a data warehouse]

Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse
models:

Virtual Warehouse

Data mart

Enterprise Warehouse

Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build
a virtual warehouse. Building a virtual warehouse requires excess capacity on operational
database servers.
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific
groups of an organization.

In other words, we can claim that data marts contain data specific to a particular group. For
example, the marketing data mart may contain data related to items, customers, and sales. Data
marts are confined to subjects.
Points to remember about data marts:

Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.

The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.

The life cycle of a data mart may be complex in the long run if its planning and design are not organization-wide.

Data marts are small in size.

Data marts are customized by department.

The source of a data mart is a departmentally structured data warehouse.

Data marts are flexible.

Enterprise Warehouse
An enterprise warehouse collects all the information and the subjects spanning an entire organization.

It provides us enterprise-wide data integration.

The data is integrated from operational systems and external information providers.

This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.

Cubes in a data warehouse are stored in three different modes. A relational storage
model is called Relational Online Analytical Processing mode or ROLAP, while a
Multidimensional Online Analytical processing mode is called MOLAP. When
dimensions are stored in a combination of the two modes then it is known as Hybrid
Online Analytical Processing mode or HOLAP.
MOLAP
This is the traditional mode in OLAP analysis. In MOLAP, data is stored in the form of multidimensional cubes and not in relational databases. The advantage of this mode is that it provides excellent query performance, because the cubes are built for fast data retrieval. All calculations are pre-generated when the cube is created and can be easily applied while querying data. The disadvantage of this model is that it can handle only a limited amount of data. Since all calculations are pre-built when the cube is created, the cube cannot be derived from a large volume of data. This deficiency can be bypassed by including only summary-level calculations while constructing the cube. This model also requires a large additional investment, as cube technology is proprietary and the knowledge base may not exist in the organization.
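
The idea of pre-generating all calculations can be sketched as follows. This is only an illustration with made-up data and a plain dictionary standing in for the cube, not how any particular MOLAP product stores its data.

```python
from itertools import combinations

DIMENSIONS = ("region", "product")
facts = [
    {"region": "north", "product": "hat", "sales": 10},
    {"region": "north", "product": "shirt", "sales": 20},
    {"region": "south", "product": "hat", "sales": 5},
]


def build_cube(rows):
    """Precompute the sales total for every combination of dimension values."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] = cube.get(key, 0) + row["sales"]
    return cube


cube = build_cube(facts)
# Queries become dictionary lookups; no aggregation happens at query time.
north_total = cube[(("region",), ("north",))]
grand_total = cube[((), ())]
```

The number of stored cells grows with every added dimension and distinct value, which is one way to see why the text says this mode handles only a limited amount of data.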

ROLAP
The underlying data in this model is stored in relational databases. Since the data is stored in relational databases, this model gives the appearance of traditional OLAP's slicing and dicing functionality. The advantages of this model are that it can handle a large amount of data and can leverage all the functionalities of the relational database. The disadvantages are that performance is slow and that each ROLAP report is an SQL query with all the limitations of the genre. It is also limited by SQL functionalities. ROLAP vendors have tried to mitigate this problem by building complex out-of-the-box functions into the tool and by giving users the ability to define their own functions.

HOLAP
HOLAP technology tries to combine the strengths of the above two models. For
summary type information HOLAP leverages cube technology and for drilling down
into details it uses the ROLAP model.
Comparing the use of MOLAP, HOLAP and ROLAP
The type of storage medium affects cube processing time, cube storage and cube browsing speed. Some of the factors that affect MOLAP storage are:
Cube browsing is fastest when using MOLAP. This is so even in cases where no aggregations have been done. The data is stored in a compressed multidimensional format and can be accessed more quickly than in the relational database. Browsing is very slow in ROLAP and about the same in HOLAP. Processing time is slower in ROLAP, especially at higher levels of aggregation.
MOLAP storage takes up more space than HOLAP as data is copied, and at very low levels of aggregation it takes up more room than ROLAP. ROLAP takes almost no storage space as data is not duplicated. However, ROLAP aggregations take up more space than MOLAP or HOLAP aggregations.

All data is stored in the cube in MOLAP and data can be viewed even when the
original data source is not available. In ROLAP data cannot be viewed unless
connected to the data source.
MOLAP can handle very limited data only as all data is stored in the cube.

1) What does the term 'Ad-hoc Analysis' mean?
Choice 1: Business analysts use a subset of the data for analysis.
Choice 2: Business analysts access the Data Warehouse data infrequently.
Choice 3: Business analysts access the Data Warehouse data from different locations.
Choice 4: Business analysts do not know data requirements prior to beginning work.
Choice 5: Business analysts use sampling techniques.
2) What should be the business analyst's involvement in monitoring the performance of a Data Warehouse or Data Mart?
Choice 1: Be patient when load monitoring on the Data Warehouse or Data Mart is taking place.
Choice 2: Become experts in SQL queries.
Choice 3: No involvement in performance monitoring.
Choice 4: Contact IT if a query takes too long or does not complete.
Choice 5: Complete all required training on the query tools they will be using.
3) What factor heavily influences data warehouse size estimates?
Choice 1: The design of the warehouse schemas
Choice 2: The size of the source system schemas
Choice 3: The record size of the source tables
Choice 4: The number of expected data warehouse users
Choice 5: The number of customers an organization has
4) Data warehouses or data marts allow organizations to define 'alert' conditions; an alert is raised when something noteworthy has taken place. For implementing a facility of 'alerts', what is the advantage of using a web interface over a client/server approach?
Choice 1: Access to the 'alert' report is possible through a highly accessible means already available within the organization.
Choice 2: The selection criteria used in determining when an 'alert' needs to be issued is easier to implement using a web browser.
Choice 3: As long as the appropriate individual can access the 'alert', how it is implemented does not present an advantage.
Choice 4: 'Alerts' can be directed only to the requestor of the 'alert'.
Choice 5: Access to the 'alert' data can be tightly controlled.
5) Which of the following statements correctly describe a Dimension table in Dimensional Modeling?
Choice 1: Dimension tables contain fields that describe the facts.
Choice 2: Dimension tables do not contain numeric fields.
Choice 3: Dimension tables are typically larger than fact tables.
Choice 4: Dimension tables do not need system-generated keys.
Choice 5: Dimension tables usually have fewer fields than fact tables.
6) How are dimensions in a Multi-Dimensional Database related?
Choice 1: Hierarchically.
Choice 2: Through foreign keys.
Choice 3: Through a hierarchy and foreign keys.
Choice 4: Through a network.
Choice 5: Through an inverse list.
7) What is a primary risk of a 'phased' implementation?
Choice 1: Previous implementations may need to be reworked.
Choice 2: The project may lose momentum.
Choice 3: Business Analysts will find problems in the data sooner.
Choice 4: Executives will lose focus.
Choice 5: The project budget may be exceeded.
8) How do highly distributed source systems impact the Data Warehouse or Data Mart project?
Choice 1: The source data exists in multiple environments.
Choice 2: The location of the source systems has minimal impact on the Data Warehouse or Data Mart implementation.
Choice 3: The timing and coordination of software development, extraction, and data updates are more complex.
Choice 4: Large volumes of data must be moved between locations.
Choice 5: Additional network and data communication hardware will be needed.
9) Categories of OLAP Tools:
Level 1: Basic query and display of data
Level 2: Level 1 + advanced selection and arithmetic operations
Level 3: Level 2 + sophisticated data analysis techniques
10) Which of the following is an example of a process performed by a Level 3 OLAP tool (as described above)?
Choice 1: Drill down to another level of detail.
Choice 2: Display the top 10 items that meet a specific selection criteria.
Choice 3: Trend analysis.
Choice 4: Calculate a rolling average on a set of data.
Choice 5: Display a report based on specific selection criteria.
11) In a Data Mart Only architecture, what will the Data Mart Development Team(s) encounter?
Choice 1: There is little or no data redundancy across all of the Data Mart databases.
Choice 2: Issues such as inconsistent definitions and dirty data in extracting data from multiple source systems will be addressed several times.
Choice 3: Database design will be easier than expected because Data Mart databases support only a single user.
Choice 4: There is ease in consolidating the Data Marts to create a Data Warehouse.
Choice 5: It is easy to develop the data extraction system due to the use of the warehouse as a single data source.
12) What is the primary responsibility of the 'project sponsor' during a Data Warehouse project?
Choice 1: To manage the day-to-day project activity.
Choice 2: To review and approve all decisions concerning the project.
Choice 3: To approve and monitor the project budget.
Choice 4: To ensure cooperation and support from all 'involved' departments.
Choice 5: To communicate project status to higher management and the board of directors.
13) What are Metadata?
Choice 1: Data used only by the IS organization.
Choice 2: Information that describes and defines the organization's data.
Choice 3: Definitions of data elements.
Choice 4: Any business data occurring in large volumes.
Choice 5: Summarized data.
14) How can the managers of a department best understand the cost of their use of the data warehouse?
Choice 1: A percentage of the business department's budget should be directed to the maintenance and enhancement of the Data Warehouse.
Choice 2: Institute a charge-back system of computer costs for the access to the Data Warehouse.
Choice 3: Develop a training program for department management.
Choice 4: Provide executive management with computer utilization reports that show what percentage of utilization is due to the Data Warehouse.
Choice 5: Business managers should participate in the acquisition process for computer hardware and software.
15) Which of the following is NOT a consequence of the creation of independent Data Marts?
Choice 1: Potentially different answers to a single business question if the question is asked of more than one Data Mart.
Choice 2: Increase in data redundancy due to duplication of data between the Data Marts.
Choice 3: Consistent definitions of the data in the Data Marts.
Choice 4: Creation of multiple application systems that have duplicate processing due to the duplication of data between the Data Marts.
Choice 5: Increased costs of hardware as the databases in the Data Marts grow.
16) What is meant by artificial intelligence when it is applied to data cleansing and transformation tools?
Choice 1: The tool can perform highly complex mathematical and statistical calculations to create derived data elements.
Choice 2: The tool can accomplish highly complex code translations when data comes from multiple source systems.
Choice 3: The tool can determine through heuristics the changes needed for a set of dirty data and then make the changes.
Choice 4: The tool can perform highly complex summarizations across multiple databases.
Choice 5: The tool can identify data that appears to be inconsistent between multiple source systems and provide reporting to assist in the clean up of the source system data.
17) Which of the following classes of corporations can gain the most insights from their legacy data?
Choice 1: A corporation that wants to determine the attitude of its customers towards the corporation.
Choice 2: A corporation that offers new products and services.
Choice 3: A new corporation.
Choice 4: A corporation that has existed for a long time.
Choice 5: A corporation that is constantly introducing new and different products and services.
18) Which of the following is NOT found in an Entity Relationship Model?
Choice 1: A definition for each Entity and Data Element.
Choice 2: Entity Relationship Diagram
Choice 3: Entity and Data Element Names
Choice 4: Fact and Dimension Tables
Choice 5: Business Rules associated with the entities, entity relationships, and the data elements.
19) What is Data Mining?
Choice 1: The capability to drill down into an organization's data once a question has been raised.
Choice 2: The setting up of queries to alert management when certain criteria are met.
Choice 3: The process of performing trend analysis on the financial data of an organization.
Choice 4: The automated process of discovering patterns and relationships in an organization's data.
Choice 5: A class of tools that support the manual process of identifying patterns in large databases.
20) What does implementing a Data Warehouse or Data Mart help reduce?
Choice 1: The data gathering effort for data analysis.
Choice 2: Hardware costs.
Choice 3: User requests for custom reports.
Choice 4: Costs when management downsizes the organization.
Choice 5: All of the above.
21) Profitability Analysis is one of the most common applications of data warehousing. Why is Profitability Analysis in data warehousing more difficult than usually expected?
Choice 1: Almost every manager in an organization wants to get profitability reports.
Choice 2: Revenue data cannot be tracked accurately.
Choice 3: Expense data is often tracked at a higher level of detail than revenue data.
Choice 4: Revenue data is difficult to collect and organize.
Choice 5: Transaction grain data is required to properly compute profitability figures.
22) Which of the following would NOT be considered a recurring cost of either Data Warehouse User Support or Data Warehouse Administration?
Choice 1: Capacity Planning
Choice 2: Creation of New Data Marts
Choice 3: Security Administration
Choice 4: Data Archiving
Choice 5: Database Management System Software Selection
23) Why is it important to track all project issues and their resolution?
Choice 1: To show management what the project team has accomplished.
Choice 2: Issues will be brought back up even after they have been resolved.
Choice 3: Provides an audit trail for use in internal or external audits.
Choice 4: There is no need to track issues once they are resolved.
Choice 5: Tracking is needed for project status report.
24) When a physical database design contains summary data, what must the database designer always ensure?
Choice 1: Non-numeric (non-summary) data elements should not be placed in a summary table.
Choice 2: The detail data used to create the summary data is kept in case the Data Warehouse database needs to be reloaded.
Choice 3: The level of detail lost by summarization will not affect the business analysts' use of the data.
Choice 4: Each table with summary data has a 'from' and 'to' date.
Choice 5: The appropriate business rule(s) describing how the data will be summarized is in place.
25) Which of the following is a business benefit of a Data Warehouse?
Choice 1: Customers are happier.
Choice 2: Reduction in Government interference.
Choice 3: Decision makers will be able to make more decisions each day.
Choice 4: Ability to identify historical trends.
Choice 5: Improves morale of the business analysts.
26) How does Ad-hoc Access differ from Managed Query Access?
Choice 1: Ad-hoc access provides users more flexibility when retrieving data.
Choice 2: Ad-hoc query access requirements are easier to anticipate.
Choice 3: Managed query access is more frequently implemented.
Choice 4: Managed query access gives users more ways of getting the data they need.
Choice 5: Managed query response times are easier to optimize.
27) What is a 'snowflake' schema?
Choice 1: The dimension tables are 'normalized'.
Choice 2: The dimension tables can refer to more than one fact table.
Choice 3: All recurring groups of attributes are completely removed from dimension tables.
Choice 4: A schema that can be implemented only with an MDDB Database Management System.
Choice 5: Any database implemented with a network Database Management System.
28) Which of the following describes a successful decision support environment?
Choice 1: Depends heavily on sets of 'canned' queries to provide good performance and reduced costs
Choice 2: Has data warehouse and data mart databases that are of terabyte size
Choice 3: Costly
Choice 4: Totally independent of the operational systems
Choice 5: Iterative and evolutionary
29) What is an Operational Data Store?
Choice 1: A set of databases that serve as a 'staging' area to facilitate consolidating data from several distributed source systems.
Choice 2: A set of databases that support OLAP.
Choice 3: A set of databases that support reporting from an application system.
Choice 4: A set of databases that provide integrated operations data to serve the organization's day-to-day activities.
Choice 5: A set of databases to provide operational data for a single department.
30) When is it appropriate to 'denormalize' a relational database design for a Data Warehouse database?
Choice 1: When disk space is low.
Choice 2: When memory is low.
Choice 3: When the analysis requirements are understood.
Choice 4: Any time.
Choice 5: When the database design is no longer expected to change.
31) Where in the warehouse architecture is it appropriate to calculate 'derived' data elements for storage?
Choice 1: As part of the business analysts' queries.
Choice 2: In an application system developed solely to address 'derived' data elements.
Choice 3: When the data are extracted from the source systems.
Choice 4: After the business analysts have extracted their data from the Data Warehouse.
Choice 5: Just prior to loading the data into the Data Warehouse databases.
32) In an architecture where 'atomic' data are maintained in the Data Warehouse and used to create the Data Marts, what is the best implementation for the Data Warehouse databases?
Choice 1: Multi-Dimensional Database Management System
Choice 2: Hierarchical Database Management System
Choice 3: Relational Database Management System
Choice 4: Object Database Management System
Choice 5: Any Database Management System is acceptable.
33) Why would an organization decide to implement a Data Warehouse on a mainframe computer with its OLTP applications?
Choice 1: For cost considerations only.
Choice 2: For improved response time on queries to the Data Warehouse.
Choice 3: The size of the Data Warehouse has outgrown the small computer's capability of handling it.
Choice 4: To avoid large network requirements as a result of having to move large amounts of data between platforms and database management systems.
Choice 5: The number of Data Warehouse users has increased to a point where the smaller platforms cannot handle them.
34) What are the characteristics of a good candidate for a Web application?
Choice 1: One that provides data in multiple formats to a small group of business analysts and management.
Choice 2: Any application intended to be used by executive management.
Choice 3: One that provides data in multiple formats and that requires a low level of processing to a large number of users.
Choice 4: Any application providing access to a Data Warehouse, Data Mart, or Operational Data Store.
Choice 5: One that requires intensive processing and provides data in a few formats to a large number of users.
35) What does the statement "A Data Warehouse database is non-volatile" mean?
Choice 1: Data Warehouse databases contain only historical transaction data.
Choice 2: Business requirements for a Data Warehouse are stable.
Choice 3: Data Warehouse database structures change very infrequently.
Choice 4: Data within the databases do not change from second to second.
Choice 5: Data Warehouse databases support the creation of a set of reports.
36) What is typically discovered when historical data are first extracted from legacy systems for initial loading into the Data Warehouse?
Choice 1: Flaws in the warehouse database design.
Choice 2: Flaws in the extraction program code.
Choice 3: The need for additional data sources.
Choice 4: Extraction run times are shorter than expected.
Choice 5: Undocumented changes in the content, usage, and structure of the historical data.
37) What is typically discovered when historical data are first extracted from legacy systems for initial loading into the Data Warehouse?
Choice 1: Flaws in the warehouse database design.
Choice 2: Flaws in the extraction program code.
Choice 3: The need for additional data sources.
Choice 4: Extraction run times are shorter than expected.
Choice 5: Undocumented changes in the content, usage, and structure of the historical data.
38) What is an operational system?
Choice 1: An application system that tracks and manages the financial assets of the organization.
Choice 2: An application system that supports the planning and forecasting within the organization.
Choice 3: An application system that supports the creation of product(s) that the organization markets.
Choice 4: An application system that supports the organization's day-to-day activities.
Choice 5: An application system that supports the organization's decision-making.
39) If a Data Warehouse is to be implemented in a distributed architecture, what could be the most difficult part of the implementation?
Choice 1: Finding and selecting query and reporting tools that can span multiple databases.
Choice 2: Finding and selecting the tools to monitor database performance.
Choice 3: Convincing the business analysts that this approach will work.
Choice 4: Developing an estimated Data Warehouse workload.
Choice 5: Designing the Data Warehouse databases.
40) What is a primary risk of a 'phased' implementation?
Choice 1: Previous implementations may need to be reworked.
Choice 2: The project may lose momentum.
Choice 3: Business Analysts will find problems in the data sooner.
Choice 4: Executives will lose focus.
Choice 5: The project budget may be exceeded.
41) Data warehouses or data marts allow organizations to define 'alert' conditions: an alert is raised when something noteworthy has taken place. For implementing a facility of 'alerts', what is the advantage of using a Web interface over a client/server approach?
Choice 1: Access to the 'alert' report is possible through a highly accessible means already available within the organization.
Choice 2: The selection criteria used in determining when an 'alert' needs to be issued are easier to implement using a Web browser.
Choice 3: As long as the appropriate individual can access the 'alert', how it is implemented does not present an advantage.
Choice 4: 'Alerts' can be directed only to the requestor of the 'alert'.
Choice 5: Access to the 'alert' data can be tightly controlled.
42) What should be the business analyst's involvement in monitoring the performance of a Data Warehouse or Data Mart?
Choice 1: Be patient when load monitoring on the Data Warehouse or Data Mart is
taking place.
Choice 2: Become experts in SQL queries.
Choice 3: No involvement in performance monitoring.
Choice 4: Contact IT if a query takes too long or does not complete.
Choice 5: Complete all required training on the query tools they will be using.
1: Data dictionary is
A. Large collection of data mostly stored in a computer system
B. The removal of noise errors and incorrect input from a database
C. The systematic description of the syntactic structure of a specific database. It describes the structure of the attributes, the tables, and foreign key relationships.
D. None of these
Option: C
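
As an illustration of option C, the structural description that a data dictionary holds can often be read from the database's own catalog. The following is only a sketch under stated assumptions, not part of the original quiz: it uses Python's built-in sqlite3 module, and the customer and sale tables and the build_data_dictionary helper are hypothetical names chosen for the example.

```python
import sqlite3


def build_data_dictionary(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    dictionary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2], "primary_key": bool(row[5])}
            for row in conn.execute(f"PRAGMA table_info({table})")
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")
        ]
        dictionary[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return dictionary


# Hypothetical two-table schema, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount REAL
    );
    """
)
data_dictionary = build_data_dictionary(conn)
```

The resulting structure lists each table's attributes and its foreign key relationships, which is the kind of description option C refers to. A production data dictionary would usually also carry business definitions, which a catalog query like this cannot supply.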

2: Data warehouse is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
D. None of these
Option: C

3: Data cleaning is
A. Large collection of data mostly stored in a computer system
B. The removal of noise errors and incorrect input from a database
C. The systematic description of the syntactic structure of a specific database. It describes the structure of the attributes, the tables, and foreign key relationships.
D. None of these
Option: B
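
To make option B concrete, here is a minimal sketch of a single cleaning pass. Everything specific in it is an assumption made for illustration and does not come from the quiz: the name and age fields, the 0 to 120 plausibility range, and the clean_records helper are all hypothetical.

```python
def clean_records(records):
    """Drop rows with missing or implausible values and normalise text fields."""
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        age = rec.get("age")
        if not name:
            # Missing or blank name: treated here as incorrect input.
            continue
        if not isinstance(age, int) or not 0 <= age <= 120:
            # Non-numeric or out-of-range age: treated here as noise.
            continue
        cleaned.append({"name": name.title(), "age": age})
    return cleaned


raw = [
    {"name": "  alice ", "age": 34},
    {"name": "", "age": 20},
    {"name": "bob", "age": -5},
    {"name": "carol", "age": None},
]
result = clean_records(raw)
```

Only the first record survives, with its name trimmed and capitalised. Real cleansing tools typically log or repair rejected rows instead of silently dropping them; dropping is used here only to keep the sketch short.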

4: Decision support systems (DSS) is
A. A family of relational database management systems marketed by IBM
B. Interactive systems that enable decision makers to use databases and models on a computer in order to solve ill-structured problems
C. It consists of nodes and branches starting from a single root node. Each node represents a test, or decision
D. None of these
Option: B
