KEMBAR78
Data Warehouse Interview Questions | PDF | Data Warehouse | Cluster Analysis
0% found this document useful (0 votes)
17 views5 pages

Data Warehouse Interview Questions

The document provides a comprehensive list of interview questions and answers related to Data Warehousing, covering topics such as aggregate tables, metadata, ER diagrams, and various schemas. It also discusses concepts like ETL, OLAP functions, and different types of slowly changing dimensions (SCD). This resource serves as a guide for individuals preparing for data warehousing roles.

Uploaded by

alwin343413273
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views5 pages

Data Warehouse Interview Questions

The document provides a comprehensive list of interview questions and answers related to Data Warehousing, covering topics such as aggregate tables, metadata, ER diagrams, and various schemas. It also discusses concepts like ETL, OLAP functions, and different types of slowly changing dimensions (SCD). This resource serves as a guide for individuals preparing for data warehousing roles.

Uploaded by

alwin343413273
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

科技求职更简单

Data Warehouse Interview Questions

1. What is an aggregate table in a Data warehouse?


An aggregate table is a table that contains existing warehouse data grouped to a
certain level of dimensions. It is much easier to retrieve data from an aggregated
table than the original table, which has more records.

2. What do you understand by metadata in Data warehouse?


Data about the data is called metadata. The metadata includes fixed width and
limited width, number of columns used, data types, and fields' ordering.

3. Define ER diagram in data warehousing.


ER (Entity-Relationship) diagram is a diagram illustrating interrelationships
between different entities in a database. The diagram shows the structure of all
tables and links between them.

4. Name the approaches used by the optimizer during the execution plan.
The two approaches used by Optimizer during the execution plan are:
Rule-based: It is an old technique of carrying out a query based on certain
specific rules
Cost-based: Focuses on finding the most efficient way of carrying out a query.
This requires up to date statistical information of the data

5. What do you understand by Star Schema?


Star Schema is the management of the table so that results can be recovered
readily in the data warehouse environment.

6. What is the difference between agglomerative clustering and divisive


hierarchical clustering?
In the agglomerative hierarchical clustering methods, clusters are read from
bottom to top. In this method, each object builds its cluster, and these clusters
make a large cluster. There is continuous merging until a single large cluster is
created. At the same time, divisive hierarchical clustering uses a top to bottom
approach. In this method, the division of clusters occurs. The division of parent
clusters continues until each cluster has a single object.

7. What are the testing phases in a project?


There are five stages of an ETL test- identification of requirements and data
sources, acquisition of data, implementation of business logic, building and
publishing of data, and reporting.

www.zhitongguigu.com
科技求职更简单

8. What do you understand by data mart?


Data mart includes the subset of organization-wide data. This subset of data is
insightful to specific groups in an organization. In simple words, we can say that
the data mart contains group-specific data.
Become a Data Scientist With Real-World Experience
Data Scientist Master’s ProgramExplore Program

9. Give reasons for partitioning.


Ans. Partitioning is done for many reasons, such as assisting backup recovery,
enhancing performance, and easy management.

10. What are the functions of a warehouse manager?


A warehouse manager is responsible for performing referential integrity and
consistency checks to create business views, indexes, and partition views against
the base data. The warehouse manager merges and transforms the source data
into the temporary store, backs up the data into the data warehouse, and
archives the data at the ends of the captured life.

11. Explain virtual data warehousing.


Virtual data warehousing is an information system strategy that supports
analytical decision-making. A virtual data warehouse gives a collective view of
completed data. It has no historical data and can be considered as a logical data
model. It allows the end-user to view as virtualized with a semantic map.

12. What do you understand by Hybrid SCD?


A combination of both SCD1 and SCD2 is called Hybrid SCD. For tables in which
some columns (some type 1 and some type 2) are essential, and we need to track
its changes, i.e., capture their historical data, we implement Hybrid SCDs.

13. Define snapshot concerning data warehousing.


A snapshot refers to complete data visualization at extraction time. It is used to
back up and restore data, and it occupies less space. It is a process of knowing
the performed activities. It is stored in a report format which is generated soon
after the catalog is disconnected.

14. What are some of the functions performed by OLAP?


The primary functions performed by OLAP are:
Roll up
Slice
Dice
Drill-down
Pivot

www.zhitongguigu.com
科技求职更简单

15. What do you understand by ODS?


ODS(Operational Data Store) is a database designed to integrate data from
multiple sources for additional operations on data. It is a repository of real-time
operational data which is not sent back to Operating Systems. It can be passed
for reporting to Data Warehouse.

16. Define summary information.


Summary information is defined as a location within a data warehouse that
stores predefined aggregations.
In the next section, we will cover some of the high level and advanced data
warehouse interview questions.

17. What is meant by dimensional modelling?


Modelling is an essential concept of data warehousing. It is the conceptual
structure of data into storage. It focuses on relationships between data objects
and the rules of the data storage software. The data warehouse also allows
multidimensional modelling.

18. Name three primary functions of dimensions.


The primary functions of dimensions are:
filtering: choosing a smaller part of our data set for analysis. It is generally
temporary.
labeling: tagging group of samples is called labelling. It makes data query-able.
grouping: classifying data into groups or small buckets.

19. Explain Galaxy Schema.


Fast Constellation Schema or Galaxy Schema consists of dimensional tables along
with two fact tables. It can also be called a combination of stars.

20. Define three types of SCD.


The three types of SCD(slowly changing dimension) are:
SCD 1- overwrites current record with a new record
SCD 2- To an existing customer dimension table, it creates another dimension
record
SCD 3- creates a current value field for including new data.

21. What is dice operation, and how many dimensions are there?
A dice operation is a grouping process in which data is grouped based on a
particular category. Within this process, two or more dimensions are used.

22. Give the main benefit of normalization.


The use of the normalization process helps in reducing data redundancy. It helps
in maintaining the validity of data which makes more sense to the user whenever
needed.

www.zhitongguigu.com
科技求职更简单

23. What is the Query manager responsible for?


The query manager manages all the user queries and operations. The complexity
of the query manager is defined and evaluated on the basis of end-user access.
The query manager manages all the user queries generated within the
environment to extract the data.

24. Explain Junk Dimension.


A single dimension used to store a small dimension named junk attributes is
called a junk dimension. The junk attributes are a group of text attributes and
flags that transmit them into a separate sub-dimension known as junk dimension.

25. What is meant by VLDB?


VLDB or a Very large database consists of a database of one terabyte. The
database requires storage space with the most extensive file and a large number
of database rows. This database uses decision support applications and training
process applications for a large number of users.

26. What is meant by Snowflake Schema?


A primary dimension table interlinked with other tables is called a snowflake
schema. It can be joined with a fact table.

27. What is the difference between materialized view and view?


A materialized view is a physical copy, picture, or snapshot of the base table. It
gives indirect access to the table data by saving query results during a separated
schema. A view is a virtual table that can be used in place of tables and takes the
query's output.

28. What do you understand by active data warehousing?


Active Data warehousing is a data warehouse with the ability to record
transactions when they change and combine into the warehouse along with
scheduled or batch cycle refreshers.

29. Explain ETL.


ETL (Extract, Transform and Load) is a software tool that extracts data from
different sources, transforms data by applying concatenation, calculations, etc.,
and loads it into the Data Warehouse system. This process requires active inputs
from developers, testers, analysts, stakeholders, etc.

30. What are non-additive facts?


Non-addictive facts are facts that are not summed up for any dimensions present
in the fact tables. The same facts can be helpful if there are changes in the
dimensions.

www.zhitongguigu.com
科技求职更简单

31. Define conformed dimensions.


Conformed dimensions are dimensions usable across multiple data marts in
combination with multiple fact tables. It is a dimension that has the same content
and meaning being referred from different fact tables. It refers to multiple tables
in multiple data marts.

32. What do you understand by fact table?


A fact table contains information about facts, measurements, and metrics of a
business process. It is located in the center of the star schema generally. A fact
table consists of two types of columns- one has the fact data, and the second has
the foreign key relation—Star or snowflake schema stores only one fact table.

www.zhitongguigu.com

You might also like