DWH KOE - 093 Tutorial and Assignment


SESSION-2022-2023

Tutorial-1 SEM-8th KOE-093


In Pursuit of Excellence

Tutorial 1 [CO - 1 ]

Topics/Sub Topics : Fundamentals of data warehousing and data mining

Q.1 What is the difference between discrimination and classification? Between
characterization and clustering? Between classification and prediction? For each of these
pairs of tasks, how are they similar?

Q.2 Explain data, information and knowledge. Write and explain the characteristics of
operational data.

Q.3 List and describe the five primitives for specifying a data mining task.

Q.4 What are outliers? How can outlier analysis be done? Outliers are often discarded as
noise. However, one person's garbage could be another's treasure. For example, exceptions in
credit card transactions can help us detect the fraudulent use of credit cards. Taking
fraud detection as an example, propose two methods that can be used to detect outliers
and discuss which one is more reliable. (Ext: 12-13, 14-15)

Q.5 Recent applications pay special attention to spatiotemporal data streams. A
spatiotemporal data stream contains spatial information that changes over time, and is in
the form of stream data (i.e., the data flow in and out like possibly infinite streams).

(a) Present three application examples of spatiotemporal data streams.

(b) Discuss what kind of interesting knowledge can be mined from such data streams,
with limited time and resources.

(c) Identify and discuss the major challenges in spatiotemporal data mining.

(d) Using one application example, sketch a method to mine one kind of knowledge
from such stream data efficiently.
Name & Sign. of Faculty Sign. of Reviewer Sign. of HOD

Tutorial 2 [CO -1 ]

Topics/Sub Topics: Data warehouse design and OLAP

Q1. Explain the business considerations that are taken into account while building a data
warehouse.
Q2. Briefly explain the important approaches to building a data warehouse. What are the
differences between the three main types of data warehouse usage: information
processing, analytical processing and data mining? Briefly explain what FASMI is.
(Ext: 11-12)
Q3. List out various processes involved in data storage in a data warehouse.
Q4. What are the important design considerations, which need to be thought of, while
designing a data warehouse?
Q5. What are the nine decisions in the design of a data warehouse?
Q6. Discuss various applications of a data warehouse.
Q7. What is one reason you might choose a relational structure over a multidimensional
structure for a data warehouse database?
Q8. Describe various schemas of multidimensional data models. Clearly contrast the
difference between a fact table and a dimension table. (Ext:11-12)
Q9. State an advantage of the multidimensional database structure over the relational
database structure for data warehousing applications.

Tutorial 3 [CO -2 ]

Topics/Sub Topics: Data warehouse process and schemas

Q1. Why are data partitioning issues so important in hardware and operating systems
for data warehousing?
Q2. Discuss the different warehouse software tools that may be required during the course of
a warehousing project.
Q3. Explain shared-nothing architectures and shared-disk systems with a diagram.
Q4. With reference to metadata, discuss the following:
(a) Warehouse fields used for metadata (b) Types of metadata (c) Its importance
Q5. Discuss the hardware selection criteria to be considered in warehousing
environments.
Q6. Mention the differences between range partitioning and round-robin partitioning.
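
As a starting point for the partitioning question, the sketch below contrasts the two schemes on a small table. It is only an illustration under stated assumptions: the `orders` rows, the function names and the year boundaries are all hypothetical and do not come from the course material.

```python
from bisect import bisect_right


def round_robin_partition(rows, n_partitions):
    """Deal rows out to partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts


def range_partition(rows, key, boundaries):
    """Place each row according to where key(row) falls among sorted boundaries."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, key(row))].append(row)
    return parts


# Hypothetical (order_id, year) rows, invented for illustration.
orders = [(1, 2019), (2, 2021), (3, 2020), (4, 2022), (5, 2019), (6, 2021)]

rr_parts = round_robin_partition(orders, 3)
# Boundaries [2020, 2021] give: year < 2020, 2020 <= year < 2021, year >= 2021.
range_parts = range_partition(orders, key=lambda r: r[1], boundaries=[2020, 2021])

print("round-robin sizes:", [len(p) for p in rr_parts])
print("range sizes:      ", [len(p) for p in range_parts])
```

Round-robin yields evenly sized partitions but scatters each year across all of them, whereas range partitioning keeps each year together at the cost of possibly uneven sizes.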

Tutorial 4 [CO -2 ]

Topics/Sub Topics: Data warehouse process

Q1. In what respects are extraction tools and transformation tools required during the
course of a warehousing project?
Q2. Briefly describe the usage of data modeling tools and warehouse management
tools with reference to warehouse software requirements.
Q3. During warehousing, data quality is of utmost concern. Explain why. Give
examples of data quality tools used.
Q4. Describe the functionality of data loaders in data warehousing.
Q5. Discuss the role of scalability in distributed RDBMS architecture.
Q6. Why do we require data extraction, cleanup and transformation tools in data
warehousing?

Tutorial 5 [CO -3]

Topics/Sub Topics: Preprocessing in data mining

Q1. Give three additional commonly used statistical measures for the characterization of data
dispersion, and discuss how they can be computed efficiently in large databases.
Q2. Suppose that the data for analysis includes the attribute age. The age values for the tuples
are (in increasing order) 13, 15, 15, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile-quantile plot different from a quantile plot?
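
One way to check hand calculations for parts (a) to (e) is Python's standard `statistics` module, as in the sketch below. The quartile convention is an assumption: `statistics.quantiles` uses its default "exclusive" method here, and a textbook's rough quartiles may differ slightly. The variable names are illustrative only.

```python
import statistics

# Age values exactly as listed in the question (26 values).
ages = [13, 15, 15, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
modes = statistics.multimode(ages)        # every value tied for highest count
midrange = (min(ages) + max(ages)) / 2

# Assumption: default "exclusive" quantile method; other conventions exist.
q1, _, q3 = statistics.quantiles(ages, n=4)
five_number_summary = (min(ages), q1, median_age, q3, max(ages))

print(f"mean={mean_age:.2f} median={median_age} modes={modes}")
print(f"midrange={midrange} five-number summary={five_number_summary}")
```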
Q3. Using the data for age given in Q2, answer the following.
(a) Use smoothing by bin means to smooth the data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
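
Smoothing by bin means can be sketched as follows: split the sorted values into consecutive bins of the given depth and replace every value with its bin's mean. The function name is hypothetical. The listed data has 26 values, so the last bin holds only two; treating that short bin like the others is an assumption of this sketch.

```python
def smooth_by_bin_means(sorted_values, depth):
    """Replace each value by the mean of its equal-depth bin (rounded to 2 places)."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = round(sum(bin_values) / len(bin_values), 2)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


ages = [13, 15, 15, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

smoothed_ages = smooth_by_bin_means(ages, depth=3)
print(smoothed_ages)
```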
Q4. Explain histograms. The following data are a list of prices of commonly sold items at a
company. The numbers have been sorted: 1, 1, 5, 5, 5, 8, 8, 10, 10, 15, 15, 15, 20, 20, 20, 20.
Make a histogram for price using singleton buckets.
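
With singleton buckets, each distinct price gets its own bucket, so the histogram is just a count per distinct value. A minimal text rendering, assuming a plain-text bar of asterisks is an acceptable stand-in for a drawn chart:

```python
from collections import Counter

prices = [1, 1, 5, 5, 5, 8, 8, 10, 10, 15, 15, 15, 20, 20, 20, 20]

# One bucket per distinct price value (singleton buckets).
buckets = Counter(prices)

for price in sorted(buckets):
    count = buckets[price]
    print(f"{price:>3} | {'*' * count} ({count})")
```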

Q5. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods.
(a) equal-frequency partitioning (b) equal-width partitioning
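
The two partitioning methods can be sketched as below. The function names are hypothetical, and the equal-width version assumes that the maximum value belongs in the last bin (intervals closed on the left, with the final interval closed on both ends).

```python
def equal_frequency_bins(sorted_values, k):
    """Split sorted values into k bins of (nearly) equal size; last bin takes any remainder."""
    size = len(sorted_values) // k
    bins = [sorted_values[i * size:(i + 1) * size] for i in range(k - 1)]
    bins.append(sorted_values[(k - 1) * size:])
    return bins


def equal_width_bins(values, k):
    """Split values into k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        index = min(int((v - lo) / width), k - 1)  # the maximum goes in the last bin
        bins[index].append(v)
    return bins


sales = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

freq_bins = equal_frequency_bins(sales, 3)
width_bins = equal_width_bins(sales, 3)

print("equal-frequency:", freq_bins)
print("equal-width:    ", width_bins)
```

On this data the equal-width bins are very uneven because the two large values stretch the range, which is worth commenting on in a written answer.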


Tutorial 6[CO -3]

Topics/Sub Topics: Preprocessing in data mining

Q1. In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.

Q2. Explain the following terms:

i) Binning

ii) Forms of data preprocessing (Ext 14-15)

iii) Noisy data

iv) Process of data integration and transformation (Ext 11-12, 14-15)

v) Data cleaning (Ext 14-15)

vi) Principal Component Analysis (PCA) (Ext: 12-13)

Q3. Distinguish between Data Integration and Data Transformation.

Q4. Elucidate the role of warehouse metadata in data mining.

Q5. Describe statistical measures in large databases. What are the value ranges of the
following normalization methods? (Ext 09, 12-13)

(a) min-max normalization (b) z-score normalization (c) normalization by decimal scaling

Q6. Use the two methods below to normalize the following group of data: 200, 300, 400, 600,
1000

(a) min-max normalization by setting min = 0 and max = 1 (b) z-score normalization
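
Both normalizations can be checked with a few lines of Python. Min-max maps each value v to (v - min) / (max - min) for the target range [0, 1]; the z-score is (v - mean) / std. The sketch assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would give slightly smaller z-scores.

```python
import statistics

values = [200, 300, 400, 600, 1000]

# (a) Min-max normalization onto the range [0, 1].
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# (b) Z-score normalization. Assumption: population standard deviation.
mean = statistics.mean(values)
std = statistics.pstdev(values)
z_scores = [(v - mean) / std for v in values]

print("min-max:", min_max)
print("z-score:", [round(z, 4) for z in z_scores])
```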


Tutorial 7 [CO -4 ]

Topics/Sub Topics: Data Mining Technique

Q 1. Discuss why analytical characterization and attribute relevance analysis are needed.
(Ext:12-13)
Q2. What is data generalization? What approach do we follow for it? (Ext 08, 12)

Q3. Discuss mining multidimensional association rules from relational databases and data
warehouses. (Ext 09)

Q4. Explain mining multilevel association rules from transactional databases. (Ext:12-13)

Q5. Explain market basket analysis. Why is the task of mining frequent itemsets difficult?
Explain the reasons. Discuss the importance of association rules. (Ext 11)

Q6. Explain the basic Algorithms for Finding Association Rules.

Q7. Explain Association Rules among Hierarchies with example.

Q8. Discuss mining single-dimensional Boolean association rules from transactional
databases.

Q9. Define each of the following data mining functionalities: characterization,
discrimination, association, classification, prediction, clustering, and evolution and
deviation analysis. Give examples of each data mining functionality, using a real-life
database that you are familiar with. (Ext: 11-12)

Q10. What do you understand by an association rule? What are its types? Describe the Apriori
algorithm for frequent itemset mining (FIM) and verify it through a suitable example.
(Ext 07, 08, 11)
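
The level-wise idea behind Apriori (join frequent (k-1)-itemsets into candidate k-itemsets, prune candidates that have an infrequent subset, then count support) can be sketched as below. This is a compact teaching sketch rather than an optimized implementation; the five transactions and the minimum support count of 3 are hypothetical examples, not part of the question.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count support with one scan of the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Here {bread, milk, diapers} survives the prune step because all of its pairs are frequent, but it is dropped after counting since only two baskets contain it.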

Q11. What is concept description? (Ext 09)

Q12. Define the terms data generalization and analytical characterization with an example.
Given the following set of values {1,3,9,15,20}, determine the Jackknife estimate for
both the mean and standard deviation of the mean. (Ext 13-14)
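
A sketch of the jackknife computation for the numeric part of this question: leave each value out in turn, recompute the mean, then summarize those leave-one-out means. Reading "standard deviation of the mean" as the usual jackknife standard error, sqrt((n - 1)/n * sum of squared deviations of the leave-one-out means), is an assumption of this sketch.

```python
import math

values = [1, 3, 9, 15, 20]
n = len(values)
total = sum(values)

# Mean of the sample with each value left out in turn.
leave_one_out_means = [(total - v) / (n - 1) for v in values]

# Jackknife estimate of the mean: average of the leave-one-out means.
jackknife_mean = sum(leave_one_out_means) / n

# Assumption: jackknife standard error formula for the mean.
jackknife_se = math.sqrt(
    (n - 1) / n * sum((m - jackknife_mean) ** 2 for m in leave_one_out_means)
)

print("leave-one-out means:", leave_one_out_means)
print(f"jackknife mean = {jackknife_mean:.2f}, standard error = {jackknife_se:.3f}")
```

For the mean, the jackknife standard error coincides with the familiar s / sqrt(n), which gives a quick cross-check of the result.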

Q13. Explain the data cube approach and the attribute-oriented approach. (Ext: 2014-2015)


Tutorial 8 [CO -4 ]
Topics/Sub Topics: Classification and prediction

Q1. Compare the advantages and disadvantages of eager classification (e.g., decision tree,
Bayesian, neural network) versus lazy classification (e.g., k-nearest neighbor, case-based
reasoning).

Q2. Briefly outline the major steps of decision tree classification.

Q3. Why is tree pruning useful in decision tree induction?

Q4. What is a drawback of using a separate set of samples to evaluate pruning?

Q5. Why is naive Bayesian classification called "naive"? Briefly outline the major ideas of
naive Bayesian classification. (Ext: 13)

Q6. Elucidate the difference between classification and prediction.

Q7. Describe classification. Briefly outline the major ideas of Bayesian classification.
(Ext: 11-12)

Q8. What is backpropagation? Discuss classification by backpropagation. (Ext: 2013)

Q9. Discuss the multilayer feed-forward neural network. (Ext: 2014)

Q 10. Discuss the importance of crossover in Genetic algorithm.


Tutorial 9 [CO -5]

Topics/Sub Topics: OLAP and different kinds of data mining

Q1. Explain the following terms (Ext 08,09,11-12,13-14,14-15)

i) ROLAP ii) MOLAP iii) HOLAP

Q2. Why is OLAP required in a data warehouse?

Q3. Discuss the difference between web mining and data mining.

Q4. Write short notes on the following:

a) Web Mining

b) Spatial Mining

c) Temporal Mining

Tutorial 10 [CO -5 ]

Topics/Sub Topics: Data warehouse backup, security and testing strategy

Q1. Elucidate the importance of testing the data warehouses.

Q2. List out five reasons why you think data quality is critical in a Data Warehouse.

Q3. Discuss some of the specific applications of data warehouses.

Q4. Give reasons why the data warehouse must be backed up. How is this different from an
OLTP system?

Q5. With reference to data warehousing, discuss the security concerns.
(Ext 08, 09, 11, 12-13, 13-14)

Q6. With reference to data warehousing, discuss the types of backup and recovery strategies.

ASSIGNMENT - 1

Home Assignments
Unit 1[CO-1 ]

Q1. What is a data warehouse? Why is it needed? What are its components?

Q2. Explain the following terms (Ext 08, 12-13, 13-14)

i) Data Cubes ii) Data Marts iii) Metadata

Q3. Give the differences between (Ext 08,09)

i) Database System vs Data Warehouse ii) OLTP vs Data Warehouse

Q4. Give the 3-tier architecture of a data warehouse. Explain the ETL process.
(Ext 08, 13-14, 16)

Q5. Explain Multi-Dimensional Data Model of a Data Warehouse and Aggregation with an
example.

ASSIGNMENT - 2

Unit 2 [CO-2 ]
1. What are the warehouse strategy components? Explain them with an example.

2. Discuss the role of the major parameters to be considered during data warehousing
strategy.

3. Warehouse management and support processes are designed to address aspects of
planning and managing a data warehouse project. Why are they critical to the
successful implementation and subsequent extension of the data warehouse?

4. The data warehouse planning approach describes the activities related to planning one
rollout of the data warehouse. Give a brief account of the parameters that are to be kept in
mind during data warehouse planning.

5. An implementation project should be scoped to last between three and six months. Outline
the parameters to be considered in data warehouse implementation that support the
implementation of a project.

ASSIGNMENT - 3

Home Assignments
Unit 3[CO-3]

Q.1 Explain Concept hierarchy generation for categorical data. Describe why concept hierarchies are
useful in data mining. (Ext: 2008, 09, 10, 12, 13, 14)

Q2. Distinguish between dimensionality reduction and numerosity reduction. (Ext: 08, 09, 12-13)

Q3. What is the role of statistics in data mining? Describe statistical measures in large
databases. (Ext: 12-13) What are the properties of standard deviation? Give its
formula. Write short notes on: (Ext: 14-15)
(a) Quartiles (b) Histograms (c) Scatter plots

Q4. Give an example of concept hierarchy generation for categorical data such as the attribute "Hobby".

ASSIGNMENT - 4

Home Assignments
Unit 4[CO-4]

Q1. Explain the types of Data that often occur in cluster analysis and briefly explain how to
preprocess that data for clustering. Explain Cluster analysis. (Ext:11-12,14-15)

Q2. Clearly contrast the difference between supervised and unsupervised learning.
Q3. Discuss in brief where clustering and nearest-neighbor prediction are used.
Q4. How is the space for clustering and nearest neighbor defined? Explain.
Q5. What is the difference between clustering and nearest-neighbor prediction?
Q6. Suppose that the values for a given set of data are grouped into intervals. The intervals
and corresponding frequencies are as follows.

Age         Frequency
1 - 5       200
5 - 15      450
15 - 20     300
20 - 50     1500
50 - 80     700
80 - 110    44

Compute an approximate median value for the data.
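
For grouped data, the median can be approximated by interpolating inside the interval that contains the middle observation: median ≈ L1 + ((N/2 - cumulative frequency below) / frequency of the median interval) × interval width, where L1 is the lower boundary of that interval. The sketch below applies this, assuming the last row of the table reads 80 - 110 with frequency 44; the function name is hypothetical.

```python
# (lower bound, upper bound, frequency) per age interval.
# Assumption: the final row is the interval 80-110 with frequency 44.
intervals = [
    (1, 5, 200),
    (5, 15, 450),
    (15, 20, 300),
    (20, 50, 1500),
    (50, 80, 700),
    (80, 110, 44),
]


def approximate_median(grouped):
    """Interpolate the median within the interval holding the N/2-th observation."""
    n = sum(freq for _, _, freq in grouped)
    cumulative = 0  # total frequency of all intervals below the current one
    for lower, upper, freq in grouped:
        if cumulative + freq >= n / 2:
            return lower + (n / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no intervals given")


median_age = approximate_median(intervals)
print(f"approximate median = {median_age:.2f}")
```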


Q.7 Give three additional commonly used statistical measures for the characterization of data
dispersion, and discuss how they can be computed efficiently in large databases.

ASSIGNMENT - 5

Home Assignments
Unit 5[CO-5]

Q1. What do you understand by OLAP? Explain its advantages and disadvantages over
previously available technologies. (Ext 08, 09)

Q2. Differentiate between OLTP and OLAP. (Ext 09, 12-13, 13-14)

Q3. Give E.F. Codd's 12 guidelines for OLAP. Discuss OLAP servers. (Ext 08, 09, 13-14)
