
Sample Paper

Data Mining and Warehousing (20CSF-334)


2 Marks Questions
1) Define Data Mining
2) List out KDD process steps
3) What are the types of data?
4) Compare descriptive and predictive data mining
5) What is classification
6) What is prediction
7) Why do we need to pre-process the data?
8) List out Data Pre-processing steps
9) What is Data cleaning
10) What is Data integration
11) Illustrate Data transformation functions
12) List out the major issues in data mining
13) What is Data selection
14) Define Data warehouse
15) Define Outlier Analysis
16) Define Clustering analysis
17) Define Evolution Analysis
18) What is data redundancy
19) Define Data discretization
20) What is categorical attribute
21) List the key words used in the definition of Data Warehouse.
22) Compare the size of Database in OLTP and OLAP
23) Define metadata
24) List out the OLAP Operations
25) What is meant by association rule?
26) What is meant by Market basket analysis?
27) State and explain the Apriori property.
28) What is meant by Mining Multilevel Association Rules?
29) Define Uniform Minimum Support.
30) What is meant by Reduced Minimum Support?
31) What is meant by multidimensional association rules?
32) What is meant by intradimensional association rule?
33) What is meant by inter dimensional association rules?
34) What is meant by Quantitative association rules?
35) What is meant by Partition Algorithms?
36) State and explain the FP-growth algorithm.
37) What is meant by Frequent itemset.
38) What is meant by Maximal Frequent Item Set?
39) What is meant by Closed Frequent Item Set?
40) Explain the join and prune steps in the Apriori algorithm.
41) Draw and explain the conditional FP-tree.
42) How will you measure support and confidence with an example?
43) How can the efficiency of the Apriori algorithm be improved?
44) What is meant by conditional pattern base?
45) Where are decision trees mainly used?
46) What do you mean by concept hierarchies?
47) How will you solve a classification problem using decision trees?
48) Explain ID3.
49) What is a “decision tree”?
50) Define Data Classification.
51) Define Prediction.
52) What is the difference between “supervised” and “unsupervised” learning
schemes?
53) What are the requirements of clustering?
54) State the categories of clustering methods?
55) What do you mean by Bayesian Classification?
56) State and explain Bayes' theorem.
57) Differentiate between the K-Means and K-Medoids algorithms.
58) What do you mean by Hierarchical Clustering?
59) What do you mean by Agglomerative Clustering?
60) What do you mean by Outlier Detection?
61) What do you mean by Divisive Clustering?
62) What are Bayesian Belief Networks?
63) Why is naïve Bayesian classification called “naïve”? Briefly outline the major
ideas of naïve Bayesian classification.

5 Marks Questions
1) Identify the need for Data Mining
2) Show with a diagrammatic illustration the steps involved in the process of
Knowledge Discovery from Data
3) Classify the different types of data on which Mining can be performed
4) Illustrate the architecture of a typical Data mining system
5) Explain Various Data Mining Functionalities with an example
6) Illustrate with a diagram about Data Mining Task Primitives.
7) Discuss about the Major issues in Data Mining.
8) What is Data Cleaning? Describe various methods of Data Cleaning.
9) List the Issues to be considered during Data Integration
10) Explain about Various kinds of Association rule Mining.
11) Explain in detail about partitional algorithms with an example.
12) Explain the steps involved in Apriori Algorithm.
13) Explain in detail about Multidimensional association rule.
14) Explain the Naive Bayesian Classification algorithm.
15) Write short notes on Bayesian Belief Networks.
16) Discuss about k-nearest neighbor classification algorithm with an example
17) Explain in detail about Hierarchical Clustering.
18) Explain in detail about partitional Clustering method.
19) Discuss about Outlier Detection.
20) Explain in detail about Clustering methods with an example.
21) Given a decision tree, you have the option of (a) converting the decision tree to
rules and then pruning the resulting rules, or (b) pruning the decision tree and
then converting the pruned tree to rules. What advantage does (a) have over
(b)?
22) Why is tree pruning useful in decision tree induction? What is a drawback of
using a separate set of tuples to evaluate pruning?
23) Compare the advantages and disadvantages of eager classification (e.g., decision
tree, Bayesian, neural network) versus lazy classification (e.g., k-nearest
neighbor, case-based reasoning).
24) Briefly describe and give examples of each of the following approaches to
clustering: partitioning methods, hierarchical methods, density-based methods
and grid-based methods.
25) Present conditions under which density-based clustering is more suitable than
partitioning-based clustering and hierarchical clustering. Give application
examples to support your argument.

10 Marks Questions
1) Suppose that the data for analysis includes the attribute age. The age values for
the data tuples are (in increasing order) :
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
(i) Use min-max normalization to transform the value of 45 for age onto the
range [0,1]
(ii) Use Z-Score normalization to transform the value 45 for age where the
standard deviation of age is 20.64 years
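
A minimal Python sketch of both normalizations for question 1 (the mean is computed from the listed age values, and the standard deviation of 20.64 years is the one given in the question):

```python
# Illustrative sketch: min-max and z-score normalization of the age value 45.
ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]
value = 45

# (i) min-max normalization onto the range [0, 1]
min_age, max_age = min(ages), max(ages)
min_max = (value - min_age) / (max_age - min_age)

# (ii) z-score normalization, using the standard deviation given in the question
mean_age = sum(ages) / len(ages)
std_age = 20.64
z_score = (value - mean_age) / std_age

print(f"min-max: {min_max:.3f}, z-score: {z_score:.3f}")
```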
2) Discuss about detecting data redundancy using correlation analysis
3) Explain about Data Transformation method with suitable example
4) Explain about the different Data Reduction techniques.
5) Discuss about the FP-growth algorithm for the example transactions {M,O,N,K,E,Y},
{D,O,N,K,E,Y}, {M,A,K,E}, {M,U,C,K,Y}, {C,O,O,K,I,E}, with Support = 60% and
Confidence = 80%.
6) State and explain the Apriori algorithm with an example. Consider the following
data set to generate association rules: {D,O,N,K,E,Y}, {M,A,K,E},
{M,U,C,K,Y}, {C,O,O,K,I,E}, with Support = 60% and Confidence = 80%.
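
For questions 5 and 6, a small brute-force Python sketch that counts itemset supports over the listed transactions; it is not an Apriori or FP-growth implementation, only a quick way to check hand-computed frequent itemsets against the 60% support threshold:

```python
# Illustrative sketch: brute-force support counting over the transactions from
# questions 5 and 6 (a check for hand-computed results, not an efficient
# Apriori or FP-growth implementation).
from itertools import combinations

transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},  # duplicate items collapse inside a set
]
min_support = 0.6  # Support = 60% -> support count >= 3 out of 5 transactions

items = sorted(set().union(*transactions))
for size in range(1, len(items) + 1):
    frequent = []
    for candidate in combinations(items, size):
        count = sum(1 for t in transactions if set(candidate) <= t)
        if count / len(transactions) >= min_support:
            frequent.append((candidate, count))
    if not frequent:
        break  # by the Apriori property, no larger itemset can be frequent
    for itemset, count in frequent:
        print(size, itemset, count)
```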
7) Explain in detail about support and Confidence Measures with an example
8) Discuss about Quantitative association mining.
9) Discuss about Decision tree induction algorithm with an example.
10) Explain about Attribute Subset Selection Measures with an example.
11) Explain clustering in detail with types of clustering algorithms.
12) Use single and complete link agglomerative clustering to group the elements of
the following dataset: {8,11,21,29,40}
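
A short sketch for question 12 using SciPy's hierarchical clustering, where the 'single' and 'complete' methods correspond to single-link and complete-link agglomerative clustering:

```python
# Illustrative sketch: single-link and complete-link agglomerative clustering
# of the one-dimensional data set {8, 11, 21, 29, 40} with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[8], [11], [21], [29], [40]])  # one feature per observation

for method in ("single", "complete"):
    Z = linkage(points, method=method)
    # each row of Z: (cluster i, cluster j, merge distance, size of new cluster)
    print(f"{method}-link merge steps:")
    print(Z)
```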
13) To make a drink, salt and sugar are mixed in a glass of water in some ratio. Based
on these two attributes, the drink is classified into two classes, i.e. good and bad.
The dataset for this scenario is given below:

Drink Id   Salt   Sweet   Result
1          7      7       Bad
2          7      4       Bad
3          3      4       Good
4          1      4       Good

By using a KNN classifier, find the class in which a drink with salt and sweet values
of 3 and 7 respectively will lie. Take the value of K = 3.
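
A minimal sketch for question 13, assuming plain Euclidean distance on the (salt, sweet) values and a majority vote over the K = 3 nearest neighbours:

```python
# Illustrative sketch: 3-nearest-neighbour classification of the query drink
# (salt = 3, sweet = 7) against the table above, using Euclidean distance.
from collections import Counter
from math import dist

# (salt, sweet) -> class label, copied from the table in the question
training = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)
k = 3

# sort the training rows by distance to the query and keep the k closest
neighbours = sorted(training, key=lambda row: dist(row[0], query))[:k]
# majority vote among the k nearest labels
label = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]

print("k nearest neighbours:", neighbours)
print("predicted class:", label)
```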

14) Consider the following transactional dataset:


Find all frequent 2-itemsets with minimum support count = 2. Generate candidate 3-
itemsets using the C3 = F2 x F1 candidate generation method. Prune the candidates
which cannot be frequent.

15) Consider the following 9 two-dimensional data points:

x1(0,0), x2(1,0), x3(1,1), x4(2,2), x5(3,1), x6(3,0), x7(0,1), x8(3,2), x9(6,3)

Use the Euclidean distance with Eps = 1 and MinPts = 3. Find all core points,
border points and noise points, and show the final clusters using the DBSCAN
algorithm. Show the result step by step.
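
A short sketch for question 15 using scikit-learn's DBSCAN; note that its min_samples parameter counts the point itself, so confirm that this matches the MinPts = 3 convention used in class:

```python
# Illustrative sketch: DBSCAN on the nine points above with scikit-learn.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0, 0], [1, 0], [1, 1], [2, 2], [3, 1],
              [3, 0], [0, 1], [3, 2], [6, 3]])  # x1 .. x9

db = DBSCAN(eps=1, min_samples=3).fit(X)
core = set(db.core_sample_indices_)
for i, label in enumerate(db.labels_):
    kind = "core" if i in core else ("noise" if label == -1 else "border")
    print(f"x{i + 1} {X[i].tolist()}: cluster {label}, {kind}")
```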

16) Consider the following points:

Apply K-means starting from the centroids: K1=P7 and K2 = P4
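
The point table for question 16 is not reproduced above, so the coordinates in the sketch below are placeholders only; it illustrates how Lloyd's k-means iterations proceed when the initial centroids are fixed to two chosen points (indices standing in for P7 and P4):

```python
# Illustrative sketch of k-means with fixed initial centroids.
# NOTE: the coordinates P1..P8 below are PLACEHOLDERS, not the values from the
# question; substitute the actual point table before using this.
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5], [2.0, 2.5]])  # P1..P8 (placeholders)
centroids = points[[6, 3]].copy()  # K1 = P7, K2 = P4 (0-based indices 6 and 3)

for _ in range(100):
    # assign each point to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # recompute each centroid as the mean of its assigned points
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    if np.allclose(new_centroids, centroids):
        break  # converged
    centroids = new_centroids

print("cluster assignments (1 = K1, 2 = K2):", labels + 1)
print("final centroids:\n", centroids)
```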
