Code No: M0502                                                 Set No. 1
      IV B.Tech I Semester Regular Examinations, November 2009
             DATA WAREHOUSING AND DATA MINING
                   (Computer Science & Engineering)
Time: 3 hours                                         Max Marks: 80
                      Answer any FIVE Questions
                    All Questions carry equal marks
  1. (a) Explain data mining as a step in the process of knowledge discovery.
     (b) Differentiate between operational database systems and data warehouses.    [8+8]
  2. (a) Briefly discuss the forms of data preprocessing with a neat diagram.
     (b) Briefly discuss the parametric and nonparametric methods of numerosity
         reduction.                                                        [8+8]
  3. Explain the syntax for the following data mining primitives:
     (a) Task-relevant data
     (b) The kind of knowledge to be mined
     (c) Interestingness measures
     (d) Presentation and visualization of discovered patterns.                       [16]
  4. (a) How can we perform attribute relevance analysis for concept description?
         Explain.
     (b) Briefly explain the presentation of class comparison descriptions.     [8+8]
  5. Compare and contrast the mining of single-dimensional Boolean association rules
     and multilevel association rules from transactional databases.               [16]
  6. (a) Why is naive Bayesian classification called "naive"? Briefly outline the
         major ideas of naive Bayesian classification.
     (b) Define regression. Briefly explain linear, non-linear and multiple
         regression.                                                           [8+8]
  7. (a) Use a diagram to illustrate how, for a constant MinPts value, density-based
         clusters with respect to a higher density (i.e., a lower value for ε, the
         neighborhood radius) are completely contained in density-connected sets
         obtained with respect to a lower density.
     (b) Give an example of how specific clustering methods may be integrated, for
         example, where one clustering algorithm is used as a preprocessing step for
         another.                                                              [8+8]
  8. (a) Explain similarity search in multimedia data.
     (b) Explain similarity search in time-series analysis.
     (c) What is meant by authoritative web pages? Explain how the Web's link
         structures can be mined to identify authoritative web pages.       [5+6+5]
Code No: M0502                                                Set No. 2
      IV B.Tech I Semester Regular Examinations, November 2009
             DATA WAREHOUSING AND DATA MINING
                   (Computer Science & Engineering)
Time: 3 hours                                         Max Marks: 80
                      Answer any FIVE Questions
                    All Questions carry equal marks
  1. (a) Explain the efficient computation of data cubes.
     (b) Discuss the efficient processing of OLAP queries.                         [8+8]
  2. Briefly discuss the Discretization and concept hierarchy techniques.           [16]
  3. Explain the syntax for the following data mining primitives:
     (a) Task-relevant data
     (b) The kind of knowledge to be mined
     (c) Interestingness measures
     (d) Presentation and visualization of discovered patterns.                      [16]
  4. (a) Differentiate between attribute generalization threshold control and
         generalized relation threshold control.
     (b) Differentiate between predictive and descriptive data mining.             [8+8]
  5. Propose a method for mining hybrid-dimension association rules (multidimensional
     association rules with repeating predicates) and explain it with an example. [16]
  6. (a) Why is naive Bayesian classification called "naive"? Briefly outline the
         major ideas of naive Bayesian classification.
     (b) Define regression. Briefly explain linear, non-linear and multiple
         regression.                                                           [8+8]
  7. The following table contains the attributes name, gender, trait-1, trait-2, trait-3,
     and trait-4, where name is an object-id, gender is a symmetric attribute, and the
     remaining trait attributes are asymmetric, describing personal traits of individuals
     who desire a penpal. Suppose that a service exists that attempts to find pairs of
     compatible penpals.
       Name      gender  trait-1  trait-2  trait-3  trait-4
       Kevan       M        N        P        P        N
       Caroline    F        N        P        P        N
       Erik        M        P        N        N        P
       ...        ...      ...      ...      ...      ...
    For asymmetric attribute values, let the value P be set to 1 and the value N be set
    to 0. Suppose that the distance between objects (potential penpals) is computed
    based only on the asymmetric variables.
    (a) Show the contingency matrix for each pair among Kevan, Caroline, and Erik.
    (b) Compute the simple matching coefficient for each pair.
    (c) Compute the Jaccard coefficient for each pair.
    (d) Who do you suggest would make the best pair of penpals? Which pair of
        individuals would be the least compatible?                [4+4+4+4]
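
The quantities asked for in question 7 can be checked with a short Python sketch. It assumes the encoding P = 1, N = 0 stated in the question and reports similarity coefficients (the corresponding dissimilarity is one minus each value). The names traits, contingency, simple_matching, and jaccard are illustrative only, and returning 0.0 when a pair has no positive values at all is an assumption made here.

```python
from itertools import combinations

# Asymmetric trait values from the table, encoded P = 1, N = 0.
traits = {
    "Kevan":    [0, 1, 1, 0],
    "Caroline": [0, 1, 1, 0],
    "Erik":     [1, 0, 0, 1],
}

def contingency(a, b):
    """Return (q, r, s, t): counts of (1,1), (1,0), (0,1), (0,0) positions."""
    q = sum(x == 1 and y == 1 for x, y in zip(a, b))
    r = sum(x == 1 and y == 0 for x, y in zip(a, b))
    s = sum(x == 0 and y == 1 for x, y in zip(a, b))
    t = sum(x == 0 and y == 0 for x, y in zip(a, b))
    return q, r, s, t

def simple_matching(a, b):
    """Simple matching coefficient: matches (1-1 and 0-0) over all attributes."""
    q, r, s, t = contingency(a, b)
    return (q + t) / (q + r + s + t)

def jaccard(a, b):
    """Jaccard coefficient: 1-1 matches over positions that are not 0-0."""
    q, r, s, _ = contingency(a, b)
    denom = q + r + s
    # Assumption: define the coefficient as 0.0 when neither object has a 1.
    return q / denom if denom else 0.0

for name_a, name_b in combinations(traits, 2):
    a, b = traits[name_a], traits[name_b]
    print(name_a, name_b, contingency(a, b),
          simple_matching(a, b), jaccard(a, b))
```

With these three rows, Kevan and Caroline agree on every trait, while Erik differs from both on every trait.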
 8. (a) What is a multimedia database? Explain mining multimedia databases.
    (b) What is a time-series database? What is a sequence database? Explain mining
        time-series and sequence data.                                        [8+8]
Code No: M0502                                                Set No. 3
      IV B.Tech I Semester Regular Examinations, November 2009
             DATA WAREHOUSING AND DATA MINING
                   (Computer Science & Engineering)
Time: 3 hours                                         Max Marks: 80
                      Answer any FIVE Questions
                    All Questions carry equal marks
  1. (a) Explain data mining as a step in the process of knowledge discovery.
     (b) Differentiate between operational database systems and data warehouses.   [8+8]
  2. Explain various data reduction techniques.                                     [16]
  3. The four major types of concept hierarchies are: schema hierarchies, set-grouping
     hierarchies, operation-derived hierarchies, and rule-based hierarchies.
     (a) Briefly define each type of hierarchy.
     (b) For each hierarchy type, provide an example.                                [16]
  4. (a) Differentiate between attribute generalization threshold control and
         generalized relation threshold control.
     (b) Differentiate between predictive and descriptive data mining.             [8+8]
  5. (a) Explain constraint-based association mining.
     (b) Give an example of association rule mining. Classify association rules. [8+8]
  6. (a) Given a decision tree, you have the option of (i) converting the decision tree
         to rules and then pruning the resulting rules, or (ii) pruning the decision tree
         and then converting the pruned tree to rules. What advantages does the former
         option have over the latter? Explain.
     (b) Can any ideas from association rule mining be applied to classification?
         Explain.                                                               [8+8]
  7. Explain the following:                                                  [4+4+4+4]
     (a) DBSCAN
     (b) OPTICS
     (c) DENCLUE
     (d) BIRCH.
  8. (a) What is a spatial data warehouse? What are the different types of dimensions
         in a spatial data cube? What are the different types of measures in a spatial
         data cube?
     (b) What is keyword-based association analysis? How can automated document
         classification be performed?
     (c) Briefly discuss mining the World Wide Web.           [2+2+2+2+2+6]
Code No: M0502                                                 Set No. 4
      IV B.Tech I Semester Regular Examinations, November 2009
             DATA WAREHOUSING AND DATA MINING
                   (Computer Science & Engineering)
Time: 3 hours                                         Max Marks: 80
                      Answer any FIVE Questions
                    All Questions carry equal marks
  1. (a) Describe three challenges to data mining regarding data mining methodology
         and user interaction issues.
     (b) Explain the indexing of OLAP data.                                       [8+8]
  2. Explain various data reduction techniques.                                    [16]
  3. (a) Discuss the various forms of visualizing the discovered patterns.
     (b) Discuss the specification of task-relevant data.                         [8+8]
  4. Suppose that the data for analysis include the attribute age. The age values for
     the data tuples are (in increasing order):
     13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
     35, 36, 40, 45, 46, 52, 70.
     (a) What is the mean of the data?
     (b) What is the median?
     (c) What is the mode of the data? Comment on the data's modality.
     (d) What is the midrange of the data?
     (e) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3)
         of the data?
     (f) Give the five-number summary of the data.
     (g) Show a box plot of the data.
     (h) How is the quantile-quantile plot different from a quantile plot?          [16]
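
Parts (a) to (f) of question 4 can be checked numerically with Python's standard statistics module, as in the sketch below. Quartile values depend on the interpolation convention; this sketch uses the module's default "exclusive" method, which is only one of several accepted conventions, so a hand-computed answer may differ slightly. The variable names are illustrative.

```python
import statistics

# Age values exactly as listed in the question (already sorted).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)        # all most-frequent values
midrange = (min(ages) + max(ages)) / 2

# Quartiles with the default 'exclusive' method (a convention choice).
q1, q2, q3 = statistics.quantiles(ages, n=4)

five_number_summary = (min(ages), q1, median, q3, max(ages))

print("mean:", round(mean, 2))
print("median:", median)
print("modes:", modes)
print("midrange:", midrange)
print("five-number summary:", five_number_summary)
```

Because two values tie for the highest frequency, the data are bimodal under this count.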
  5. Sequential patterns can be mined by methods similar to those used for mining
     association rules. Design an efficient algorithm to mine multilevel sequential
     patterns from a transaction database. An example of such a pattern is the
     following: "A customer who buys a PC will buy Microsoft software within three
     months", on which one may drill down to find a more refined version of the
     pattern, such as "A customer who buys a Pentium PC will buy Microsoft Office
     within three months".                                                        [16]
  6. (a) What is classification? What is prediction? Describe issues regarding
         classification and prediction.
     (b) Explain Bayesian belief networks. How is a Bayesian belief network trained?
                                                                               [8+8]
  7. (a) Write algorithms for k-Means and k-Medoids. Explain.
     (b) Discuss density-based methods.                                          [8+8]
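
As an illustration for question 7(a), a minimal k-means sketch in Python follows. It is a simplified version under stated assumptions: the points are one-dimensional, the initial centroids are supplied by the caller rather than chosen at random, and distance is the absolute difference. The function name k_means and its parameters are hypothetical, and k-medoids is not covered.

```python
def k_means(points, initial_centroids, max_iter=100):
    """Cluster 1-D points around k centroids; return (centroids, clusters)."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid (an assumption).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = k_means([1, 2, 3, 10, 11, 12], [1, 12])
print(centroids, clusters)
```

The loop alternates the assignment and update steps until the centroids stop moving or max_iter is reached.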
 8. (a) Explain the classification and prediction analysis of multimedia data.
    (b) What are basic measures for text retrieval? What methods are there for
        information retrieval?
    (c) What is meant by 'authoritative' Web pages? Explain how the Web's link
        structures can be mined to identify authoritative Web pages.      [4+6+6]