Weka DW&DM Lab Notes

DWDM notes

Uploaded by

tempofnc

DATA WAREHOUSE TOOLS

ParAccel (Actian)
Cloudera
Talend
QuerySurge
Amazon Redshift
Teradata
Oracle
Tableau

Open Source Data Mining Tools

WEKA
Orange
KNIME
R
RapidMiner
Apache Mahout
Tanagra
XLMiner

Experiment 1: Installation of WEKA Tool
Aim: a. Investigate the application interfaces of the Weka tool.

Introduction
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research. Advantages of Weka include:

- Free availability under the GNU General Public License.
- Portability, since it is fully implemented in the Java programming language and thus runs
  on almost any modern computing platform.
- A comprehensive collection of data preprocessing and modeling techniques.
- Ease of use due to its graphical user interfaces.

Description:
Open the program. Once Weka has been installed on the user's machine, it is opened from the
operating system's program launcher; the exact steps depend on the operating system.
Figure 1.1 is an example of the initial opening screen on a computer.
There are four options available on this initial screen:

Information Technology Page 1

Fig: 1.1 Weka GUI

1. Explorer - the graphical interface used to conduct experiments on raw data. After clicking
the Explorer button, the Weka Explorer interface appears.

Fig: 1.2 Pre-processor

Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select a file residing on the local machine or on recorded media.
Open URL - provides a mechanism to load a file or data source from a different location
specified by the user.
Open Database - allows the user to retrieve data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under
experimentation.

Fig: 1.3 Choosing ZeroR from Classify

Again, there are several options to be selected inside the Classify tab. Test options gives the user
the choice of four different test modes on the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split
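
The last two test modes can be pictured as two ways of dividing the row indices of a data set. The following is a minimal sketch in plain Python, not Weka's own code: the function names `percentage_split` and `k_fold_indices` are hypothetical, and for simplicity the sketch neither shuffles nor stratifies the rows.

```python
def percentage_split(n_rows, train_percent):
    """Return (train_indices, test_indices) for a simple ordered split."""
    cut = round(n_rows * train_percent / 100)
    indices = list(range(n_rows))
    return indices[:cut], indices[cut:]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_rows))
    for fold in range(k):
        test = indices[fold::k]          # every k-th row, starting at `fold`
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


train, test = percentage_split(150, 66)
print(len(train), "training rows,", len(test), "test rows")

for train, test in k_fold_indices(10, 5):
    print("test fold:", test)            # each row is tested exactly once
```

In cross-validation every row is used for testing exactly once, whereas a percentage split tests only on the held-out part.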

3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process that is used to identify commonalities or clusters of occurrences
within the data set and produce information for the user to analyze.

4. Associate - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window to select the options for associations within the data set.

5. Select attributes - used to apply different rules to reveal changes based on the inclusion or
exclusion of selected attributes from the experiment.

6. Visualize - used to see what the various manipulations produced on the data set in a 2D format,
as scatter plot and bar graph output.

2. Experimenter - this option allows users to conduct different experimental variations on data
sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the other schemes.

Fig: 1.6 Weka experiment

Results destination: ARFF file, CSV file, JDBC database.

Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters

3. Knowledge Flow - basically the same functionality as Explorer, with drag-and-drop
functionality. The advantage of this option is that it supports incremental learning from previous
results.
4. Simple CLI - gives users who do not want a graphical interface the ability to execute
commands from a terminal-style window.

b. Explore the default datasets in the Weka tool.

Click the “Open file…” button to open a data set and double-click on the “data” directory.
Weka provides a number of small, common machine learning datasets that you can use to practice on.
Select the “iris.arff” file to load the Iris dataset.
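
The bundled datasets are stored in Weka's ARFF text format: a header of `@relation` and `@attribute` lines, followed by `@data` and comma-separated rows. The sketch below reads a tiny inline sample written in the style of iris.arff (it is not the real file). The `parse_arff` helper is hypothetical and handles only the simple case, with no quoted values, sparse rows, or date attributes.

```python
SAMPLE = """\
% comment lines start with a percent sign
@relation iris-sample
@attribute sepallength numeric
@attribute petallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a simple, unquoted ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                      # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print("relation:", relation)
for name, kind in attributes:
    print(name, "->", kind)
print(len(rows), "instances")
```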

Fig: 1.7 Different Data Sets in weka

Exercise:
1. Normalize the data using min-max normalization
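
As a starting point for this exercise, min-max normalization maps each value v to (v - min) / (max - min), optionally rescaled to a new range. The sketch below is plain Python with made-up sample values; the function name `min_max_normalize` is hypothetical. Inside Weka, the unsupervised attribute filter Normalize is the usual way to apply this kind of scaling.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


incomes = [200, 300, 400, 600, 1000]      # illustrative values only
print(min_max_normalize(incomes))         # [0.0, 0.125, 0.25, 0.5, 1.0]
```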

Weka
1. Waikato Environment for Knowledge Analysis
2. Open Source Data Mining Tools
3. Weka logo: the weka, a bird of New Zealand
4. Weka contains a collection of visualization tools and
algorithms for data analysis and predictive modeling
5. Operating systems: Windows, macOS, Linux
6. Latest version: 3.9.6 (Jan 2022)
7. https://www.filehorse.com/download-weka/
8. Open source data mining tools: Weka, Orange, KNIME,
R, XLMiner

1. Which of the following is true about Weka?

a. Weka is a data visualization tool.

b. Weka is a programming language.
c. Weka is a collection of machine learning algorithms.
d. Weka is used only for unsupervised learning.

Answer: c. Weka is a collection of machine learning algorithms.

Explanation: Weka stands for Waikato Environment for Knowledge

Analysis and is a collection of machine learning algorithms and data
preprocessing tools.

2. Which file format is commonly used to import data into Weka?

a. PDF
b. CSV
c. MP4
d. PNG

Answer: b. CSV.

Explanation: Weka's native file format is ARFF, but it can also import other formats. Of the
options listed, CSV (Comma Separated Values) is the one commonly used for
importing data.

3. Which of the following is NOT a type of data preprocessing
available in Weka?

a. Attribute selection
b. Data cleaning
c. Data normalization
d. Data visualization

Answer: d. Data visualization.

Explanation: Data visualization is not a preprocessing step. Weka offers it separately
(for example through the Visualize tab of the Explorer), while attribute
selection, data cleaning, and data normalization are preprocessing techniques.

4. Which algorithm is used for classification in Weka?

a. Naive Bayes
b. K-means
c. Random Forest
d. PCA

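Answer: a. Naive Bayes.

Explanation: Weka provides a variety of classification algorithms,
including Naive Bayes, Decision Trees, Support Vector Machines, and
more. Note that option c, Random Forest, is also a classifier; K-means (clustering)
and PCA (dimensionality reduction) are the options that are not.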

5. Which of the following is NOT a clustering algorithm in Weka?

a. K-means
b. DBSCAN
c. EM
d. Linear Regression

Answer: d. Linear Regression.

Explanation: Linear Regression is not a clustering algorithm, but a

supervised learning algorithm used for regression tasks.

6. Which of the following is NOT a type of evaluation metric in

Weka?
a. Accuracy
b. Precision
c. Recall
d. Distance

Answer: d. Distance.

Explanation: Distance is not an evaluation metric in Weka, but a concept

used in clustering algorithms to measure the similarity or dissimilarity
between data points.
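
For reference, the three evaluation metrics named in this question can be computed from the four cells of a binary confusion matrix. The sketch below uses made-up counts, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                        # share of all rows classified correctly
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # of real positives, how many were found
    return accuracy, precision, recall


accuracy, precision, recall = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```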

7. Which of the following is a feature selection technique in Weka?

a. Principal Component Analysis (PCA)

b. Recursive Feature Elimination (RFE)
c. K-means clustering
d. K-nearest neighbors (KNN)

Answer: b. Recursive Feature Elimination (RFE).

Explanation: RFE is a feature selection technique that recursively

removes the least important features from the dataset until a desired
number of features is reached. Weka provides various feature selection
techniques, including RFE, Correlation-based Feature Selection (CFS), and
more.

8. Which of the following is a disadvantage of the K-nearest

neighbors (KNN) algorithm in Weka?

a. It is computationally expensive
b. It requires large amounts of training data
c. It is sensitive to irrelevant features
d. It cannot handle categorical data

Answer: a. It is computationally expensive.

Explanation: KNN algorithm is computationally expensive as it requires

calculating the distance between the query point and all the training data
points, which can be time-consuming for large datasets.
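
The cost described above is easy to see in a brute-force sketch: every prediction scans the whole training set. The code below is an illustration in plain Python with made-up points, not Weka's k-NN implementation; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows.

    `train` is a list of (feature_tuple, label) pairs.
    """
    # One distance computation per training row: this linear scan is the expensive part
    distances = sorted((math.dist(features, query), label) for features, label in train)
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"),
]
print(knn_predict(train, (1.1, 1.0)))        # nearest neighbours are all labelled "a"
print(knn_predict(train, (5.1, 5.0), k=1))   # the single nearest neighbour is labelled "b"
```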

9. Which of the following is an ensemble learning algorithm in

Weka?

a. Linear Regression
b. Naive Bayes
c. Random Forest
d. K-means

Answer: c. Random Forest.

Explanation: Random Forest is an ensemble learning algorithm that

combines multiple decision trees to make more accurate predictions.
Weka provides various ensemble learning algorithms, including Bagging,
Boosting, and more.

10. Which of the following is NOT a type of neural network

available in Weka?

a. Multilayer Perceptron (MLP)

b. Radial Basis Function (RBF)
c. Convolutional Neural Network (CNN)
d. Decision Tree

Answer: d. Decision Tree.

Explanation: Decision Tree is not a type of neural network, but
a machine learning algorithm used for classification and regression tasks.

11. Which of the following is a supervised learning algorithm in

Weka?

a. K-means
b. DBSCAN
c. Naive Bayes
d. EM

Answer: c. Naive Bayes.

Explanation: Naive Bayes is a supervised learning algorithm used for

classification tasks, where the target variable is known.

12. Which of the following is NOT a data preprocessing technique

in Weka?

a. Data normalization
b. Data imputation
c. Data visualization
d. Data discretization

Answer: c. Data visualization.

Explanation: Data visualization is not a preprocessing technique; Weka provides it
separately through the Visualize tab. Data normalization, data imputation, and data
discretization are preprocessing techniques.

13. Which of the following is a feature extraction technique in

Weka?
a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)

Answer: a. Principal Component Analysis (PCA).

Explanation: PCA is a feature extraction technique that transforms the

original features into a smaller set of uncorrelated features that explain
most of the variance in the data. Weka provides various feature
extraction techniques, including PCA, Linear Discriminant Analysis (LDA),
and more.

14. Which of the following is NOT a type of attribute in Weka?

a. Numeric
b. Nominal
c. Binary
d. Sequential

Answer: d. Sequential.

Explanation: Sequential is not a type of attribute in Weka, but a concept

used in time series analysis to represent data that is ordered in time.

15. Which of the following is NOT a data mining task in Weka?

a. Classification
b. Clustering
c. Association Rule Mining
d. Data Visualization

Answer: d. Data Visualization.

Explanation: Data Visualization is not a data mining task in Weka, but a
technique used to represent data in a visual form for better understanding
and insights.

16. Which of the following is a rule-based learning algorithm in

Weka?

a. Random Forest
b. J48
c. K-means
d. DBSCAN

Answer: b. J48.

Explanation: J48 is a decision tree algorithm based on the C4.5

algorithm, which builds a tree of if-then rules to make
predictions. Weka provides various rule-based learning algorithms,
including ZeroR, OneR, and more.

17. Which of the following is a data imbalance problem in Weka?

a. Overfitting
b. Underfitting
c. Missing values
d. Class imbalance

Answer: d. Class imbalance.

Explanation: Class imbalance is a data imbalance problem that occurs

when one class in the dataset has significantly fewer samples than the
other classes, leading to biased predictions. Weka provides various
techniques to handle class imbalance, including resampling, cost-sensitive
learning, and more.

18. Which of the following is NOT a type of regression algorithm in

Weka?

a. Linear Regression
b. Polynomial Regression
c. Logistic Regression
d. K-means

Answer: d. K-means.

Explanation: K-means is not a regression algorithm, but a clustering

algorithm used to group similar data points together.

19. Which of the following is NOT a type of cross-validation in

Weka?

a. K-fold cross-validation
b. Leave-one-out cross validation
c. Stratified cross-validation
d. Naive Bayes cross-validation

Answer: d. Naive Bayes cross-validation.

Explanation: Naive Bayes is a classifier, not a cross-validation scheme.
Cross-validation can be used to evaluate a Naive Bayes classifier, but there is no
validation method of that name.

20. Which of the following is a data discretization technique in

Weka?
a. Equal width discretization
b. Normalization
c. Principal Component Analysis (PCA)
d. Recursive Feature Elimination (RFE)

Answer: a. Equal width discretization.

Explanation: Equal width discretization is a data discretization technique

that divides the range of values into equal-width intervals and assigns a
discrete value to each interval. Weka provides various data discretization
techniques, including equal frequency discretization, unsupervised
discretization, and more.
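
Equal-width discretization can be sketched in a few lines: divide the range from minimum to maximum into `n_bins` intervals of the same width and report the interval index of each value. The function name `equal_width_bins` and the sample values are illustrative assumptions, not Weka code.

```python
def equal_width_bins(values, n_bins):
    """Return the 0-based bin index of each value under equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]        # constant column: everything falls in one bin
    # The maximum value would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([0, 2, 4, 6, 8, 10], 5))   # [0, 1, 2, 3, 4, 4]
```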

21. Which of the following is NOT a type of ensemble learning

algorithm in Weka?

a. Bagging
b. Boosting
c. Random Forest
d. K-means

Answer: d. K-means.

Explanation: K-means is not an ensemble learning algorithm, but a

clustering algorithm used to group similar data points together. Weka
provides various ensemble learning algorithms, including Bagging,
Boosting, and Random Forest.

22. Which of the following is a dimensionality reduction technique

in Weka?
a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)

Answer: a. Principal Component Analysis (PCA).

Explanation: PCA is a dimensionality reduction technique that transforms

the original features into a smaller set of uncorrelated features that
explain most of the variance in the data. Weka provides various
dimensionality reduction techniques, including PCA, Linear Discriminant
Analysis (LDA), and more.

23. Which of the following is a non-parametric classification

algorithm in Weka?

a. Logistic Regression
b. Decision Tree
c. Naive Bayes
d. k-Nearest Neighbors (k-NN)

Answer: d. k-Nearest Neighbors (k-NN).

Explanation: k-NN is a non-parametric classification algorithm that uses

the k-nearest neighbors to classify a new instance based on the majority
class of its neighbors. Weka provides various non-parametric classification
algorithms, including k-NN, Random Forest, and more.

24. Which of the following is a neural network activation function

available in Weka?
a. Sigmoid
b. ReLU
c. Tanh
d. All of the above

Answer: d. All of the above.

Explanation: Sigmoid, ReLU, and Tanh are all neural network activation functions.
Weka's core MultilayerPerceptron uses sigmoid units; other activations are typically
obtained through add-on packages.

25. Which of the following is a clustering evaluation metric

in Weka?

a. Accuracy
b. F-measure
c. Silhouette coefficient
d. Precision

Answer: c. Silhouette coefficient.

Explanation: The silhouette coefficient is a clustering evaluation metric that
measures the quality of a clustering by comparing, for each data point, its average
distance to points in its own cluster with its average distance to points in the nearest
other cluster. Accuracy, F-measure, and precision are classification metrics. Another
common clustering measure is the within-cluster Sum of Squared Error (SSE).

26. Which of the following is a data imbalance handling technique

in Weka?

a. Bagging
b. SMOTE
c. Random Forest
d. Boosting

Answer: b. SMOTE.

Explanation: SMOTE (Synthetic Minority Over-sampling Technique) is a

data imbalance handling technique that creates synthetic samples of the
minority class by interpolating between the existing minority class
samples. Weka provides various data imbalance handling techniques,
including SMOTE, ADASYN, and more.

27. Which of the following is a regression algorithm in Weka?

a. Decision Tree
b. k-Nearest Neighbors (k-NN)
c. Linear Regression
d. Support Vector Machine (SVM)

Answer: c. Linear Regression.

Explanation: Linear Regression is a regression algorithm that models the

relationship between the dependent variable and one or more
independent variables by fitting a linear equation to the data. Weka
provides various regression algorithms, including Linear Regression,
Multilayer Perceptron (MLP), and more.

28. Which of the following is a data normalization technique in

Weka?

a. Min-max normalization
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: a. Min-max normalization.

Explanation: Min-max normalization is a data normalization technique

that scales the data to a fixed range of values between 0 and 1. Weka
provides various data normalization techniques, including z-score
normalization, decimal scaling, and more.

29. Which of the following is NOT a type of attribute selection in

Weka?

a. Wrapper Subset Evaluator

b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Boosting

Answer: d. Boosting.

Explanation: Boosting is not a type of attribute selection, but an

ensemble learning algorithm used for classification and regression tasks.
Weka provides various attribute selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.

30. Which of the following is a classification algorithm in Weka?

a. Support Vector Machine (SVM)

b. k-Means
c. Hierarchical Clustering
d. PCA

Answer: a. Support Vector Machine (SVM).

Explanation: SVM is a classification algorithm that separates the data
into different classes by finding the hyperplane that maximally separates
the classes. Weka provides various classification algorithms, including
SVM, Naive Bayes, and more.

31. Which of the following is NOT a type of ensemble learning in

Weka?

a. AdaBoost
b. Bagging
c. Boosting
d. Random Forest

Answer: d. Random Forest.

Explanation: As worded, the question is ambiguous, because Random Forest is itself an
ensemble learning algorithm that uses decision trees as the base
classifiers. The intended distinction is that AdaBoost, Bagging, and Boosting are general
ensemble schemes that can wrap a base classifier, whereas Random Forest is one specific ensemble.

32. Which of the following is a distance metric used in k-Nearest

Neighbors (k-NN) algorithm in Weka?

a. Euclidean distance
b. Manhattan distance
c. Mahalanobis distance
d. All of the above

Answer: d. All of the above.

Explanation: Euclidean, Manhattan, and Mahalanobis distance are all distance metrics
that can be used with the k-Nearest Neighbors (k-NN) algorithm. In Weka's k-NN classifier
(IBk) the distance function is configurable, with Euclidean distance as the default.

33. Which of the following is a missing value handling technique

in Weka?

a. Mean imputation
b. Median imputation
c. Mode imputation
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various missing value handling techniques,

including mean imputation, median imputation, mode imputation, and
more.
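
A minimal sketch of mean and mode imputation follows: numeric columns are filled with the mean of the observed values and nominal columns with the most frequent value. This is an illustration with made-up columns, where `None` stands for a missing value; `impute_column` is a hypothetical helper, not a Weka API.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Replace None with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)               # nothing observed, nothing to estimate from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


print(impute_column([1.0, None, 3.0]))             # numeric: filled with the mean
print(impute_column(["red", None, "red", "blue"])) # nominal: filled with the mode
```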

34. Which of the following is a kernel function used in Support

Vector Machine (SVM) algorithm in Weka?

a. Linear kernel
b. Polynomial kernel
c. Gaussian kernel
d. All of the above

Answer: d. All of the above.

Explanation: Weka provides various kernel functions used in Support

Vector Machine (SVM) algorithm, including linear kernel, polynomial
kernel, and Gaussian kernel.

35. Which of the following is a rule-based classifier in Weka?

a. Decision Tree
b. Naive Bayes
c. ZeroR
d. JRip

Answer: d. JRip.

Explanation: JRip is a rule-based classifier in Weka that constructs a set

of rules from the data that classify the instances based on their attribute
values. Weka provides various rule-based classifiers, including JRip, PART,
and more.

36. Which of the following is NOT a type of clustering algorithm in

Weka?

a. k-Means
b. Hierarchical Clustering
c. DBSCAN
d. Linear Regression

Answer: d. Linear Regression.

Explanation: Linear Regression is not a type of clustering algorithm, but

a regression algorithm used to model the relationship between the
dependent variable and one or more independent variables. Weka
provides various clustering algorithms, including k-Means, Hierarchical
Clustering, DBSCAN, and more.

37. Which of the following is a feature selection technique that

selects a subset of features based on their correlation with the
class attribute in Weka?
a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: c. Correlation-based Feature Selection (CFS).

Explanation: CFS is a feature selection technique in Weka that selects a

subset of features based on their correlation with the class attribute.
Weka provides various feature selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.

38. Which of the following is a rule induction algorithm in Weka?

a. k-NN
b. Apriori
c. Random Forest
d. JRip

Answer: d. JRip.

Explanation: JRip is a rule induction algorithm in Weka that constructs a

set of rules from the data that classify the instances based on their
attribute values. Weka provides various rule induction algorithms,
including JRip, PART, and more.

39. Which of the following is a type of ensemble learning

technique in Weka?

a. Decision Tree
b. Naive Bayes
c. Bagging
d. k-NN

Answer: c. Bagging.

Explanation: Bagging is a type of ensemble learning technique in Weka

that constructs multiple models from different subsets of the data and
combines them to improve the predictive performance. Weka provides
various ensemble learning techniques, including Bagging, Boosting, and
more.

40. Which of the following is a dimensionality reduction technique
that maximizes the separation between classes in Weka?

a. PCA
b. LDA
c. ICA
d. SVM

Answer: b. LDA.

Explanation: LDA is a supervised dimensionality reduction technique that
finds the linear combinations of features that best separate the classes, maximizing
between-class scatter relative to within-class scatter. Maximizing the margin between
classes is what SVM does, and SVM is a classification algorithm rather than a
dimensionality reduction technique. PCA and ICA are unsupervised and do not use the
class labels.

41. Which of the following is a type of classification algorithm in

Weka that assigns a class label based on the most common class
in the training data?
a. Decision Tree
b. Naive Bayes
c. k-NN
d. ZeroR

Answer: d. ZeroR.

Explanation: ZeroR is a type of classification algorithm in Weka that

assigns a class label based on the most common class in the training
data. It is a simple baseline classifier used to evaluate the predictive
performance of more complex classifiers. Weka provides various
classification algorithms, including Decision Tree, Naive Bayes, k-NN, and
more.
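
The ZeroR idea fits in a few lines: ignore every attribute and always predict the majority class seen in training. The class name `ZeroRSketch` and the label counts below are illustrative assumptions, not Weka's implementation.

```python
from collections import Counter


class ZeroRSketch:
    """Baseline classifier that always predicts the most common training label."""

    def fit(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_rows):
        return [self.prediction] * n_rows


labels = ["yes"] * 9 + ["no"] * 5         # made-up class column
model = ZeroRSketch().fit(labels)
predictions = model.predict(len(labels))
correct = sum(p == y for p, y in zip(predictions, labels))
print(model.prediction, f"baseline accuracy = {correct}/{len(labels)}")
```

Any more complex classifier should beat this baseline accuracy to be worth using.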

42. Which of the following is a type of ensemble learning

technique that combines multiple models using weighted voting in
Weka?

a. Bagging
b. Boosting
c. Stacking
d. Random Forest

Answer: c. Stacking.

Explanation: Stacking is a type of ensemble learning technique in Weka
that combines multiple models through a learned combiner rather than a fixed vote. The output of the
base models is used as input to a meta-model that learns how to combine
them, in effect learning the weights, to make the final prediction. Weka provides various ensemble
learning techniques, including Bagging, Boosting, Random Forest, and
more.

43. Which of the following is a feature selection technique that
evaluates the subsets of features using a learning algorithm in
Weka?

a. Wrapper Subset Evaluator

b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)

Answer: a. Wrapper Subset Evaluator.

Explanation: Wrapper Subset Evaluator is a feature selection technique

in Weka that evaluates the subsets of features using a learning algorithm.
It searches through the space of possible feature subsets and selects the
one that achieves the best performance on the validation set. Weka
provides various feature selection techniques, including Wrapper Subset
Evaluator, Filter Subset Evaluator, and more.

44. Which of the following is a clustering algorithm in Weka that

uses a density-based approach?

a. k-Means
b. EM
c. DBSCAN
d. SOM

Answer: c. DBSCAN.

Explanation: DBSCAN is a clustering algorithm in Weka that uses a

density-based approach to group the instances into clusters. It works by
identifying the dense regions of the data and connecting them into
clusters. Weka provides various clustering algorithms, including k-Means,
EM, DBSCAN, SOM, and more.

45. Which of the following is a method for handling missing values

in Weka that uses the available data to estimate the missing
values?

a. Mean Imputation
b. Mode Imputation
c. Median Imputation
d. k-NN Imputation

Answer: d. k-NN Imputation.

Explanation: k-NN Imputation is a method for handling missing values in

Weka that uses the available data to estimate the missing values. It
works by finding the k nearest instances to the instance with missing
values and using their attribute values to estimate the missing values.
Weka provides various methods for handling missing values, including
Mean Imputation, Mode Imputation, Median Imputation, k-NN Imputation,
and more.

46. Which of the following is a type of ensemble learning

technique in Weka that combines multiple models using a
weighted sum of their predictions?

a. Bagging
b. Boosting
c. Stacking
d. Random Forest

Answer: b. Boosting.

Explanation: Boosting is a type of ensemble learning technique in Weka
that combines multiple models using a weighted sum of their predictions.
It works by iteratively reweighting the instances based on their
classification errors and building a new model on the reweighted data.
Weka provides various ensemble learning techniques, including Bagging,
Boosting, Random Forest, Stacking, and more.

47. Which of the following is a type of classification algorithm in

Weka that models the joint probability distribution of the features
and the class?

a. Naive Bayes
b. k-NN
c. Decision Tree
d. SVM

Answer: a. Naive Bayes.

Explanation: Naive Bayes is a type of classification algorithm in Weka

that models the joint probability distribution of the features and the class
using Bayes’ theorem and the assumption of independence between the
features. Weka provides various classification algorithms, including Naive
Bayes, k-NN, Decision Tree, SVM, and more.

48. Which of the following is a clustering algorithm in Weka that

uses a probabilistic approach?

a. k-Means
b. EM
c. DBSCAN
d. SOM

Answer: b. EM.

Explanation: EM is a clustering algorithm in Weka that uses a

probabilistic approach to group the instances into clusters. It works by
modeling the data as a mixture of probability distributions and estimating
the parameters of the distributions using the Expectation-Maximization
algorithm. Weka provides various clustering algorithms, including k-
Means, EM, DBSCAN, SOM, and more.

49. Which of the following is a feature selection technique that

evaluates the subsets of features based on their predictive power
and selects the subset that gives the best performance?

a. Filter
b. Wrapper
c. Embedded
d. Correlation-based

Answer: b. Wrapper.

Explanation: Wrapper is a feature selection technique in Weka that

evaluates the subsets of features based on their predictive power and
selects the subset that gives the best performance. It works by using a
learning algorithm to train and evaluate the model on each subset of
features and selecting the subset that gives the best performance. Weka
provides various feature selection techniques, including Filter, Wrapper,
Embedded, Correlation-based, and more.

50. Which of the following is a type of dimensionality reduction

technique in Weka that maps the high-dimensional data to a
lower-dimensional space while preserving the pairwise distances
between the instances?
a. Principal Component Analysis (PCA)
b. Linear Discriminant Analysis (LDA)
c. t-SNE
d. Isomap

Answer: d. Isomap.

Explanation: Isomap is a type of dimensionality reduction technique in

Weka that maps the high-dimensional data to a lower-dimensional space
while preserving the pairwise distances between the instances. It works
by constructing a neighborhood graph of the instances and estimating the
geodesic distances between them using a shortest path algorithm. Weka
provides various dimensionality reduction techniques, including PCA, LDA,
t-SNE, Isomap, and more.

51. Which of the following is a type of rule-based classification

algorithm in Weka that builds a set of rules from the data?

a. OneR
b. ZeroR
c. JRip
d. Random Tree

Answer: c. JRip.

Explanation: JRip is a type of rule-based classification algorithm

in Weka that builds a set of rules from the data. It works by iteratively
adding rules to the rule set based on the accuracy and coverage of the
rules. Weka provides various classification algorithms, including OneR,
ZeroR, JRip, Random Tree, and more.

52. Which of the following is a type of clustering algorithm in
Weka that uses a hierarchical approach to group the instances
into clusters?

a. k-Means
b. EM
c. DBSCAN
d. Hierarchical

Answer: d. Hierarchical.

Explanation: Hierarchical is a type of clustering algorithm in Weka that

uses a hierarchical approach to group the instances into clusters. It works
by recursively merging the most similar clusters based on a distance
metric until all the instances are in a single cluster. Weka provides various
clustering algorithms, including k-Means, EM, DBSCAN, Hierarchical, and
more.

53. Which of the following is a type of feature selection technique

in Weka that selects the features based on their correlation with
the class and removes the redundant features?

a. Filter
b. Wrapper
c. Embedded
d. Correlation-based

Answer: d. Correlation-based.

Explanation: Correlation-based Feature Selection (CFS) is a feature selection technique in Weka
that selects the features based on their correlation with the class and
removes the redundant features. It works by scoring candidate subsets, favoring features
that are highly correlated with the class but weakly correlated with each other.
Weka provides various feature selection
techniques, including Filter, Wrapper, Embedded, Correlation-based, and
more.

54. Which of the following is a type of clustering algorithm in

Weka that uses a grid-based approach to group the instances into
clusters?

a. k-Means
b. EM
c. DBSCAN
d. CLIQUE

Answer: d. CLIQUE.

Explanation: CLIQUE is a type of clustering algorithm in Weka that uses

a grid-based approach to group the instances into clusters. It works by
partitioning the data into overlapping grids and identifying the dense
regions of the data within each grid. Weka provides various clustering
algorithms, including k-Means, EM, DBSCAN, CLIQUE, and more.

55. Which of the following is a type of classification algorithm in
Weka that builds a decision tree from the data?

a. Naive Bayes
b. k-NN
c. J48
d. Random Forest

Answer: c. J48.
