DATA WAREHOUSE TOOLS
ParAccel (Actian)
Cloudera
Talend
QuerySurge
Amazon Redshift
Teradata
Oracle
Tableau
Open Source Data Mining Tools
WEKA
Orange
KNIME
R-Programming
RapidMiner
Apache Mahout
Tanagra
XLMiner
Experiment 1: Installation of WEKA Tool
Aim: A. Investigate the application interfaces of the Weka tool.
Introduction
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research.
Advantages of Weka include:
1. Free availability under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
3. A comprehensive collection of data preprocessing and modeling techniques.
4. Ease of use due to its graphical user interfaces.
Description:
Open the program. Once Weka has been installed on the user's machine, it is opened from the
operating system's program launcher; the exact steps depend on the user's operating system.
Figure 1.1 is an example of the initial opening screen.
There are four options available on this initial screen:
Information Technology Page 1
Fig: 1.1 Weka GUI
1. Explorer - the graphical interface used to conduct experiments on raw data. After clicking
the Explorer button, the Weka Explorer interface appears.
Fig: 1.2 Pre-processor
Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or on recorded media.
Open URL - provides a mechanism to locate a file or data source at a different location
specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under
experimentation.
Fig: 1.3 Choosing ZeroR from Classify
There are several options to select inside the Classify tab. The Test options panel gives the user
the choice of four test modes for the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split
3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process used to identify commonalities, or clusters of occurrences,
within the data set and produce information for the user to analyze.
4. Associate - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window to select the options for associations within the data set.
5. Select attributes - used to apply different rules to reveal changes based on the inclusion or
exclusion of selected attributes from the experiment.
6. Visualize - used to see what the various manipulations produced on the data set in 2D form,
as scatter plot and bar graph output.
2. Experimenter - this option allows users to conduct different experimental variations on data
sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the other schemes.
Fig: 1.6 Weka experiment
Results destination: ARFF file, CSV file, JDBC database.
Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters
3. Knowledge Flow - basically the same functionality as Explorer, with a drag-and-drop
interface. The advantage of this option is that it supports incremental learning from previous
results.
4. Simple CLI - gives users who do not have a graphical interface the ability to execute
commands from a terminal window.
B. Explore the default datasets in the Weka tool.
Click the “Open file…” button to open a data set and double click on the “data” directory.
Weka provides a number of small, common machine learning datasets that you can use to practice on.
Select the “iris.arff” file to load the Iris dataset.
Fig: 1.7 Different Data Sets in weka
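The files in the data directory, such as iris.arff, use ARFF, a plain-text format with a header (@relation and @attribute lines) followed by an @data section. The following is a minimal sketch of how such a file can be read, assuming a simplified ARFF with no quoted values and no sparse rows. The function name parse_arff and the two-row sample are illustrative assumptions, not part of Weka.

```python
def parse_arff(text):
    """Parse a simplified ARFF string into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical excerpt written in the style of an ARFF file (not the real iris.arff).
SAMPLE = """\
% comment line
@relation iris
@attribute sepallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,Iris-setosa
7.0,Iris-versicolor
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```

All values are kept as strings here; a fuller reader would convert numeric attributes and handle missing values marked with "?".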
Exercise:
1. Normalize the data using min-max normalization
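As a hint for the exercise, min-max normalization rescales each value v of an attribute to (v - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below shows the arithmetic on a made-up column; the function name and sample values are assumptions for illustration. In the Weka Explorer the same effect is normally obtained with the unsupervised attribute filter Normalize on the Preprocess tab.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


column = [4.0, 5.0, 6.0, 8.0]  # made-up attribute values
normalized = min_max_normalize(column)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```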
Weka
1. Waikato Environment for Knowledge Analysis
2. OPEN SOURCE DATA MINING TOOLS
3. Weka logo, a bird of New Zealand
4. Weka contains a collection of visualization tools and
algorithms for data analysis and predictive modeling
5. Operating system Windows, macOS, Linux
6. Latest Version 3.9.6 (Jan 2022)
7. https://www.filehorse.com/download-weka/
8. Open Source Data Mining Tools - Weka, Orange, KNIME,
R-Programming, XLMiner
1. Which of the following is true about Weka?
a. Weka is a data visualization tool.
b. Weka is a programming language.
c. Weka is a collection of machine learning algorithms.
d. Weka is used only for unsupervised learning.
Answer: c. Weka is a collection of machine learning algorithms.
Explanation: Weka stands for Waikato Environment for Knowledge
Analysis and is a collection of machine learning algorithms and data
preprocessing tools.
2. Which file format is commonly used to import data into Weka?
a. PDF
b. CSV
c. MP4
d. PNG
Answer: b. CSV.
Explanation: Weka can import data in various file formats. Of the options
listed, CSV (Comma Separated Values) is the one Weka can read; Weka's
own native format is ARFF.
3. Which of the following is NOT a type of data preprocessing
available in Weka?
a. Attribute selection
b. Data cleaning
c. Data normalization
d. Data visualization
Answer: d. Data visualization.
Explanation: Data visualization is not a preprocessing technique. Weka
provides it separately (for example, through the Visualize tab), whereas
attribute selection, data cleaning, and data normalization are
preprocessing techniques.
4. Which algorithm is used for classification in Weka?
a. Naive Bayes
b. K-means
c. Random Forest
d. PCA
Answer: a. Naive Bayes.
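Explanation: Weka provides a variety of classification algorithms,
including Naive Bayes, Decision Trees, Support Vector Machines, and
more. Random Forest (option c) is also a classifier; K-means is a
clustering algorithm and PCA is an attribute transformation.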
5. Which of the following is NOT a clustering algorithm in Weka?
a. K-means
b. DBSCAN
c. EM
d. Linear Regression
Answer: d. Linear Regression.
Explanation: Linear Regression is not a clustering algorithm, but a
supervised learning algorithm used for regression tasks.
6. Which of the following is NOT a type of evaluation metric in
Weka?
a. Accuracy
b. Precision
c. Recall
d. Distance
Answer: d. Distance.
Explanation: Distance is not an evaluation metric in Weka, but a concept
used in clustering algorithms to measure the similarity or dissimilarity
between data points.
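The three evaluation metrics named in this question can be computed directly from actual and predicted labels. The sketch below assumes a binary problem with one label treated as the positive class; the function name and the sample labels are made up for illustration.

```python
def classification_metrics(actual, predicted, positive):
    """Return (accuracy, precision, recall) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return accuracy, precision, recall


actual = ["yes", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]
accuracy, precision, recall = classification_metrics(actual, predicted, positive="yes")
print(accuracy, precision, recall)
```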
7. Which of the following is a feature selection technique in Weka?
a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. K-means clustering
d. K-nearest neighbors (KNN)
Answer: b. Recursive Feature Elimination (RFE).
Explanation: RFE is a feature selection technique that recursively
removes the least important features from the dataset until a desired
number of features is reached. Weka provides various feature selection
techniques, including RFE, Correlation-based Feature Selection (CFS), and
more.
8. Which of the following is a disadvantage of the K-nearest
neighbors (KNN) algorithm in Weka?
a. It is computationally expensive
b. It requires large amounts of training data
c. It is sensitive to irrelevant features
d. It cannot handle categorical data
Answer: a. It is computationally expensive.
Explanation: The KNN algorithm is computationally expensive at
prediction time because it calculates the distance between the query point
and all the training data points, which can be time-consuming for large
datasets.
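The cost described in this explanation is easy to see in a brute-force sketch: every prediction scans the whole training set. The code below is an illustrative assumption (plain Euclidean distance, majority vote, made-up points) and is not Weka's implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (features, label); return the majority label of the k nearest."""
    # One distance computation per training instance: this linear scan is the costly part.
    distances = [(math.dist(features, query), label) for features, label in train]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.5, 4.5), "b"),
]
prediction = knn_predict(train, (1.1, 1.0), k=3)
print(prediction)
```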
9. Which of the following is an ensemble learning algorithm in
Weka?
a. Linear Regression
b. Naive Bayes
c. Random Forest
d. K-means
Answer: c. Random Forest.
Explanation: Random Forest is an ensemble learning algorithm that
combines multiple decision trees to make more accurate predictions.
Weka provides various ensemble learning algorithms, including Bagging,
Boosting, and more.
10. Which of the following is NOT a type of neural network
available in Weka?
a. Multilayer Perceptron (MLP)
b. Radial Basis Function (RBF)
c. Convolutional Neural Network (CNN)
d. Decision Tree
Answer: d. Decision Tree.
Explanation: Decision Tree is not a type of neural network, but
a machine learning algorithm used for classification and regression tasks.
11. Which of the following is a supervised learning algorithm in
Weka?
a. K-means
b. DBSCAN
c. Naive Bayes
d. EM
Answer: c. Naive Bayes.
Explanation: Naive Bayes is a supervised learning algorithm used for
classification tasks, where the target variable is known.
12. Which of the following is NOT a data preprocessing technique
in Weka?
a. Data normalization
b. Data imputation
c. Data visualization
d. Data discretization
Answer: c. Data visualization.
Explanation: Data visualization is not a preprocessing technique. Weka
provides it separately (for example, through the Visualize tab), whereas
data normalization, data imputation, and data discretization are
preprocessing techniques.
13. Which of the following is a feature extraction technique in
Weka?
a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)
Answer: a. Principal Component Analysis (PCA).
Explanation: PCA is a feature extraction technique that transforms the
original features into a smaller set of uncorrelated features that explain
most of the variance in the data. Weka provides various feature
extraction techniques, including PCA, Linear Discriminant Analysis (LDA),
and more.
14. Which of the following is NOT a type of attribute in Weka?
a. Numeric
b. Nominal
c. Binary
d. Sequential
Answer: d. Sequential.
Explanation: Sequential is not a type of attribute in Weka, but a concept
used in time series analysis to represent data that is ordered in time.
15. Which of the following is NOT a data mining task in Weka?
a. Classification
b. Clustering
c. Association Rule Mining
d. Data Visualization
Answer: d. Data Visualization.
Explanation: Data Visualization is not a data mining task in Weka, but a
technique used to represent data in a visual form for better understanding
and insights.
16. Which of the following is a rule-based learning algorithm in
Weka?
a. Random Forest
b. J48
c. K-means
d. DBSCAN
Answer: b. J48.
Explanation: J48 is Weka's implementation of the C4.5 decision tree
algorithm; each path from the root to a leaf can be read as an if-then
rule. Weka also provides dedicated rule learners, including ZeroR, OneR,
and more.
17. Which of the following is a data imbalance problem in Weka?
a. Overfitting
b. Underfitting
c. Missing values
d. Class imbalance
Answer: d. Class imbalance.
Explanation: Class imbalance is a data imbalance problem that occurs
when one class in the dataset has significantly fewer samples than the
other classes, leading to biased predictions. Weka provides various
techniques to handle class imbalance, including resampling, cost-sensitive
learning, and more.
18. Which of the following is NOT a type of regression algorithm in
Weka?
a. Linear Regression
b. Polynomial Regression
c. Logistic Regression
d. K-means
Answer: d. K-means.
Explanation: K-means is not a regression algorithm, but a clustering
algorithm used to group similar data points together.
19. Which of the following is NOT a type of cross-validation in
Weka?
a. K-fold cross-validation
b. Leave-one-out cross validation
c. Stratified cross-validation
d. Naive Bayes cross-validation
Answer: d. Naive Bayes cross-validation.
Explanation: Naive Bayes is a classification algorithm, not a
cross-validation scheme. Any of the other three schemes can be used to
evaluate the performance of a Naive Bayes classifier on a dataset.
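To make the listed schemes concrete, the sketch below splits instance indices into k folds; leave-one-out is the special case where k equals the number of instances. It is a simplified illustration with an assumed function name: there is no shuffling and no stratification by class, both of which a real tool would normally apply first.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_instances))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training k - 1 times.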
20. Which of the following is a data discretization technique in
Weka?
a. Equal width discretization
b. Normalization
c. Principal Component Analysis (PCA)
d. Recursive Feature Elimination (RFE)
Answer: a. Equal width discretization.
Explanation: Equal width discretization is a data discretization technique
that divides the range of values into equal-width intervals and assigns a
discrete value to each interval. Weka provides various data discretization
techniques, including equal-frequency discretization, supervised
(class-aware) discretization, and more.
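A small sketch of equal-width binning, assuming the bins are numbered from 0 and the maximum value is placed in the last bin. The function name and the sample ages are made up for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the index of its equal-width interval, from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 25, 31, 40, 58, 60]
bins = equal_width_bins(ages, 4)  # intervals of width 10 starting at 20
print(bins)  # [0, 0, 1, 2, 3, 3]
```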
21. Which of the following is NOT a type of ensemble learning
algorithm in Weka?
a. Bagging
b. Boosting
c. Random Forest
d. K-means
Answer: d. K-means.
Explanation: K-means is not an ensemble learning algorithm, but a
clustering algorithm used to group similar data points together. Weka
provides various ensemble learning algorithms, including Bagging,
Boosting, and Random Forest.
22. Which of the following is a dimensionality reduction technique
in Weka?
a. Principal Component Analysis (PCA)
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. K-nearest neighbors (KNN)
Answer: a. Principal Component Analysis (PCA).
Explanation: PCA is a dimensionality reduction technique that transforms
the original features into a smaller set of uncorrelated features that
explain most of the variance in the data. Weka provides various
dimensionality reduction techniques, including PCA, Linear Discriminant
Analysis (LDA), and more.
23. Which of the following is a non-parametric classification
algorithm in Weka?
a. Logistic Regression
b. Decision Tree
c. Naive Bayes
d. k-Nearest Neighbors (k-NN)
Answer: d. k-Nearest Neighbors (k-NN).
Explanation: k-NN is a non-parametric classification algorithm that uses
the k-nearest neighbors to classify a new instance based on the majority
class of its neighbors. Weka provides various non-parametric classification
algorithms, including k-NN, Random Forest, and more.
24. Which of the following is a neural network activation function
available in Weka?
a. Sigmoid
b. ReLU
c. Tanh
d. All of the above
Answer: d. All of the above.
Explanation: Weka's built-in MultilayerPerceptron uses sigmoid units;
ReLU and Tanh are typically obtained through add-on packages such as
WekaDeeplearning4j.
25. Which of the following is a clustering evaluation metric
in Weka?
a. Accuracy
b. F-measure
c. Silhouette coefficient
d. Precision
Answer: c. Silhouette coefficient.
Explanation: Silhouette coefficient is a clustering evaluation metric that
measures the quality of clustering by comparing the distance between the
data points within the same cluster and the distance between the data
points of different clusters. Weka provides various clustering evaluation
metrics, including Silhouette coefficient, Sum of Squared Error (SSE), and
more.
26. Which of the following is a data imbalance handling technique
in Weka?
a. Bagging
b. SMOTE
c. Random Forest
d. Boosting
Answer: b. SMOTE.
Explanation: SMOTE (Synthetic Minority Over-sampling Technique) is a
data imbalance handling technique that creates synthetic samples of the
minority class by interpolating between the existing minority class
samples. Weka provides various data imbalance handling techniques,
including SMOTE, ADASYN, and more.
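The interpolation step described above can be sketched as follows. This is a simplified illustration of the idea only (one synthetic sample, Euclidean neighbors, made-up points, assumed function name), not the code of any Weka package.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority instance and one of its k nearest neighbors."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment from base to neighbor, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
print(synthetic)
```

Because the new point lies on a line segment between two existing minority instances, it always stays inside the region already occupied by that class.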
27. Which of the following is a regression algorithm in Weka?
a. Decision Tree
b. k-Nearest Neighbors (k-NN)
c. Linear Regression
d. Support Vector Machine (SVM)
Answer: c. Linear Regression.
Explanation: Linear Regression is a regression algorithm that models the
relationship between the dependent variable and one or more
independent variables by fitting a linear equation to the data. Weka
provides various regression algorithms, including Linear Regression,
Multilayer Perceptron (MLP), and more.
28. Which of the following is a data normalization technique in
Weka?
a. Min-max normalization
b. Recursive Feature Elimination (RFE)
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)
Answer: a. Min-max normalization.
Explanation: Min-max normalization is a data normalization technique
that scales the data to a fixed range of values between 0 and 1. Weka
provides various data normalization techniques, including z-score
normalization, decimal scaling, and more.
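For comparison with min-max scaling, z-score normalization subtracts the mean and divides by the standard deviation, so the result has mean 0 and unit spread. The sketch below assumes the population standard deviation; tools differ on whether they use the population or the sample formula, and the function name and sample values are illustrative.

```python
import statistics


def z_score_standardize(values):
    """Return (v - mean) / std for each value, using the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread; report every value as 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population std 2
standardized = z_score_standardize(scores)
print(standardized)
```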
29. Which of the following is NOT a type of attribute selection in
Weka?
a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Boosting
Answer: d. Boosting.
Explanation: Boosting is not a type of attribute selection, but an
ensemble learning algorithm used for classification and regression tasks.
Weka provides various attribute selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.
30. Which of the following is a classification algorithm in Weka?
a. Support Vector Machine (SVM)
b. k-Means
c. Hierarchical Clustering
d. PCA
Answer: a. Support Vector Machine (SVM).
Explanation: SVM is a classification algorithm that separates the data
into different classes by finding the hyperplane that maximally separates
the classes. Weka provides various classification algorithms, including
SVM, Naive Bayes, and more.
31. Which of the following is NOT a type of ensemble learning in
Weka?
a. AdaBoost
b. Bagging
c. Boosting
d. Random Forest
Answer: d. Random Forest.
Explanation: Random Forest is the intended answer because it is a
specific algorithm (bagged, randomized decision trees) rather than a
general ensemble strategy. It is still an ensemble method, and AdaBoost
is likewise a specific boosting algorithm. Weka provides various ensemble
learning algorithms, including AdaBoost, Bagging, and Random Forest.
32. Which of the following is a distance metric used in k-Nearest
Neighbors (k-NN) algorithm in Weka?
a. Euclidean distance
b. Manhattan distance
c. Mahalanobis distance
d. All of the above
Answer: d. All of the above.
Explanation: Weka provides various distance metrics used in k-Nearest
Neighbors (k-NN) algorithm, including Euclidean distance, Manhattan
distance, and Mahalanobis distance.
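The first two metrics are simple enough to write out. The sketch below uses assumed function names and a made-up pair of points; Mahalanobis distance is omitted because it additionally needs the inverse covariance matrix of the data.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q))  # 5.0 7.0
```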
33. Which of the following is a missing value handling technique
in Weka?
a. Mean imputation
b. Median imputation
c. Mode imputation
d. All of the above
Answer: d. All of the above.
Explanation: Weka provides various missing value handling techniques,
including mean imputation, median imputation, mode imputation, and
more.
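The three imputation strategies differ only in the statistic used as the fill value. The sketch below represents missing entries as None, which is an assumption for illustration; the function name and the sample column are made up as well.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    statistic = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


column = [1, 2, 2, None, 10]
print(impute(column, "mean"))    # fill value 3.75
print(impute(column, "median"))  # fill value 2.0
print(impute(column, "mode"))    # fill value 2
```

The outlier 10 pulls the mean up to 3.75 while the median and mode stay at 2, which is why the median is often preferred for skewed numeric attributes.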
34. Which of the following is a kernel function used in Support
Vector Machine (SVM) algorithm in Weka?
a. Linear kernel
b. Polynomial kernel
c. Gaussian kernel
d. All of the above
Answer: d. All of the above.
Explanation: Weka provides various kernel functions used in Support
Vector Machine (SVM) algorithm, including linear kernel, polynomial
kernel, and Gaussian kernel.
35. Which of the following is a rule-based classifier in Weka?
a. Decision Tree
b. Naive Bayes
c. ZeroR
d. JRip
Answer: d. JRip.
Explanation: JRip is a rule-based classifier in Weka that constructs a set
of rules from the data that classify the instances based on their attribute
values. Weka provides various rule-based classifiers, including JRip, PART,
and more.
36. Which of the following is NOT a type of clustering algorithm in
Weka?
a. k-Means
b. Hierarchical Clustering
c. DBSCAN
d. Linear Regression
Answer: d. Linear Regression.
Explanation: Linear Regression is not a type of clustering algorithm, but
a regression algorithm used to model the relationship between the
dependent variable and one or more independent variables. Weka
provides various clustering algorithms, including k-Means, Hierarchical
Clustering, DBSCAN, and more.
37. Which of the following is a feature selection technique that
selects a subset of features based on their correlation with the
class attribute in Weka?
a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)
Answer: c. Correlation-based Feature Selection (CFS).
Explanation: CFS is a feature selection technique in Weka that selects a
subset of features based on their correlation with the class attribute.
Weka provides various feature selection techniques, including Wrapper
Subset Evaluator, Filter Subset Evaluator, and more.
38. Which of the following is a rule induction algorithm in Weka?
a. k-NN
b. Apriori
c. Random Forest
d. JRip
Answer: d. JRip.
Explanation: JRip is a rule induction algorithm in Weka that constructs a
set of rules from the data that classify the instances based on their
attribute values. Weka provides various rule induction algorithms,
including JRip, PART, and more.
39. Which of the following is a type of ensemble learning
technique in Weka?
a. Decision Tree
b. Naive Bayes
c. Bagging
d. k-NN
Answer: c. Bagging.
Explanation: Bagging is a type of ensemble learning technique in Weka
that constructs multiple models from different subsets of the data and
combines them to improve the predictive performance. Weka provides
various ensemble learning techniques, including Bagging, Boosting, and
more.
40. Which of the following is a dimensionality reduction technique
that maximizes the separation between classes in Weka?
a. PCA
b. LDA
c. ICA
d. SVM
Answer: b. LDA.
Explanation: LDA is a supervised dimensionality reduction technique that
finds the linear combinations of features that best separate the classes,
maximizing between-class scatter relative to within-class scatter. SVM
also maximizes a margin between classes, but it is a classification
algorithm rather than a dimensionality reduction technique. Weka provides
various dimensionality reduction techniques, including PCA, LDA, and
more.
41. Which of the following is a type of classification algorithm in
Weka that assigns a class label based on the most common class
in the training data?
a. Decision Tree
b. Naive Bayes
c. k-NN
d. ZeroR
Answer: d. ZeroR.
Explanation: ZeroR is a type of classification algorithm in Weka that
assigns a class label based on the most common class in the training
data. It is a simple baseline classifier used to evaluate the predictive
performance of more complex classifiers. Weka provides various
classification algorithms, including Decision Tree, Naive Bayes, k-NN, and
more.
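ZeroR is small enough to write in a few lines, which is what makes it useful as a baseline. The sketch below covers the nominal-class case only, with assumed function names and made-up labels.

```python
from collections import Counter


def zero_r_fit(labels):
    """Return the most frequent class label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def zero_r_predict(majority, n_instances):
    """Predict the majority class for every instance, ignoring all attributes."""
    return [majority] * n_instances


train_labels = ["yes", "no", "yes", "yes", "no"]
majority = zero_r_fit(train_labels)
predictions = zero_r_predict(majority, 3)
baseline_accuracy = train_labels.count(majority) / len(train_labels)
print(majority, predictions, baseline_accuracy)
```

Any classifier worth using should beat this baseline accuracy on the same data.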
42. Which of the following is a type of ensemble learning
technique that combines multiple models using a meta-model in
Weka?
a. Bagging
b. Boosting
c. Stacking
d. Random Forest
Answer: c. Stacking.
Explanation: Stacking is a type of ensemble learning technique in Weka
that combines multiple models through a meta-model rather than a fixed
voting scheme. The output of the base models is used as input to a
meta-model that learns how to combine them to make the final prediction.
Weka provides various ensemble learning techniques, including Bagging,
Boosting, Random Forest, and more.
43. Which of the following is a feature selection technique that
evaluates the subsets of features using a learning algorithm in
Weka?
a. Wrapper Subset Evaluator
b. Filter Subset Evaluator
c. Correlation-based Feature Selection (CFS)
d. Principal Component Analysis (PCA)
Answer: a. Wrapper Subset Evaluator.
Explanation: Wrapper Subset Evaluator is a feature selection technique
in Weka that evaluates the subsets of features using a learning algorithm.
It searches through the space of possible feature subsets and selects the
one that achieves the best performance on the validation set. Weka
provides various feature selection techniques, including Wrapper Subset
Evaluator, Filter Subset Evaluator, and more.
44. Which of the following is a clustering algorithm in Weka that
uses a density-based approach?
a. k-Means
b. EM
c. DBSCAN
d. SOM
Answer: c. DBSCAN.
Explanation: DBSCAN is a clustering algorithm in Weka that uses a
density-based approach to group the instances into clusters. It works by
identifying the dense regions of the data and connecting them into
clusters. Weka provides various clustering algorithms, including k-Means,
EM, DBSCAN, SOM, and more.
45. Which of the following is a method for handling missing values
in Weka that uses the available data to estimate the missing
values?
a. Mean Imputation
b. Mode Imputation
c. Median Imputation
d. k-NN Imputation
Answer: d. k-NN Imputation.
Explanation: k-NN Imputation is a method for handling missing values in
Weka that uses the available data to estimate the missing values. It
works by finding the k nearest instances to the instance with missing
values and using their attribute values to estimate the missing values.
Weka provides various methods for handling missing values, including
Mean Imputation, Mode Imputation, Median Imputation, k-NN Imputation,
and more.
46. Which of the following is a type of ensemble learning
technique in Weka that combines multiple models using a
weighted sum of their predictions?
a. Bagging
b. Boosting
c. Stacking
d. Random Forest
Answer: b. Boosting.
Explanation: Boosting is a type of ensemble learning technique in Weka
that combines multiple models using a weighted sum of their predictions.
It works by iteratively reweighting the instances based on their
classification errors and building a new model on the reweighted data.
Weka provides various ensemble learning techniques, including Bagging,
Boosting, Random Forest, Stacking, and more.
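The reweighting step can be sketched with the textbook AdaBoost formulas: a model with weighted error e receives vote weight alpha = 0.5 * ln((1 - e) / e), misclassified instances are scaled up by exp(alpha), and correctly classified ones are scaled down by exp(-alpha). This is an illustrative sketch with an assumed function name and made-up inputs, not necessarily the exact arithmetic used by Weka's implementation.

```python
import math


def adaboost_round(weights, correct):
    """One reweighting step.

    weights must sum to 1; correct[i] is True if the weak model classified
    instance i correctly. Requires 0 < weighted error < 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)  # the model's weight in the final vote
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]  # renormalize so weights sum to 1


weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
print(alpha, new_weights)
```

After the update the single misclassified instance carries half of the total weight, so the next model is pushed to get it right.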
47. Which of the following is a type of classification algorithm in
Weka that models the joint probability distribution of the features
and the class?
a. Naive Bayes
b. k-NN
c. Decision Tree
d. SVM
Answer: a. Naive Bayes.
Explanation: Naive Bayes is a type of classification algorithm in Weka
that models the joint probability distribution of the features and the class
using Bayes’ theorem and the assumption of independence between the
features. Weka provides various classification algorithms, including Naive
Bayes, k-NN, Decision Tree, SVM, and more.
48. Which of the following is a clustering algorithm in Weka that
uses a probabilistic approach?
a. k-Means
b. EM
c. DBSCAN
d. SOM
Answer: b. EM.
Explanation: EM is a clustering algorithm in Weka that uses a
probabilistic approach to group the instances into clusters. It works by
modeling the data as a mixture of probability distributions and estimating
the parameters of the distributions using the Expectation-Maximization
algorithm. Weka provides various clustering algorithms, including
k-Means, EM, DBSCAN, SOM, and more.
49. Which of the following is a feature selection technique that
evaluates the subsets of features based on their predictive power
and selects the subset that gives the best performance?
a. Filter
b. Wrapper
c. Embedded
d. Correlation-based
Answer: b. Wrapper.
Explanation: Wrapper is a feature selection technique in Weka that
evaluates the subsets of features based on their predictive power and
selects the subset that gives the best performance. It works by using a
learning algorithm to train and evaluate the model on each subset of
features and selecting the subset that gives the best performance. Weka
provides various feature selection techniques, including Filter, Wrapper,
Embedded, Correlation-based, and more.
50. Which of the following is a type of dimensionality reduction
technique in Weka that maps the high-dimensional data to a
lower-dimensional space while preserving the pairwise distances
between the instances?
a. Principal Component Analysis (PCA)
b. Linear Discriminant Analysis (LDA)
c. t-SNE
d. Isomap
Answer: d. Isomap.
Explanation: Isomap is a type of dimensionality reduction technique in
Weka that maps the high-dimensional data to a lower-dimensional space
while preserving the pairwise distances between the instances. It works
by constructing a neighborhood graph of the instances and estimating the
geodesic distances between them using a shortest path algorithm. Weka
provides various dimensionality reduction techniques, including PCA, LDA,
t-SNE, Isomap, and more.
51. Which of the following is a type of rule-based classification
algorithm in Weka that builds a set of rules from the data?
a. OneR
b. ZeroR
c. JRip
d. Random Tree
Answer: c. JRip.
Explanation: JRip is a type of rule-based classification algorithm
in Weka that builds a set of rules from the data. It works by iteratively
adding rules to the rule set based on the accuracy and coverage of the
rules. Weka provides various classification algorithms, including OneR,
ZeroR, JRip, Random Tree, and more.
52. Which of the following is a type of clustering algorithm in
Weka that uses a hierarchical approach to group the instances
into clusters?
a. k-Means
b. EM
c. DBSCAN
d. Hierarchical
Answer: d. Hierarchical.
Explanation: Hierarchical is a type of clustering algorithm in Weka that
uses a hierarchical approach to group the instances into clusters. It works
by recursively merging the most similar clusters based on a distance
metric until all the instances are in a single cluster. Weka provides various
clustering algorithms, including k-Means, EM, DBSCAN, Hierarchical, and
more.
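The merging procedure described above can be sketched as follows. The sketch assumes single linkage (cluster distance is the distance between the closest pair of members) and stops at a requested number of clusters instead of merging down to one; the function name and points are made up for illustration.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # start with every instance in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = single_linkage(points, 3)
print(clusters)
```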
53. Which of the following is a type of feature selection technique
in Weka that selects the features based on their correlation with
the class and removes the redundant features?
a. Filter
b. Wrapper
c. Embedded
d. Correlation-based
Answer: d. Correlation-based.
Explanation: Correlation-based feature selection (CFS) is a technique in
Weka that selects the features based on their correlation with the class
and removes the redundant features. It scores each candidate subset,
favoring features that are highly correlated with the class but weakly
correlated with each other. Weka provides various feature selection
techniques, including Filter, Wrapper, Embedded, Correlation-based, and
more.
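The way CFS penalizes redundancy can be seen in its subset merit score, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the number of features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation. The sketch below evaluates this formula for made-up correlation values; the function name is an assumption, and computing the correlations themselves is left out.

```python
import math


def cfs_merit(class_correlations, mean_feature_correlation):
    """Merit of a feature subset given its feature-class and feature-feature correlations."""
    k = len(class_correlations)
    mean_class_correlation = sum(class_correlations) / k
    return (k * mean_class_correlation) / math.sqrt(
        k + k * (k - 1) * mean_feature_correlation
    )


independent = cfs_merit([0.6, 0.6], 0.0)  # two useful, unrelated features
redundant = cfs_merit([0.6, 0.6], 1.0)    # two useful but perfectly correlated features
print(independent, redundant)
```

With fully redundant features the merit falls back to that of a single feature (0.6 here), so adding the duplicate brings no gain.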
54. Which of the following is a type of clustering algorithm in
Weka that uses a grid-based approach to group the instances into
clusters?
a. k-Means
b. EM
c. DBSCAN
d. CLIQUE
Answer: d. CLIQUE.
Explanation: CLIQUE is a type of clustering algorithm in Weka that uses
a grid-based approach to group the instances into clusters. It works by
partitioning the data space into non-overlapping grid units and identifying
the dense units within the grid. Weka provides various clustering
algorithms, including k-Means, EM, DBSCAN, CLIQUE, and more.
55. Which of the following is a type of classification algorithm in
Weka that builds a decision tree from the data?
a. Naive Bayes
b. k-NN
c. J48
d. Random Forest
Answer: c. J48.