Question On Data Mining

The document provides an overview of the R programming language, including its data types, structures, and functions for data manipulation. It also covers concepts in data mining, such as classification, regression, and clustering, along with machine learning techniques like supervised and unsupervised learning. Additionally, it introduces RStudio as an IDE for R and explains how to create user-defined functions.

Uploaded by

Surajit Acharya

1. What is R?

R is a programming language and environment widely used for solving data science
problems and particularly designed for statistical computing and graphics.

2. List and define some basic data types in R.


There are several basic data types in R, including:
Numeric => real (decimal) numbers, stored as double-precision values; the default
type for numbers.
Integer => whole numbers, written with an L suffix (for example, 5L).
Character => a letter, number, or symbol, or any combination of them, enclosed in
double or single quotation marks.
Factor => categories from a predefined set of possible values (levels), optionally
with an intrinsic order.
Logical => the Boolean values TRUE and FALSE, which behave as 1 and 0,
respectively, in arithmetic.
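
As a small illustration of the types listed above, the sketch below creates one
value of each and asks R to report its class. The variable names are made up for
this example.

```r
# One value of each basic type; class() reports the type R assigns.
num <- 3.14                                   # numeric (double)
int <- 5L                                     # integer: note the L suffix
chr <- "R 4.x"                                # character
fct <- factor(c("low", "high", "low"),
              levels = c("low", "high"))      # factor with two levels
lgl <- TRUE                                   # logical

types <- c(class(num), class(int), class(chr), class(fct), class(lgl))
print(types)

# Logical values behave as 1 and 0 in arithmetic.
n_true <- sum(c(TRUE, FALSE, TRUE))
print(n_true)
```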

3. List and define some basic data structures in R.


Vector => a one-dimensional data structure used for storing values of the same data
type.
List => a one-dimensional data structure whose elements can be values of any data
type and/or other data structures, including other lists, so it can be nested.
Matrix => a two-dimensional data structure used for storing values of the same data
type.
Data frame => a two-dimensional data structure used for storing values of any data
type, but each column must store values of the same data type.
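
The four structures can be built in a few lines. This is only a sketch with
made-up values; it also shows that mixing types inside a vector forces R to coerce
everything to one common type.

```r
v  <- c(1.5, 2.5, 3.5)                            # vector: one type only
l  <- list(id = 1L, tags = c("a", "b"),
           nested = list(TRUE))                   # list: mixed types, nestable
m  <- matrix(1:6, nrow = 2, ncol = 3)             # matrix: 2 x 3, one type
df <- data.frame(x = 1:3, y = c("a", "b", "c"))   # data frame: typed columns

str(l)               # compact view of the list contents
print(dim(m))        # rows and columns of the matrix
print(sapply(df, class))

# Mixing types in a vector coerces every element to one common type.
mixed <- c(1, "a", TRUE)
print(class(mixed))
```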

4. How to import data in R?


Base R provides essential functions for importing data:
read.table() => the most general base R function for importing data; it reads
tabular data with any kind of field separator, including unusual ones such as |.
read.csv() => reads comma-separated values (CSV) files that use . as the decimal
separator.
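
A minimal sketch of both functions follows. The two sample files are written to a
temporary location first so that the example is self-contained; the file contents
and variable names are invented for illustration.

```r
# Write two small sample files to a temporary location (illustrative data).
csv_path  <- tempfile(fileext = ".csv")
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("id,score", "1,9.5", "2,7.25"), csv_path)
writeLines(c("id|score", "1|9.5", "2|7.25"), pipe_path)

# read.csv() assumes a comma separator, "." decimals, and a header row.
from_csv <- read.csv(csv_path)

# read.table() needs the separator and the header row spelled out.
from_pipe <- read.table(pipe_path, header = TRUE, sep = "|")

print(from_csv)
print(from_pipe)
```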

5. What is a package in R, and how do you install and load packages?


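An R package is a collection of functions, data sets, and documentation bundled
together to extend base R.
To install an R package directly from CRAN, we pass the package name enclosed in
quotation marks to the install.packages() function.
To load an installed R package into the working R environment, we can use either
the library() or the require() function. library() stops with an error if the
package is missing, while require() returns FALSE with a warning.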
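
The calls can be sketched as follows. The install step is left commented out
because it downloads from CRAN, "ggplot2" is only an example package name, and
"notARealPackage123" is a deliberately nonexistent name used to show what happens
when a package is missing.

```r
# Installing downloads from CRAN, so it is left commented out here.
# install.packages("ggplot2")

# library() attaches a package and stops with an error if it is missing.
library(stats)   # stats ships with R, so this call succeeds

# require() returns TRUE or FALSE instead of stopping, which suits checks.
found <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
print(found)
```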

6. How do you add a new column to a data frame in R?

Using the $ symbol:


df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df$col_3 <- c(5, 1, 18, 16)


print(df)

Using square brackets:


df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df["col_3"] <- c(5, 1, 18, 16)


print(df)

Using the cbind() function:


df <- data.frame(col_1=10:13, col_2=c("a", "b", "c", "d"))
print(df)

df <- cbind(df, col_3=c(5, 1, 18, 16))


print(df)

7. What is RStudio?


RStudio is an open-source IDE (integrated development environment) that is widely
used as a graphical front-end for working with the R programming language, starting
from R version 3.0.1. It has many helpful features that make it very popular among
R users:
User-friendly
Flexible
Multifunctional
Allows creating reusable scripts
Tracks operational history
Autocompletes the code
Offers detailed and comprehensive help on any object

8. How to create a user-defined function in R?


To create a user-defined function in R, we use the keyword function and the
following syntax:
function_name <- function(parameters){
function body
}
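
As a concrete instance of this syntax, here is a hypothetical helper (the name
rescale01 and its arguments are invented for this example) that rescales a numeric
vector to the range from 0 to 1.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)       # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])     # the last expression is the return value
}

scaled <- rescale01(c(10, 15, 20))
print(scaled)
```

The function is called like any built-in one: by name, with arguments in
parentheses. The default na.rm = TRUE can be overridden at the call site.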

9. What is Data Mining?


Data mining refers to extracting or mining knowledge from large amounts of data. In
other words, data mining is the science, art, and technology of exploring large
and complex bodies of data in order to discover useful patterns.

10. What are the different tasks of Data Mining?


The following activities are carried out during data mining:

Classification
Clustering
Association Rule Discovery
Sequential Pattern Discovery
Regression
Deviation Detection

11. What is Classification?


Classification is the process of finding a set of models (or functions) that
describe and distinguish data classes or concepts, so that the model can be used to
predict the class label of objects whose class is unknown.
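
One possible illustration, assuming logistic regression as the model (the text does
not prescribe a particular one), uses the built-in mtcars data set to predict the
transmission type of a car from its weight.

```r
# Fit a logistic regression: am (0 = automatic, 1 = manual) from weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of class 1, thresholded at 0.5 to give a class label.
prob <- predict(fit, newdata = mtcars, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion table and accuracy on the training data (an optimistic estimate).
print(table(actual = mtcars$am, predicted = pred))
accuracy <- mean(pred == mtcars$am)
print(accuracy)
```

In practice the accuracy would be measured on data that was held out from
training, since evaluating on the training rows overstates performance.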

12. What is Prediction?


Prediction can be viewed as the construction and use of a model to assess the class
of an unlabeled object, or to estimate the value or value ranges of an attribute
that a given object is likely to have. In this interpretation, classification and
regression are the two major types of prediction problems: classification is used
to predict discrete or nominal values, while regression is used to predict
continuous or ordered values.

13. What is a Decision tree?


A Decision tree is a classification scheme that generates a tree and a set of
rules, representing the model of different classes, from a given data set.
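
A short sketch of growing such a tree, assuming the rpart package is available (it
is distributed with standard R installations as a recommended package) and using
the built-in iris data set:

```r
library(rpart)

# Grow a classification tree predicting iris species from four measurements.
tree <- rpart(Species ~ ., data = iris, method = "class")

# Printing the tree lists its splitting rules, one node per line.
print(tree)

# Classify the training rows and measure agreement with the true labels.
pred <- predict(tree, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)
```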

14. Explain Bayesian classification in Data Mining?


Bayesian classifiers are statistical classifiers. They can predict class
membership probabilities, for instance, the probability that a given sample belongs
to a particular class. Bayesian classification is based on Bayes' theorem. A
simple Bayesian classifier, known as the naive Bayesian classifier, has been found
to be comparable in performance with decision tree and neural network classifiers.
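
To make the idea of class membership probabilities concrete, here is a hand-written
sketch of a Gaussian naive Bayes classifier in base R on the built-in iris data.
Modelling each feature with a normal distribution is an assumption of this example,
not something the text specifies, and all names are illustrative.

```r
# Gaussian naive Bayes by hand:
# posterior is proportional to prior times the product of feature densities.
X <- iris[, 1:4]
y <- iris$Species
classes <- levels(y)

# Per-class priors, feature means, and feature standard deviations.
priors <- sapply(classes, function(k) mean(y == k))
means  <- sapply(classes, function(k) colMeans(X[y == k, ]))
sds    <- sapply(classes, function(k) apply(X[y == k, ], 2, sd))

# Log-posterior (up to a constant) of one observation for each class.
log_posterior <- function(x) {
  sapply(classes, function(k) {
    log(priors[[k]]) + sum(dnorm(x, means[, k], sds[, k], log = TRUE))
  })
}

# Normalise to class membership probabilities for the first flower.
lp   <- log_posterior(unlist(X[1, ]))
post <- exp(lp - max(lp)) / sum(exp(lp - max(lp)))
print(round(post, 3))

# Predict every row by picking the most probable class.
pred <- classes[apply(X, 1, function(x) which.max(log_posterior(x)))]
accuracy <- mean(pred == y)
print(accuracy)
```

The "naive" part is the sum over features inside log_posterior: it treats the
features as independent given the class, which is rarely exactly true but often
works well in practice.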

15. What do you understand by the term Cluster Analysis?


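In the context of Data Mining, cluster analysis is the task of grouping objects so
that objects in the same group (cluster) are more similar to each other than to
objects in other groups. It is an important type of analysis used in market
research, pattern recognition, data analysis, image processing, and other areas.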
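
As one example of cluster analysis, the sketch below applies hierarchical
clustering from base R to three columns of the built-in mtcars data. The choice of
method, columns, and number of groups is arbitrary and made only for illustration.

```r
# Pairwise Euclidean distances between standardized car measurements.
d <- dist(scale(mtcars[, c("mpg", "hp", "wt")]))

# Agglomerative hierarchical clustering, then cut the tree into 3 groups.
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 3)

# Number of cars assigned to each group.
print(table(groups))
```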

16. What is regression in Data mining?


Regression is used to estimate how one variable (the response) changes with respect
to one or more other variables (the predictors). In the simplest case, linear
regression, it establishes a linear relationship between them.
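
A minimal sketch of a linear regression in R, using the built-in cars data set
(stopping distance against speed); the new speed value of 21 is an arbitrary
choice for the example.

```r
# Simple linear regression: stopping distance as a function of speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))   # intercept and slope of the fitted line

# Predict the stopping distance for a chosen speed.
new_speed <- data.frame(speed = 21)
predicted <- predict(fit, newdata = new_speed)
print(predicted)
```

The slope estimates how much the stopping distance changes for each additional
unit of speed.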

17. What is KMeans clustering?


The KMeans algorithm clusters data by trying to separate samples into k groups of
equal variance, minimizing a criterion known as the inertia or within-cluster
sum-of-squares. This algorithm requires the number of clusters to be specified in
advance. It scales well to a large number of samples and has been used across a
wide range of application areas in many different fields.
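
In R, the algorithm is available as kmeans() in the stats package. The sketch below
clusters the built-in iris measurements; the choice of 3 clusters and 10 random
restarts is an assumption made for this example.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Cluster the four numeric iris measurements into 3 groups.
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

print(km$size)           # number of samples per cluster
print(km$tot.withinss)   # the within-cluster sum-of-squares being minimized

# Compare clusters with the known species (used only for inspection).
print(table(cluster = km$cluster, species = iris$Species))
```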

18. What is supervised learning?


Supervised learning is the machine learning task of inferring a function from
labeled training data. The training data consist of a set of training examples. In
supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory
signal).

19. What is unsupervised learning?


Unsupervised learning is a type of machine learning used to draw inferences from
datasets consisting of input data without labeled responses. The most common
unsupervised learning method is cluster analysis, which is used in exploratory data
analysis to find hidden patterns or groupings in data.
