Questions:
1. What is the difference between supervised and
unsupervised learning?
Supervised learning involves training a model on a labeled
dataset, which means that each training example is paired
with an output label. The goal is for the model to learn a
mapping from inputs to outputs so that it can accurately
predict the label of new, unseen data. This approach is
commonly used in applications such as spam detection,
image classification, and fraud detection. Algorithms used
in supervised learning include linear regression, decision
trees, support vector machines (SVM), and neural
networks. Unsupervised learning, by contrast, deals with unlabeled
data. In this case, the model tries to find patterns,
groupings, or structures within the data without being
given any explicit output labels. The main objective is to
explore the underlying structure or distribution in the data.
Common use cases include customer segmentation,
anomaly detection, and market basket analysis. Algorithms
like k-means clustering, hierarchical clustering, and
principal component analysis (PCA) are typically used in
unsupervised learning.
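
The contrast can be sketched in plain Python. This is a minimal
illustration, not a production implementation: the data values, the
label names, and the helper functions (fit_centroids, predict,
two_means_1d) are all hypothetical. The supervised half uses a simple
nearest-centroid rule chosen for brevity, and the unsupervised half is
a stripped-down two-cluster version of k-means.

```python
# Toy 1-D measurements forming two loose groups (values are made up).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# --- Supervised: every training point comes with a label ---
labels = ["low", "low", "low", "high", "high", "high"]

def fit_centroids(xs, ys):
    """Learn one centroid (mean value) per label from labeled data."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(points, labels)
prediction = predict(centroids, 4.5)  # label for a new, unseen point

# --- Unsupervised: no labels, only the points themselves ---
def two_means_1d(xs, iterations=10):
    """Group 1-D points into two clusters and return a cluster index per point."""
    centers = [min(xs), max(xs)]  # simple deterministic starting centers
    assignments = [0] * len(xs)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignments = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            for x in xs
        ]
        # Move each center to the mean of its members (skip empty clusters).
        for c in (0, 1):
            members = [x for x, a in zip(xs, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments

clusters = two_means_1d(points)
```

The supervised half can name its prediction ("low" or "high") because
the names were supplied during training. The unsupervised half only
returns anonymous cluster indices; what each group means is left to
the person reading the result.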
2. What is a feature?
A feature refers to an individual measurable property or
characteristic of the data that is used to make predictions
or decisions. Features serve as the input variables that a
machine learning model uses to learn patterns from data.
For example, in a model designed to predict house prices,
features might include the size of the house, number of
bedrooms, location, and age of the house. These features
help the model understand the important aspects that
influence the price.
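
As a rough sketch of what this looks like in code, the house features
above can be turned into numeric rows that a model can consume. The
field names, the sample values, and the to_feature_vector helper are
assumptions made for illustration; the location is one-hot encoded
because most models expect numbers rather than text.

```python
# Two hypothetical houses described by the features named above.
houses = [
    {"size_sqft": 1400, "bedrooms": 3, "location": "suburb", "age_years": 20},
    {"size_sqft": 900, "bedrooms": 2, "location": "city", "age_years": 5},
]

# Fixed, sorted list of known locations so every row has the same columns.
locations = sorted({house["location"] for house in houses})

def to_feature_vector(house):
    """Convert one house record into a list of numeric features."""
    numeric = [house["size_sqft"], house["bedrooms"], house["age_years"]]
    # One-hot encode the location: 1 in the matching column, 0 elsewhere.
    one_hot = [1 if house["location"] == loc else 0 for loc in locations]
    return numeric + one_hot

# Feature matrix: one row per house, one column per feature.
X = [to_feature_vector(house) for house in houses]
```

Each row of X is the input for one house; the price itself is not a
feature but the value the model would be trained to predict.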
3. What is a label?
A label is the output or target value that a model is trained
to predict. It represents the correct answer or result for a
given set of input features in supervised learning. For
example, in a spam email detection system, the features
might include the content of the email, the sender, and the
subject line, while the label would indicate whether the
email is “spam” or “not spam.” Labels are essential in
supervised learning because they provide the model with
the correct output during training, allowing it to learn the
relationship between the input features and the desired
result.
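
A small sketch of how features and labels are usually kept apart for
the spam example follows. The email contents, the addresses, and the
label_to_id mapping are invented for illustration; the conventional
names X (features) and y (labels) are used because most Python ML
libraries follow that convention.

```python
# Hypothetical labeled emails: (features, label) pairs.
emails = [
    ({"sender": "promo@example.com", "subject": "Win a prize now",
      "content": "Click here to claim"}, "spam"),
    ({"sender": "alice@example.com", "subject": "Meeting notes",
      "content": "Notes from today attached"}, "not spam"),
    ({"sender": "deals@example.com", "subject": "Limited offer",
      "content": "Buy now and save"}, "spam"),
]

# Separate the inputs (features) from the targets (labels).
X = [features for features, _ in emails]
y_text = [label for _, label in emails]

# Models typically need numeric targets, so map each label to an integer.
label_to_id = {"not spam": 0, "spam": 1}
y = [label_to_id[label] for label in y_text]
```

During training the model sees both X and y; at prediction time it
receives only the features and must produce the label itself.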
4. Which libraries do we use to implement ML models in
Python?
Python offers several powerful libraries for implementing
machine learning models, each suited to different types of
tasks. Scikit-learn is one of the most popular libraries for
traditional machine learning. It provides easy-to-use tools
for classification, regression, clustering, and more, making
it ideal for beginners and practical for professionals. For
deep learning tasks, TensorFlow and PyTorch are the two
leading libraries. TensorFlow, developed by Google, is
widely used in production environments and supports
high-performance computing with GPU acceleration.
Keras, which is most commonly used with TensorFlow,
provides a simpler and more intuitive interface for building
and training neural networks. PyTorch, originally developed
by Facebook (now Meta), is especially favored in research
for its flexibility and dynamic computation graphs. For tasks
involving structured or tabular data, gradient boosting
libraries like XGBoost and LightGBM are often used because
they tend to be accurate on such data and train efficiently
on large datasets. Each of these libraries plays a vital role in the
machine learning ecosystem, catering to different needs
from simple models to complex deep learning
architectures.
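
To give a feel for the workflow, here is a minimal Scikit-learn sketch
that trains a decision tree classifier. It assumes scikit-learn is
installed; the tiny dataset, the feature choice, and the "small" and
"large" labels are made up for illustration and are far too small for
real use.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up features: [size in square feet, number of bedrooms].
X = [[600, 1], [750, 2], [1800, 3], [2400, 4]]
# Made-up labels, one per row of X.
y = ["small", "small", "large", "large"]

# Create the model, then learn the mapping from features to labels.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Predict labels for two new, unseen houses.
predictions = model.predict([[700, 1], [2000, 3]])
```

The same fit and predict pattern applies to most Scikit-learn
estimators, so swapping in a different algorithm usually means
changing only the import and the constructor line.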