Assignment
Course Code: CSE325
Course Title: Data Mining and Machine Learning
Submitted to
Name: Dr. Md Alamgir Kabir
Department: CSE
Daffodil International University
Submitted by:
Name: Ahanaf Al Mashfi
ID: 222-15-6402
Sec: 62_G
Daffodil International University
Submission Date: 05-04-2025
1. Data
Definition: Raw facts and figures that can be processed to extract information.
Example: A dataset containing sales records, including dates, amounts, and customer information.
2. Feature
Definition: An individual measurable property or characteristic of a phenomenon being observed.
Example: In a dataset of cars, features could include horsepower, weight, and fuel efficiency.
3. Label
Definition: The outcome variable (also called the target) that a machine learning model is trained to predict.
Example: In a dataset predicting house prices, the label would be the actual price of the house.
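To make the split between features and labels concrete, here is a minimal sketch in Python. The records, field names (`area_sqft`, `bedrooms`, `price`), and values are made up for illustration; they are not taken from any real dataset.

```python
# Hypothetical house records; field names and values are invented for illustration.
houses = [
    {"area_sqft": 1200, "bedrooms": 3, "price": 250_000},
    {"area_sqft": 800, "bedrooms": 2, "price": 170_000},
    {"area_sqft": 1500, "bedrooms": 4, "price": 320_000},
]

feature_names = ["area_sqft", "bedrooms"]  # inputs the model sees
label_name = "price"                       # value the model should predict

# X holds one list of feature values per house; y holds the matching labels.
X = [[house[name] for name in feature_names] for house in houses]
y = [house[label_name] for house in houses]

print(X)
print(y)
```

Each row of X lines up with the entry of y at the same position, which is the shape most learning algorithms expect.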
4. Training Data
Definition: A subset of the dataset used to train the machine learning model.
Example: A portion of the data used to teach a model to recognize handwritten digits.
5. Testing Data
Definition: A separate subset of the dataset used to evaluate the performance of the model.
Example: Data held out from training and used to test how well the model predicts new, unseen data.
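One common way to obtain training and testing data is to shuffle the dataset and hold out a fraction of it. The sketch below assumes a 25% test fraction and a fixed seed; the function name `train_test_split` and both defaults are illustrative choices, not something prescribed by the definitions above.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = rows[:]             # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))             # stand-in for 20 data records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Every record ends up in exactly one of the two lists, so the model is never evaluated on data it was trained on.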
6. Algorithm
Definition: A well-defined set of rules or procedures for solving a problem; in machine learning, the procedure that learns a model from data.
Example: Decision trees, neural networks, and support vector machines are examples of algorithms.
7. Overfitting
Definition: A modeling error that occurs when a model learns the details and noise in the training data
to the extent that it performs poorly on new data.
Example: A model that achieves 95% accuracy on training data but only 60% on testing data might be
overfitting.
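The gap between training and testing accuracy can be reproduced with a model that simply memorizes its training data. The sketch below uses a 1-nearest-neighbor classifier on synthetic data with 30% of the labels flipped at random; the data generator, noise rate, and sample sizes are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)

def make_point():
    """Return (x, label); the true rule is x > 0.5, with 30% label noise."""
    x = rng.random()
    true_label = int(x > 0.5)
    noisy = rng.random() < 0.3
    return (x, 1 - true_label if noisy else true_label)

def predict(train, x):
    """1-nearest neighbor: copy the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def accuracy(train, data):
    correct = sum(1 for x, label in data if predict(train, x) == label)
    return correct / len(data)

train = [make_point() for _ in range(50)]
test = [make_point() for _ in range(50)]

train_acc = accuracy(train, train)  # each point is its own nearest neighbor
test_acc = accuracy(train, test)
print(train_acc, test_acc)
```

Because every training point is its own nearest neighbor, the model reproduces the noisy training labels perfectly, yet that memorized noise does not carry over to the test points.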
8. Underfitting
Definition: A situation where a model is too simple to capture the underlying patterns in the data.
Example: A linear regression model applied to a non-linear dataset, which fits poorly on both the
training data and the testing data.
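A small worked case shows what underfitting looks like in numbers. The sketch below fits a straight line by ordinary least squares to points generated from y = x^2; the data and the helper name `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]          # clearly non-linear relationship

slope, intercept = fit_line(xs, ys)
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(slope, intercept, mse)
```

On this symmetric data the best line is flat (slope 0, intercept 4), giving a mean squared error of 12 even on the data it was fitted to. A high error on the training data itself is the signature of underfitting.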
9. Cross-Validation
Definition: A technique for estimating how well a model will generalize to an independent dataset by
repeatedly training and evaluating it on different splits of the available data.
Example: K-fold cross-validation, where the dataset is split into k subsets (folds); the model is trained
on k-1 folds and tested on the remaining fold, and this is repeated k times so that each fold serves as
the test set exactly once.
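The k-fold procedure can be sketched as index bookkeeping. The function below is a minimal illustration without shuffling; the name `k_fold_indices` and the choice of 10 samples with k = 5 are assumptions made for the example.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(10, 5)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice a model would be trained and scored once per pair, and the k scores averaged to give the cross-validation estimate.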
10. Confusion Matrix
Definition: A table used to evaluate the performance of a classification model by comparing predicted
results with actual results.
Example: For a binary classifier, it shows the counts of True Positives, False Positives, True Negatives, and False Negatives.
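The four counts can be tallied directly from paired lists of actual and predicted labels. The sketch below assumes binary labels with 1 as the positive class; the label lists are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classification task."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1      # predicted positive, actually positive
        elif p == positive:
            counts["FP"] += 1      # predicted positive, actually negative
        elif a == positive:
            counts["FN"] += 1      # predicted negative, actually positive
        else:
            counts["TN"] += 1      # predicted negative, actually negative
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

For these lists the tally is 3 true positives, 1 false positive, 3 true negatives, and 1 false negative.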
11. Precision and Recall
Definitions:
Precision: The ratio of correctly predicted positive observations to the total predicted positives, that is, TP / (TP + FP).
Recall: The ratio of correctly predicted positive observations to all actual positives, that is, TP / (TP + FN).
Example: In email spam detection, precision measures how many emails identified as spam were
actually spam, while recall measures how many actual spam emails were correctly identified.
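As a worked example of the two ratios, suppose a spam filter flags 40 emails, 30 of which really are spam, while the mailbox contains 50 spam emails in total. These numbers are hypothetical, and returning 0.0 for an empty denominator is one convention among several.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

tp = 30   # spam emails correctly flagged
fp = 10   # legitimate emails wrongly flagged (40 flagged - 30 correct)
fn = 20   # spam emails missed (50 spam in total - 30 caught)

p = precision(tp, fp)
r = recall(tp, fn)
print(p, r)
```

Here precision is 0.75 and recall is 0.6: most flagged emails are spam, but 40% of the spam still reaches the inbox.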
12. Supervised Learning
Definition: A type of machine learning where the model is trained on labeled data.
Example: Predicting house prices based on historical sales data, where past prices (labels) are
provided.
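A minimal sketch of supervised learning on the house-price example follows. It uses k-nearest-neighbors regression, chosen here only because it is short; the sales figures, the single feature (area in square meters), and k = 2 are all assumptions for illustration.

```python
def knn_predict_price(labeled_sales, area, k=2):
    """Predict a price as the mean price of the k sales closest in area."""
    nearest = sorted(labeled_sales, key=lambda sale: abs(sale[0] - area))[:k]
    return sum(price for _, price in nearest) / k

# Hypothetical historical sales: (area in square meters, sale price).
# The price is the label supplied with each training example.
sales = [(50, 100_000), (70, 140_000), (90, 180_000), (120, 240_000)]

predicted = knn_predict_price(sales, area=80)
print(predicted)
```

The prediction for an 80 square meter house averages the two closest labeled sales (70 and 90 square meters), giving 160000.0. Without the past prices there would be nothing to average, which is what makes the task supervised.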
13. Unsupervised Learning
Definition: A type of machine learning where the model learns from unlabeled data to find hidden
patterns.
Example: Customer segmentation, where customers are grouped by purchasing behavior without
predefined labels.
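Customer segmentation of this kind can be sketched with a one-dimensional k-means clustering. The monthly spending values, the two starting centroids, and the fixed iteration count below are assumptions made for illustration; no labels are given to the algorithm.

```python
def k_means_1d(values, centroids, n_iterations=10):
    """Group numbers around centroids by alternating assignment and update."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spending per customer; there are no labels.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = k_means_1d(spend, centroids=[12, 90])
print(centroids)
print(clusters)
```

The algorithm separates the low spenders from the high spenders purely from the structure of the data, which is the defining trait of unsupervised learning.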