CT-562
MACHINE LEARNING
NED University of Engineering & Technology
COURSE OUTLINE
Basic concepts in machine learning
Instance-based learning
Supervised and unsupervised learning
Bayesian learning
Dimensionality reduction and classification
Statistical learning
Statistical decision theory
Neural networks
Regression: linear regression
Linear model ensembles
Classification: logistic regression
Kernel density estimation
Learning theory
Classification and regression trees
Separating hyperplanes, support vector machines
Decision tree induction
Clustering and dimensionality reduction
Learning sets of rules and logic programs
DR. HAIDER ALI Machine Learning 2
COURSE INSTRUCTOR
Dr. Haider Ali
Associate Professor, Mechanical Engineering Department,
NED University of Engineering and Technology, Karachi.
Email: haider.ali@neduet.edu.pk; haider.upm@gmail.com
Contact Number: 0336-2343766
Office: 68 (next to MED Conference Room)
GRADING SCHEME
Quizzes, Mid Term, Assignments and Term Project: 40%
Final Exam: 60%
REVIEW OF PROGRAMMING FUNDAMENTALS
PLAN FIRST, THEN CODE
Many novice programmers attempt to dive right into writing the code (in the programming
language) as the first step.
A good programmer will plan first and write second, possibly breaking down a large programming
task into several smaller tasks in the process.
Even when cautioned to plan first and code second, many programming students ignore the
advice; after all, why "waste" 30 minutes planning when you are already pressed for time by all the work
you have to do?
OVERVIEW OF THE SEVEN STEPS
ALGORITHM
An algorithm is a clear set of steps to solve any problem in a particular
class.
A good algorithm can not only be translated into code, but can also
be executed by a person with no particular knowledge of the problem
at hand.
ALGORITHM
STEP 1: WORK AN EXAMPLE YOURSELF
Work at least one instance of the problem yourself, by hand, which may involve drawing a diagram of the
problem.
If you get stuck at this step, it typically means one of two things:
• The problem is ill-specified: it is not clear what you are supposed to do.
• You lack domain knowledge: the knowledge of the particular field or discipline the problem
deals with.
STEP 2: WRITE DOWN WHAT YOU JUST DID
Write down the exact steps you took to solve that particular instance, as a set of
instructions.
An instruction that is somewhat complex is fine, as long as its meaning is clear.
Difficulty: thinking carefully about the exact steps you followed.
STEP 3: GENERALIZE YOUR STEPS
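Take the particular values you used in your instance and replace them with variables, so that
the steps solve any instance of the problem rather than only the one you worked.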
STEP 4: TEST YOUR ALGORITHM
Check that your steps are right by testing them with values other than the ones you used before. Testing builds
confidence, but the only way to be completely sure that your algorithm is correct is to formally prove its correctness.
Common mistakes:
• Misgeneralizing
• Cases you did not consider
STEPPING THROUGH AN ALGORITHM
Stepping through the algorithm by hand with N = 2, you should come up with the sequence of numbers 0 4 12 10.
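The algorithm traced on this slide appears only in a figure that is not reproduced in this text. As an illustration, here is one algorithm (an assumption, not necessarily the one in the figure) that produces exactly this sequence for N = 2, written as code so a hand trace can be checked against the computer. The function name trace_algorithm is hypothetical.

```python
def trace_algorithm(n):
    """Hypothetical example algorithm; returns the numbers it 'writes down'."""
    written = []
    x = n + 2                      # make a variable x and set it to N + 2
    for i in range(n + 1):         # count i from 0 to N, including both ends
        written.append(x * i)      # write down x multiplied by i
        x = x + i * n              # update x to be x + i * N
    written.append(x)              # after counting, write down the value of x
    return written


print(trace_algorithm(2))          # [0, 4, 12, 10]
```

Running the same function for another value of N (Step 4) and comparing it with a fresh hand trace is a quick way to catch misgeneralizations.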
FLOW CHARTS
A schematic representation of a sequence of operations, as in a
manufacturing process or computer program.
A graphical representation of the sequence of operations in a program.
Program flowcharts show the sequence of instructions in a single program or
subroutine. Standard symbols are used to represent the different kinds of steps.
One online tool for drawing flowcharts: https://creately.com/diagram-community/examples/t/flowchart
FLOWCHART
A flowchart:
• shows the logic of an algorithm
• emphasizes individual steps and their interconnections, e.g. the control flow from one action to the next
BASIC FLOWCHART SYMBOLS
CONDITIONAL STATEMENTS
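The slide's figure is not reproduced here; as a minimal sketch, this is how a conditional statement looks in Python, the language used for the preprocessing examples later in the course. The function name sign_label is hypothetical.

```python
def sign_label(x):
    # Exactly one branch runs: the first condition that is True wins
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"


for value in (5, -3, 0):
    print(value, sign_label(value))
```

Each if/elif test corresponds to a decision symbol (diamond) in a flowchart, with one outgoing path per outcome.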
FUNCTIONS
A function is a block of organized, reusable code that is used to perform a
single, related action. Functions provide better modularity for your
application and a high degree of code reuse.
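A minimal sketch of defining a function once and reusing it on different data; the name average and the sample lists are hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)


# The same function is reused on different data instead of repeating the code
ages = [25, 35, 45]
salaries = [40000, 60000, 80000]
print(average(ages))      # 35.0
print(average(salaries))  # 60000.0
```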
FUNCTIONS
INTRODUCTION TO MACHINE LEARNING
MOTIVATION
SELF-DRIVING CAR
MACHINE LEARNING
1959
1998
DATA PREPROCESSING IN MACHINE LEARNING
DATA PREPROCESSING
Data preprocessing comprises the steps we follow to transform or encode data
so that the machine can easily parse it.
WHY DATA PREPROCESSING?
Real-world data generally contains noise and missing values, and may be in an
unusable format that cannot be used directly by machine learning models. Data
preprocessing is required to clean the data and make it suitable for a
machine learning model, which can also increase the accuracy and efficiency of the
model.
DATA PREPROCESSING
1) Getting the dataset
2) Importing libraries
3) Importing datasets
4) Handling of missing data
5) Encoding categorical data
6) Splitting the dataset into training and test sets
7) Feature scaling
1) GET THE DATASET
To create a machine learning model, the first thing we require is a dataset, since a
machine learning model works entirely on data. The data collected for a
particular problem, in a proper format, is known as the dataset.
Datasets come in different formats for different purposes: for example, a dataset for a
business application will differ from the dataset required for a study of liver patients,
so each dataset is different from every other. To use a dataset in our code, we usually put it into a CSV file.
However, sometimes we may also need to use an HTML or xlsx file.
2) IMPORTING LIBRARIES
In order to perform data preprocessing using Python, we need to import some predefined Python libraries.
There are three specific libraries that we will use for data preprocessing.
NumPy: The NumPy library is used to perform many kinds of mathematical
operations in code. It is the fundamental package for scientific computing
in Python, and it supports large, multidimensional arrays and matrices.
Matplotlib: The second library is Matplotlib, a Python 2D plotting library;
from it we import the sub-library pyplot, which is used to plot many types
of charts in Python.
Pandas: The last library is Pandas, one of the most popular Python
libraries, used for importing and managing datasets. It is an open-source
data manipulation and analysis library.
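A minimal sketch of importing the three libraries under their conventional aliases (np, plt, pd), with a one-line check that each works. Selecting the non-interactive "Agg" backend is an assumption added so the snippet also runs on a machine without a display.

```python
import numpy as np                  # numerical arrays and mathematical operations
import matplotlib
matplotlib.use("Agg")               # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt     # the pyplot sub-library for charts
import pandas as pd                 # importing and managing datasets

# Quick check that each library works
arr = np.array([[1, 2], [3, 4]])
print(arr.mean())                   # 2.5
df = pd.DataFrame(arr, columns=["a", "b"])
fig, ax = plt.subplots()
ax.plot(df["a"], df["b"])           # a simple line chart of column b against column a
plt.close(fig)
```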
3) IMPORTING DATASETS
Now we need to import the datasets we have collected for our machine learning
project. Common file formats are:
• CSV files
• Excel files
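A minimal sketch of importing a CSV dataset with pandas and separating the independent variables (X) from the dependent variable (y). The columns and values are hypothetical, and the text is read from memory with io.StringIO so the example is self-contained; with a real file you would pass its path instead (the file name Data.csv is also hypothetical).

```python
import io
import pandas as pd

# Hypothetical data standing in for a file such as "Data.csv"
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
"""

# With a real file: df = pd.read_csv("Data.csv")
df = pd.read_csv(io.StringIO(csv_text))
# Excel files are read similarly with pd.read_excel("Data.xlsx") (needs the openpyxl package)

X = df.iloc[:, :-1].values   # independent variables: every column except the last
y = df.iloc[:, -1].values    # dependent variable: the last column
print(df)
```

The empty Salary field in the last row is read as NaN, which is the kind of missing value handled in the next step.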
4) HANDLING OF MISSING DATA
Missing values can be filled in (imputed) with scikit-learn's Imputer class, which takes parameters such as:
1. missing_values: the placeholder for the missing values. All occurrences of missing_values will be imputed. We can give it an
integer or "NaN" so that it finds the missing values.
2. strategy: the imputation strategy. If "mean", missing values are replaced using the mean along the axis (column). Other
strategies include "median" and "most_frequent".
3. axis: 0 or 1; 0 imputes along columns and 1 imputes along rows.
In recent versions of scikit-learn, Imputer has been replaced by SimpleImputer (in sklearn.impute), which always imputes column by
column and therefore has no axis parameter; missing values are marked with np.nan.
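A minimal sketch of mean imputation with SimpleImputer, the current scikit-learn replacement for Imputer. The age/salary array is hypothetical data; each NaN is replaced by the mean of the non-missing values in its column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical age and salary columns, each with one missing value
X = np.array([[44.0, 72000.0],
              [27.0, 48000.0],
              [30.0, 54000.0],
              [np.nan, 61000.0],
              [40.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)   # learns column means, then fills the NaNs

print(imputer.statistics_)            # column means used for filling: 35.25 and 58750.0
print(X_filled)
```

Using strategy="median" or strategy="most_frequent" changes only how the fill value is computed.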
5) ENCODING CATEGORICAL DATA
A machine learning model works entirely on mathematics
and numbers, so if our dataset has a categorical
variable (for example, a country name), it may cause
trouble while building the model. It is therefore
necessary to encode these categorical variables as numbers.
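A minimal sketch using two scikit-learn encoders on hypothetical data: OneHotEncoder for a categorical feature and LabelEncoder for a categorical target. One-hot encoding is used for the feature because plain integer codes (0, 1, 2) would suggest an order between countries that does not exist.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical feature (as a column) and categorical target
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])
purchased = np.array(["No", "Yes", "No", "Yes"])

# One column per country; categories are sorted: France, Germany, Spain
onehot = OneHotEncoder()
X_encoded = onehot.fit_transform(countries).toarray()  # convert sparse output to a dense array

# Target labels become integers: "No" -> 0, "Yes" -> 1
label = LabelEncoder()
y_encoded = label.fit_transform(purchased)

print(onehot.categories_[0])
print(X_encoded)
print(y_encoded)
```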
6) SPLITTING DATASET INTO TRAINING AND TEST SET
In machine learning data preprocessing, we divide our dataset into a training set and a test set. This is one of the
crucial steps of data preprocessing, because testing the model on data it was not trained on shows how well it will
perform on new, unseen data.
Training set: a subset of the dataset used to train the machine learning model; for this subset we already know the output.
Test set: a subset of the dataset used to test the machine learning model; the model predicts the outputs for the test set,
and these predictions are compared with the known outputs.
Note: The random_state parameter sets a seed for the random number generator so that you get the same split every time,
and a commonly used value for it is 42.
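A minimal sketch of an 80/20 split with scikit-learn's train_test_split on a small hypothetical dataset. The test_size value of 0.2 is an assumption chosen for illustration; fixing random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples with 2 features each, and one target per sample
X = np.arange(20).reshape(10, 2)   # row i is [2*i, 2*i + 1]
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Rows and targets are shuffled together, so each training row still lines up with its own target.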
7) FEATURE SCALING
Feature scaling is the final step of data preprocessing in machine learning. It is a technique to
bring the independent variables of the dataset into a specific range. In feature scaling, we put our
variables in the same range and on the same scale so that no variable dominates the others.
Many machine learning models (for example, k-nearest neighbours) are based on Euclidean distance, and if we do not
scale the variables, a feature with large values (such as salary) will dominate a feature with small values (such as age).
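A minimal sketch of standardization with scikit-learn's StandardScaler, which rescales each feature as (x - mean) / standard deviation. The age/salary values are hypothetical. The scaler is fitted on the training set only and then reused on the test set, and the Euclidean distance between two samples is printed before and after scaling to show how salary dominates when features are unscaled.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: age (years) and salary
X_train = np.array([[25.0, 40000.0],
                    [35.0, 60000.0],
                    [45.0, 80000.0]])
X_test = np.array([[30.0, 50000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training set
X_test_scaled = scaler.transform(X_test)        # apply the same mean and std to the test set

# Euclidean distance between the first two training samples
dist_raw = np.linalg.norm(X_train[0] - X_train[1])
dist_scaled = np.linalg.norm(X_train_scaled[0] - X_train_scaled[1])
print(dist_raw)     # about 20000: almost entirely the salary difference
print(dist_scaled)  # about 1.73: both features now contribute equally
```

Min-max normalization, (x - min) / (max - min), is a common alternative that maps each feature into the range [0, 1].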
THANK YOU