501582-3 Neural Networks
Machine Learning and ANNs
Dr. Huda Hakami
Department of Computer Science, Taif University
Machine Learning
• Machine learning investigates the mechanisms
by which knowledge is acquired through
experience.
• A machine learning algorithm is an algorithm
that is able to learn from data.
• What do we mean by learning?
• “A computer program is said to learn from
experience 𝐸 with respect to some class
of tasks 𝑇 and performance measure 𝑃, if
its performance at tasks in 𝑇, as measured
by 𝑃, improves with experience 𝐸.”
Mitchell (1997)
Source: https://datalya.com/blog/machine-learning/machine-learning-vs-traditional-programming-paradigm
Machine Learning: Tasks
The task 𝑇:
• Learning is our means of attaining the ability to perform the task.
• If we want a robot to be able to walk, then walking is the task.
• Machine learning tasks: how the machine learning system should process an example.
• An example is a collection of features represented by a vector 𝒙 ∈ ℝ^𝐷
• E.g., the features of an image are the values of its pixels
Machine Learning: Tasks (Cont.)
The task 𝑇:
• Classification:
• Specify which of 𝑘 categories some input belongs to.
• 𝑓: ℝ^𝐷 ⟶ {1, … , 𝑘}
• E.g., Object recognition (image classification)
• Sentiment classification:
The movie was great +1
The food was cold and tasted bad -1
Machine Learning: Tasks (Cont.)
The task 𝑇:
• Regression:
• Predict a numerical value given some input.
• 𝑓: ℝ^𝐷 ⟶ ℝ
• E.g., predicting house prices
• Clustering:
• Dividing the population or data points into a number of groups
Machine Learning: Performance
The performance 𝑃:
• Design a quantitative measure of its performance (task-specific)
• Accuracy of the learnt model:
Accuracy = (Number of correctly classified samples) / (Total number of samples)
• Accuracy is measured using test data
• Data that is separate from the data used for training the machine learning system.
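The accuracy formula above can be sketched directly in code; the labels below are illustrative values, not taken from a real dataset:

```python
def accuracy(predicted, actual):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Illustrative held-out test labels in {+1, -1}
y_true = [+1, -1, +1, +1, -1]
y_pred = [+1, -1, -1, +1, -1]
print(accuracy(y_pred, y_true))  # 0.8 (4 of 5 correct)
```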
Machine Learning: Experience
The Experience 𝐸 :
• Learning algorithms can be understood as being allowed to experience an entire dataset.
• Dataset: collection of many examples (i.e., data points)
• The learning approach depends on the given dataset.
Source: https://vitalflux.com/dummies-notes-supervised-vs-unsupervised-learning/
Machine Learning Approaches
Supervised learning (predictive models):
• Learning with a teacher
• Each example in the dataset is associated with a label (i.e., target output)
• A training dataset of pairs {𝑥, 𝑦}, where 𝑦 ∈ {+1, −1}, is provided to learn the function 𝑓.
• Examples:
• Iris dataset: 150 iris plants, features (sepal length, sepal width, petal length and petal width), three
classes (Iris Setosa, Iris Versicolour, Iris Virginica)
• Available online: https://archive.ics.uci.edu/ml/datasets/iris
Machine Learning Approaches
Supervised learning (predictive models):
• The network then processes the inputs and compares its predicted outputs against the desired
outputs.
• If the resulting output differs from the desired output, the generated error signal adjusts the
weights.
• The error minimization process is supervised by a teacher.
Machine Learning Approaches
Unsupervised learning (descriptive models):
• Experience a dataset containing many features, then learn useful properties of the structure of
this dataset.
• Given a dataset {𝒙₁, 𝒙₂, … , 𝒙ₙ} without labels
• Extract the hidden structure to group similar data points (Clustering algorithms)
• Unsupervised learning algorithms:
• Self-organising feature map (SOM)
• Principal Component Analysis (PCA)
• …
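To make the clustering idea concrete, here is a minimal k-means sketch in pure Python (1-D points, random initial centroids); it is an illustration of grouping unlabelled data, not one of the algorithms listed above:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return clusters

# Two well-separated groups are recovered without any labels
clusters = kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```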
Machine Learning Approaches
• Unsupervised learning (descriptive models):
Words that express similar sentiments are grouped into
the same cluster (Yogatama et al., 2014)
Supervised vs. Unsupervised Learning
Machine Learning Techniques
• kNN (k Nearest Neighbour) algorithm
• Naïve Bayes classifier (probabilistic classifier)
• Decision trees
• Genetic algorithms
• Neural Networks
• …
Performance of ANNs
• MNIST (Modified National Institute of Standards and Technology) dataset of handwritten digits.
• 60,000 training and 10,000 test instances of hand-written digits.
• Each image is encoded as a 28×28 pixel grayscale image.
Taken from Witten et al., 2017 (Data Mining book)
Learning in ANNs
• Learning in ANNs is:
• A process to store the information into the network.
• A definition of how the system adjusts to new knowledge
• Learning rule:
• Algorithms or equations which manage changes in the weights of the connections in a network.
• E.g., gradient descent
• ANNs learn by repeatedly adjusting the weights of their connections (i.e., parameters)
• Using the training examples, the weights are modified to map inputs to outputs.
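The gradient descent learning rule mentioned above can be sketched on the simplest possible model, a single weight 𝑤 with prediction 𝑤𝑥 and squared error; the data values here are made up for illustration:

```python
def gradient_descent_step(w, x, y, lr=0.1):
    """One gradient descent update for the one-parameter model
    y_hat = w * x, minimising the squared error (y_hat - y)^2."""
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # derivative of (w*x - y)^2 w.r.t. w
    return w - lr * grad         # step against the gradient

# Repeated small weight adjustments recover the true relation y = 3x
w = 0.0
for _ in range(50):
    w = gradient_descent_step(w, x=2.0, y=6.0)
print(round(w, 3))  # 3.0
```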
McCulloch and Pitts Model
• McCulloch and Pitts proposed a very simple idea in 1943:
• The neuron computes the weighted sum of the input signals and compares the result with a
threshold value.
• If the net input is less than the threshold, the neuron output is –1.
• But if the net input is greater than or equal to the threshold, the neuron becomes activated
and its output attains a value +1.
• This type of activation function is called a sign function
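The McCulloch-Pitts neuron described above is a few lines of code; the weights and threshold below are arbitrary illustrative values:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Sign-function neuron: output +1 if the weighted sum of the
    inputs reaches the threshold, otherwise -1."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else -1

print(mcculloch_pitts([1, 0, 1], [0.5, 0.5, 0.5], threshold=1.0))  # 1  (net = 1.0)
print(mcculloch_pitts([1, 0, 0], [0.5, 0.5, 0.5], threshold=1.0))  # -1 (net = 0.5)
```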
Perceptron
• Perceptron is a bio-inspired algorithm that tries to mimic a single neuron (Rosenblatt, 1958)
• A neuron is an information-processing unit that is fundamental to the operation of the NN.
• An algorithm for supervised learning of binary classifiers
• Decide whether or not an input, represented by a vector of numbers, belongs to a specific
class.
• Computation:
• Multiply each input (feature) by a weight and check whether this weighted sum (activation) is
greater than a threshold
• If so, then we “fire” the neuron (i.e. a decision is made based on the activation)
A Single Neuron
activation score: 𝑎 = ∑ᵢ₌₁ᴰ 𝑥ᵢ𝑤ᵢ
If the activation is greater than a predefined
threshold, then the neuron fires.
Bias
• We need to adjust a fixed shift from zero, if the “interesting” region happens to be far from the
origin.
• Bias allows shifting the activation function by adding a constant (i.e. the given bias) to the input.
• Bias in neural networks is analogous to the constant term in a linear function, which effectively translates the line by that constant value.
• Let's adjust the previous model by including a bias term 𝑏 as follows:
𝑎 = 𝑏 + ∑ᵢ₌₁ᴰ 𝑥ᵢ𝑤ᵢ
• By adding a feature to each data point that is always equal to 1 (𝑥₀ = 1), we can include the bias term 𝑏 in the weight vector (i.e., 𝑤₀ = 𝑏)
𝑎 = ∑ᵢ₌₀ᴰ 𝑥ᵢ𝑤ᵢ = 𝒙ᵀ𝒘
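The two formulations are equivalent, which a short check makes concrete; the feature and weight values are arbitrary:

```python
def activation(x, w, b):
    """Activation with an explicit bias: a = b + sum_i x_i * w_i."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def activation_folded(x, w):
    """Same activation with the bias folded into the weight vector:
    prepend the constant feature x_0 = 1 and the weight w_0 = b."""
    return sum(xi * wi for xi, wi in zip(x, w))

x, w, b = [2.0, 3.0], [0.5, -1.0], 0.25
# Prepending x_0 = 1 and w_0 = b gives the same activation score
assert activation(x, w, b) == activation_folded([1.0] + x, [b] + w)
```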
Perceptron Training
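A minimal sketch of the classic perceptron learning rule (Rosenblatt's mistake-driven update); the toy dataset and learning rate are illustrative, and each feature vector starts with a constant 1 so the bias is folded into the weights as above:

```python
def train_perceptron(data, epochs=10, lr=1.0):
    """Perceptron learning rule: for each misclassified example,
    nudge the weights toward the correct side of the boundary.
    `data` is a list of (features, label) pairs, labels in {+1, -1};
    feature vectors begin with a constant 1 (the bias feature)."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            a = sum(xi * wi for xi, wi in zip(x, w))  # activation score
            pred = 1 if a >= 0 else -1
            if pred != y:                             # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Linearly separable toy data (bias feature first in each vector)
data = [([1, 0, 0], -1), ([1, 2, 1], +1), ([1, 0, 2], +1), ([1, 0.2, 0.3], -1)]
w = train_perceptron(data)
```

For linearly separable data like this, the perceptron convergence theorem guarantees the loop eventually stops making mistakes.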
Activation Functions
• The purpose of an activation function is to ensure that the neuron's response is bounded
• That is, the neuron's actual response is conditioned, or damped, under large or small activating stimuli, and thus remains controllable
Activation Functions (Cont.)
Taken from Witten et al.,
2017 (Data Mining book)