Chapter 1: Exercise Problems and Solutions
1. Learn multiplication using examples (9×17 = 153, 4×17 = 68) to find 13×17
Problem:
You are given:
• 9 × 17 = 153
• 4 × 17 = 68
You are to deduce:
• 13 × 17 = ?
Solution:
You can observe:
Since 13 = 9 + 4,
13 × 17 = (9 × 17) + (4 × 17) = 153 + 68 = 221
Learning Paradigm:
This is supervised learning using inductive reasoning, because:
• You’re using known examples (inputs and outputs) to infer the result for a new input.
• You're generalizing a pattern (addition of multiplications).
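A minimal Python sketch of this reasoning (illustrative only): the two given products act as known examples, and the answer for the new input is obtained by combining them via the distributive law.
```python
# Known "training" examples for x * 17: input x -> output x * 17
known = {9: 153, 4: 68}

# Since 13 = 9 + 4, the distributive law gives 13 * 17 = (9 * 17) + (4 * 17)
prediction = known[9] + known[4]

print(prediction)             # 221
assert prediction == 13 * 17  # sanity check against direct multiplication
```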
2. Deduce from given logic statements
Statements:
• a. Sum of two even numbers is even.
• b. 12 is an even number.
• c. 22 is an even number.
Deduction:
Using (a), we know:
If both numbers are even → their sum is also even.
Now, 12 and 22 are even →
So, 12 + 22 = 34, which is even.
Answer:
34 is an even number.
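A quick Python check of this deduction (purely illustrative):
```python
# Premises (b) and (c): 12 and 22 are even numbers
a, b = 12, 22
assert a % 2 == 0 and b % 2 == 0

# Rule (a): the sum of two even numbers is even
s = a + b
print(s, "is even:", s % 2 == 0)  # 34 is even: True
```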
3. Learning paradigm from logical pattern: If x is even, then x+1 is odd, x+2 is even
Statements:
• If x is an even number → x + 1 is odd → x + 2 is even
• Use this to learn that 37 is odd and 38 is even.
Solution:
Starting from a known even number, x = 36 (since 36 = 2 × 18), the rule gives:
• 37 = 36 + 1 → odd, and 38 = 36 + 2 → even.
Learning Paradigm:
This is deductive learning, since you're using general rules to deduce specific cases.
4. Learn from: If x is odd, then x + 1 is even. 22 is even. Learn that 21 is odd
Statements:
• If x is odd → x + 1 is even
• Given: 22 is even → So, 21 = 22 - 1 must be odd
Learning Paradigm:
This is deductive reasoning again, since we're using a known rule in reverse to deduce a
specific fact.
5. Classify attributes into Nominal / Ordinal / Numeric
• a. Telephone number → Nominal: the numbers are identifiers, not quantities to operate on
• b. {ball, bat, wicket, umpire, batsman, bowler} → Nominal: categorical items with no inherent order
• c. Temperature → Numeric: a quantitative value that can be ordered and used in arithmetic
• d. {short, medium height, tall} → Ordinal: ordered categories, but without a precise numerical difference between them
6. Relationship between Euclidean distance and cosine similarity
Let x and y be unit vectors (‖x‖ = ‖y‖ = 1).
Then the squared Euclidean distance expands as:
‖x − y‖² = ‖x‖² + ‖y‖² − 2(x · y)
Since ‖x‖ = ‖y‖ = 1:
‖x − y‖² = 1 + 1 − 2 cos(θ) = 2(1 − cos(θ))
Therefore:
d(x, y)² = 2(1 − cos(θ))
Where:
• θ is the angle between the vectors
• x · y = cos(θ) (since they are unit vectors)
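The identity can be checked numerically. The sketch below picks two arbitrary vectors (the random seed and dimension are just illustrative choices), normalizes them to unit length, and confirms that ‖x − y‖² equals 2(1 − cos θ).
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)
x /= np.linalg.norm(x)      # make x a unit vector
y /= np.linalg.norm(y)      # make y a unit vector

cos_theta = np.dot(x, y)    # for unit vectors, x . y = cos(theta)
d_squared = np.sum((x - y) ** 2)

print(d_squared, 2 * (1 - cos_theta))
assert np.isclose(d_squared, 2 * (1 - cos_theta))
```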
7. Analyze the data set: (1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,1,-1), (6,6,10)
Observations:
• The first 5 points lie close to one another, all near (1, 1, ·) or (1, 2, ·) → they form one cluster of similar values
• The last point (6, 6, 10) is far from the rest → an outlier
Possible Tasks:
• Clustering: You can group the first 5 points together.
• Outlier detection: (6,6,10) can be flagged as an anomaly.
• Distance analysis: Use Euclidean distance for closeness check.
Example:
Distance between (1,1,1) and (1,1,2):
√((1−1)² + (1−1)² + (1−2)²) = √1 = 1
Distance between (1,1,1) and (6,6,10):
√((6−1)² + (6−1)² + (10−1)²) = √(25 + 25 + 81) = √131 ≈ 11.45
So (6,6,10) is much farther away → an outlier.
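A short NumPy sketch of the distance analysis: it computes all pairwise Euclidean distances and flags the point with the largest average distance to the others as the likely outlier (that criterion is just one simple, illustrative choice).
```python
import numpy as np

points = np.array([(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 1, -1), (6, 6, 10)])

# Pairwise Euclidean distances between all points
diffs = points[:, None, :] - points[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))

avg_dist = dists.mean(axis=1)
print(np.round(avg_dist, 2))
print("Likely outlier:", points[np.argmax(avg_dist)])  # [ 6  6 10]
```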
Problem 8 – KNN Prediction
Problem: Predict the missing value in pattern (1,1,-) using KNN:
• Use data from Q7
• Case 1: K = 1
• Case 2: K = 5
• Also, analyze the effect of (100, -100)
Solution:
• For K = 1: find the nearest neighbor using Euclidean distance. The nearest point to (1, 1, ?) is (1, 1, 1) → predict value = 1.
• For K = 5: use majority voting over the 5 nearest neighbors: (1,1,1), (1,1,2), (1,1,3), (1,1,-1), (1,2,2). Their values are 1, 2, 3, -1, 2 → mode = 2.
• The point (100, -100) is extremely far from the rest → it may distort the results if the data are not normalized or the outlier is not filtered out.
Explanation: KNN is sensitive to scale and outliers. Increasing K makes the model more robust but may pull in noisy or unrelated points. Outliers should be handled during preprocessing. A short code sketch of this prediction is given below.
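A minimal sketch of the KNN prediction, assuming the first two coordinates are the features and the third coordinate is the value to predict; ties in distance are broken by original order, matching the solution above.
```python
import numpy as np

data = np.array([(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 1, -1), (6, 6, 10)])
query = np.array([1, 1])                     # the pattern (1, 1, ?)

features, values = data[:, :2], data[:, 2]
dists = np.linalg.norm(features - query, axis=1)
order = np.argsort(dists, kind="stable")     # stable sort: ties keep original order

# K = 1: value of the single nearest neighbor
print("K=1 prediction:", values[order[0]])   # 1

# K = 5: majority vote among the five nearest neighbors
neighbor_vals = values[order[:5]]
vals, counts = np.unique(neighbor_vals, return_counts=True)
print("K=5 prediction:", vals[np.argmax(counts)])  # 2
```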
Problem 9 – Feature Scaling
Problem: Given: X1 = (1, 100000), X2 = (2, 100000), X3 = (1, 200000), X4 = (2, 200000)
a. Range scaling:
• For each feature:
o First column min = 1, max = 2 → scaled: (x - 1)/(2 - 1)
o Second column min = 100000, max = 200000 → scaled: (x - 100000)/(100000)
Result:
• X1 → (0, 0)
• X2 → (1, 0)
• X3 → (0, 1)
• X4 → (1, 1)
b. Standard Scaler:
• Standardize each feature: subtract the mean, then divide by the (population) standard deviation
• Mean of first feature = 1.5, std = 0.5
• Mean of second feature = 150000, std = 50000
Normalized Data:
• X1 → ((1-1.5)/0.5, (100000-150000)/50000) = (-1, -1)
• X2 → (1, -1)
• X3 → (-1, 1)
• X4 → (1, 1)
Explanation: Feature scaling is essential for distance-based algorithms such as KNN, to avoid biasing the distance toward features with large numeric ranges. Both scalings are reproduced in the sketch below.
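Both scalings can be reproduced in a few lines of NumPy (the population standard deviation is used, matching the values computed above):
```python
import numpy as np

X = np.array([(1, 100000), (2, 100000), (1, 200000), (2, 200000)], dtype=float)

# a. Range (min-max) scaling: (x - min) / (max - min), per feature
X_range = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_range)   # rows: (0,0), (1,0), (0,1), (1,1)

# b. Standard scaling: (x - mean) / std, per feature (population std, ddof=0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std)     # rows: (-1,-1), (1,-1), (-1,1), (1,1)
```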
Problem 10 – Feature Selection vs Feature Extraction
Problem: Show that feature selection is a special case of feature extraction.
Explanation:
• Feature Selection: Choose a subset of existing features without modifying them.
• Feature Extraction: Create new features from existing ones (e.g., PCA, LDA)
When a feature extraction method simply selects a subset of the original features without transforming them, it reduces to feature selection.
Conclusion: Feature selection can be viewed as a restricted form of feature extraction in which the transformation matrix is a 0/1 selection matrix that picks original features directly (illustrated below).
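A tiny NumPy sketch of this idea: keeping features 0 and 2 of a 4-dimensional vector (the vector and the chosen indices are arbitrary examples) is exactly the linear map given by a 0/1 selection matrix, i.e., a feature extraction whose transformation happens to select features.
```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

S = np.array([[1, 0, 0, 0],   # row selecting feature 0
              [0, 0, 1, 0]])  # row selecting feature 2

print(S @ x)                  # [10. 30.] -- same as picking x[[0, 2]] directly
assert np.allclose(S @ x, x[[0, 2]])
```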