Lecture 3
Opinion polls
How does the greater
population feel about an issue?
Correct for over-sampling?
• θ* is “true” average opinion
• X1, X2, … are sample calls
A/B testing
How do we figure out which ad
results in more click-through?
• θ* are the “true” average rates
• X1, X2, … are binary “clicks”
Interpret
Observe X1, X2, …, Xn drawn IID from f(x; θ) for some “true” θ = θ*
Data exploration
What are the degrees of freedom of the
dataset?
• θ* describes the principal directions of
variation
• X1, X2, … are the individual images
Predict
Observe X1, X2, …, Xn drawn IID from f(x; θ) for some “true” θ = θ*
Content recommendation
Can we predict how much someone will
like a movie based on past ratings?
• θ* describes user’s preferences
• X1, X2, … are (movie, rating) pairs
https://chat.openai.com/chat
Text-to-image generation: “dog talking on cell phone under water, oil painting”
Can AI generate an image from a prompt?
• θ* describes the coupled structure of
images and text
• X1, X2, … are the (image, caption) pairs
found online https://labs.openai.com/
Linear Regression
The regression problem, 1-dimensional
Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}$ (# square feet) and $y_i \in \mathbb{R}$ (sale price)
Fit a function to our data, 1-d
Training data: $\{(x_i, y_i)\}_{i=1}^n$, $y_i \in \mathbb{R}$
Hypothesis/Model: linear
Consider $y_i = x_i w + \epsilon_i$ where $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$
[Figure: best linear fit, sale price vs. # square feet]
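A minimal sketch of the 1-d model above, assuming made-up values for the slope, the noise level, and the range of square footage (none of these come from the lecture). It simulates $y_i = x_i w + \epsilon_i$ and recovers $w$ with the 1-d least-squares estimate $\sum_i x_i y_i / \sum_i x_i^2$.

```python
import random

random.seed(0)

# Hypothetical ground truth, chosen only for illustration.
w_true = 150.0      # dollars per square foot
sigma = 20000.0     # noise standard deviation in dollars
n = 500

# x_i: square footage; y_i = x_i * w + eps_i with eps_i ~ N(0, sigma^2).
x = [random.uniform(500.0, 3500.0) for _ in range(n)]
y = [xi * w_true + random.gauss(0.0, sigma) for xi in x]

# 1-d least squares (no intercept): w_hat = sum(x_i y_i) / sum(x_i^2).
w_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(f"w_true = {w_true}, w_hat = {w_hat:.2f}")
```

With a few hundred points the estimate should land close to the slope used to generate the data; the gap shrinks as $n$ grows or $\sigma$ falls.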
The regression problem, d-dim
Training Data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
Hypothesis/Model: linear
Consider $y_i = x_i^\top w + \epsilon_i$ where $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$
[Figure: sale price vs. # square feet and # bathrooms]
Under this model, each label has the conditional density
$$p(y \mid x, w, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y - x^\top w)^2 / 2\sigma^2}$$
Maximizing log-likelihood
Training data: $\{(x_i, y_i)\}_{i=1}^n$, $y_i \in \mathbb{R}$
Likelihood:
$$P(\mathcal{D} \mid w, \sigma) = \prod_{i=1}^n p(y_i \mid x_i, w, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y_i - x_i^\top w)^2 / 2\sigma^2}$$
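One practical reason to work with the log of the likelihood is numerical: a product of many densities underflows in floating point, while the sum of their logs stays finite. The sketch below illustrates this for the 1-d case of the Gaussian model; the toy data and the helper names `gaussian_density` and `log_likelihood` are assumptions made for illustration.

```python
import math

def gaussian_density(y, x, w, sigma):
    """p(y | x, w, sigma) for the 1-d linear model y = x*w + eps."""
    resid = y - x * w
    return math.exp(-resid**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def log_likelihood(xs, ys, w, sigma):
    """log P(D | w, sigma) = sum_i log p(y_i | x_i, w, sigma)."""
    n = len(xs)
    sq = sum((yi - xi * w) ** 2 for xi, yi in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - sq / (2 * sigma**2)

# Hypothetical toy data lying exactly on the line y = 2x.
xs = [float(i % 10) for i in range(2000)]
ys = [2.0 * xi for xi in xs]

# Product of 2000 densities, each equal to 1/sqrt(2*pi) here.
likelihood = math.prod(gaussian_density(yi, xi, 2.0, 1.0) for xi, yi in zip(xs, ys))
print(likelihood)                        # underflows to 0.0
print(log_likelihood(xs, ys, 2.0, 1.0))  # finite
```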
Maximum Likelihood Estimation
Observe $X_1, X_2, \ldots, X_n$ drawn IID from $f(x; \theta)$ for some “true” $\theta = \theta^*$
Likelihood function: $L_n(\theta) = \prod_{i=1}^n f(X_i; \theta)$
Log-likelihood function: $l_n(\theta) = \log(L_n(\theta)) = \sum_{i=1}^n \log(f(X_i; \theta))$
Maximize (wrt w):
$$\log P(\mathcal{D} \mid w, \sigma) = \log\left( \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y_i - x_i^\top w)^2 / 2\sigma^2} \right)$$
Maximizing log-likelihood
Maximize (wrt w):
$$\log P(\mathcal{D} \mid w, \sigma) = \log\left( \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y_i - x_i^\top w)^2 / 2\sigma^2} \right) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^\top w)^2$$
The first term does not depend on $w$, so maximizing over $w$ is the same as minimizing the sum of squared errors:
$$\widehat{w}_{MLE} = \arg\min_w \sum_{i=1}^n (y_i - x_i^\top w)^2$$
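The equivalence between maximizing the Gaussian log-likelihood and minimizing the sum of squared errors can be checked numerically. The sketch below does a grid search over a scalar $w$ on a small made-up dataset (the data values and the grid are assumptions) and compares the two optimizers for several values of $\sigma$.

```python
import math

# Hypothetical toy 1-d dataset (x_i, y_i); values are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(w):
    """Sum of squared errors for the 1-d model y = x*w."""
    return sum((yi - xi * w) ** 2 for xi, yi in zip(xs, ys))

def log_likelihood(w, sigma):
    """Gaussian log-likelihood of the data at (w, sigma)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - sse(w) / (2 * sigma**2)

grid = [i / 1000 for i in range(1000, 3001)]   # candidate w in [1, 3]
w_min_sse = min(grid, key=sse)

# The maximizer over w should be the same grid point for any sigma > 0.
w_max_ll = {s: max(grid, key=lambda w: log_likelihood(w, s)) for s in (0.5, 1.0, 4.0)}
print(w_min_sse, w_max_ll)
```

Changing $\sigma$ rescales and shifts the log-likelihood but does not move its maximizer, which is why $\sigma$ drops out of the estimate for $w$.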
Set the derivative to zero and solve for $w$:
$$-2 \sum_{i=1}^n x_i (y_i - x_i^\top w) = 0 \;\Longrightarrow\; \left( \sum_{i=1}^n x_i x_i^\top \right) w = \sum_{i=1}^n x_i y_i$$
$$\widehat{w}_{MLE} = \left( \sum_{i=1}^n x_i x_i^\top \right)^{-1} \sum_{i=1}^n x_i y_i$$
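The closed form can be computed directly from its two sums. The following is a sketch on synthetic data (the dimensions, true weights, and noise level are assumptions chosen for illustration); it accumulates $\sum_i x_i x_i^\top$ and $\sum_i x_i y_i$ one example at a time and then solves the resulting linear system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: n examples, d features (values are made up).
n, d = 200, 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))                  # row i is x_i^T
y = X @ w_true + 0.1 * rng.normal(size=n)    # y_i = x_i^T w + eps_i

# Accumulate sum_i x_i x_i^T (d x d) and sum_i x_i y_i (length d).
A = np.zeros((d, d))
b = np.zeros(d)
for x_i, y_i in zip(X, y):
    A += np.outer(x_i, x_i)
    b += x_i * y_i

# Solve A w = b rather than forming the inverse explicitly.
w_mle = np.linalg.solve(A, b)
print(w_mle)
```

Solving the system with `np.linalg.solve` gives the same answer as multiplying by the inverse when the matrix is invertible, and is usually the more stable choice.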
The regression problem in matrix notation
$$\widehat{w}_{MLE} = \arg\min_w \sum_{i=1}^n (y_i - x_i^\top w)^2$$
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n, \qquad X = \begin{bmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{bmatrix} \in \mathbb{R}^{n \times d}$$
d : # of features
n : # of examples/datapoints
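As a concrete picture of this stacking, the sketch below builds $X$ and $y$ from a list of $(x_i, y_i)$ pairs. The three examples and their two features (# square feet, # bathrooms) are made-up values used only to show the shapes.

```python
import numpy as np

# Hypothetical examples: each x_i has d = 2 features
# (# square feet, # bathrooms); y_i is the sale price. Values are made up.
examples = [
    (np.array([1400.0, 2.0]), 310000.0),
    (np.array([2100.0, 3.0]), 455000.0),
    (np.array([900.0, 1.0]), 198000.0),
]

X = np.stack([x_i for x_i, _ in examples])   # shape (n, d): row i is x_i^T
y = np.array([y_i for _, y_i in examples])   # shape (n,)

n, d = X.shape
print(n, d, y.shape)
```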
Stacking $y_i = x_i^\top w + \epsilon_i$ for $i = 1, \ldots, n$ gives
$$y = Xw + \epsilon$$
$\ell_2$ norm: $\|z\|_2^2 = \sum_{i=1}^n z_i^2 = z^\top z$
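A quick numerical check of the norm identity, and of how it turns the per-example sum of squared errors into a single vector expression. The vectors and the small regression instance are arbitrary values assumed for illustration.

```python
import numpy as np

z = np.array([3.0, -4.0, 12.0])

sum_of_squares = np.sum(z ** 2)         # sum_i z_i^2
inner_product = z @ z                   # z^T z
norm_squared = np.linalg.norm(z) ** 2   # ||z||_2^2
print(sum_of_squares, inner_product, norm_squared)

# Hypothetical small regression instance (n = 3, d = 2).
X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
y = np.array([1.0, 2.0, 0.5])
w = np.array([0.5, -0.25])

# Sum over examples of (y_i - x_i^T w)^2 versus ||y - Xw||_2^2.
per_example = sum((y_i - x_i @ w) ** 2 for x_i, y_i in zip(X, y))
vectorized = np.linalg.norm(y - X @ w) ** 2
print(per_example, vectorized)
```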
In terms of the $\ell_2$ norm, the objective is
$$\widehat{w}_{MLE} = \arg\min_w \sum_{i=1}^n (y_i - x_i^\top w)^2 = \arg\min_w \|y - Xw\|_2^2$$
and, when $X^\top X$ is invertible, the least-squares solution is
$$\widehat{w}_{LS} = \widehat{w}_{MLE} = (X^\top X)^{-1} X^\top y$$
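The matrix form translates almost directly into NumPy. The sketch below, on synthetic data with assumed dimensions and weights, solves the normal equations $(X^\top X) w = X^\top y$ and compares the result with NumPy's general least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic data (dimensions and weights are made up).
n, d = 100, 4
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, 0.0, -1.0, 3.0]) + 0.5 * rng.normal(size=n)

# Normal equations: (X^T X) w = X^T y, valid when X^T X is invertible.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))
```

When $X^\top X$ is singular or badly conditioned, the normal equations can fail or lose accuracy, whereas `np.linalg.lstsq` still returns a minimum-norm solution.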
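The regression problem in matrix notation
With an offset $b$, so that $y = Xw + \mathbf{1} b + \epsilon$, setting the derivatives of $\|y - Xw - \mathbf{1} b\|_2^2$ with respect to $w$ and $b$ to zero gives
$$X^\top X \widehat{w}_{LS} + \widehat{b}_{LS} X^\top \mathbf{1} = X^\top y$$
$$\mathbf{1}^\top X \widehat{w}_{LS} + \widehat{b}_{LS} \mathbf{1}^\top \mathbf{1} = \mathbf{1}^\top y$$
If the features are centered, so that $X^\top \mathbf{1} = 0$, these decouple into
$$\widehat{w}_{LS} = (X^\top X)^{-1} X^\top y, \qquad \widehat{b}_{LS} = \frac{1}{n} \sum_{i=1}^n y_i$$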
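The decoupling under centered features can be verified against a joint fit. In this sketch (synthetic data; the weights, offset, and feature mean are assumptions), each feature column is centered so that $X^\top \mathbf{1} = 0$, and the two-step solution is compared with a least-squares fit on $X$ augmented by a column of ones.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical synthetic data with non-zero feature means and an offset.
n, d = 300, 2
X_raw = rng.normal(loc=5.0, size=(n, d))
y = X_raw @ np.array([1.5, -0.7]) + 4.0 + 0.1 * rng.normal(size=n)

# Center each feature so that X^T 1 = 0.
X = X_raw - X_raw.mean(axis=0)

# Decoupled solution: weights from the normal equations, offset = mean of y.
w_ls = np.linalg.solve(X.T @ X, X.T @ y)
b_ls = y.mean()

# Cross-check: fit [w, b] jointly using an appended column of ones.
X_aug = np.column_stack([X, np.ones(n)])
wb, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w_ls, b_ls, wb)
```

Note that `b_ls` is the offset for the centered features, so it equals the mean of `y` rather than the raw offset of 4.0 used to generate the data.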