
Course: Math 535 - Mathematical Methods in Data Science (MMiDS)

Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison


Updated: February 14, 2022
Copyright: © 2022 Sebastien Roch

Chapter 1 - Introduction and review


1.5 Exercises
1.1 (Adapted from [VMLS]) Recall that for two vectors $x$ and $y$ of the same dimension, their inner product is $x^T y$. A vector is said to be nonnegative if all of its entries are $\geq 0$.

(a) Show that the inner product of two nonnegative vectors is necessarily $\geq 0$.

(b) Suppose the inner product of two nonnegative vectors is zero. What can you say about their sparsity patterns, i.e., which of their entries are zero or nonzero?

1.2 (Adapted from [VMLS]) A Boolean $n$-vector is one for which all entries are either $0$ or $1$. Such vectors are used to encode whether each of $n$ conditions holds, with $a_i = 1$ meaning that condition $i$ holds.

(a) Another common encoding of the same information uses the two values $-1$ and $+1$ for the entries. For example, the Boolean vector $(0, 1, 1, 0)^T$ would be written using this alternative encoding as $(-1, +1, +1, -1)^T$. Suppose that $x$ is a Boolean vector and $y$ is a vector encoding the same information using the values $-1$ and $+1$. Express $y$ in terms of $x$ using vector notation. Also, express $x$ in terms of $y$ using vector notation.

(b) Suppose that $x$ and $y$ are Boolean $n$-vectors. Give a simple word description of their squared Euclidean distance $\|x - y\|_2^2$ and justify it mathematically.

1.3 (Adapted from [VMLS]) If $a$ is a vector, then $a_{r:s}$ is the vector of size $s - r + 1$ with entries $a_r, \ldots, a_s$, i.e., $a_{r:s} = (a_r, \ldots, a_s)^T$. The vector $a_{r:s}$ is called a slice. As a more concrete example, if $z$ is the $4$-vector $(1, -1, 2, 0)^T$, the slice $z_{2:3} = (-1, 2)^T$. Suppose the $T$-vector $x$ represents a time series or signal. The quantity

$$D(x) = (x_1 - x_2)^2 + (x_2 - x_3)^2 + \cdots + (x_{T-1} - x_T)^2,$$

the sum of the squares of the differences of adjacent values of the signal, is called the Dirichlet energy of the signal. The Dirichlet energy is a measure of the roughness or wiggliness of the time series.

(a) Express $D(x)$ in vector notation using slicing. [Hint: Note the similarity between $D(x)$ and the squared Euclidean distance between two vectors.]

(b) How small can $D(x)$ be? What signals $x$ have this minimum value of the Dirichlet energy?

(c) Find a signal $x$ with entries no more than one in absolute value that has the largest possible value of $D(x)$. Give the value of the Dirichlet energy achieved.
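A quick numerical check of the slicing form in (a), assuming the identity $D(x) = \|x_{1:T-1} - x_{2:T}\|_2^2$ (which is what the hint points toward, not a given):

In [ ]:
import numpy as np

x = np.array([1., -1., 2., 0.])              # the example 4-vector z above
D_def = np.sum((x[:-1] - x[1:])**2)          # D(x) from the definition
D_slice = np.linalg.norm(x[:-1] - x[1:])**2  # slicing form, x_{1:T-1} - x_{2:T}
print(D_def, D_slice)                        # both 17.0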

1.4 (Adapted from [VMLS]) A vector of length $n$ can represent the number of times each word in a dictionary of $n$ words appears in a document. For example, $(25, 2, 0)^T$ means that the first dictionary word appears $25$ times, the second one twice, and the third one not at all. Suppose the $n$-vector $w$ is the word count vector associated with a document and a dictionary of $n$ words. For simplicity we will assume that all words in the document appear in the dictionary.

(a) What is $\mathbf{1}^T w$? Here $\mathbf{1}$ is an all-one vector of the appropriate size.

(b) What does $w_{282} = 0$ mean?

(c) Let $h$ be the $n$-vector that gives the histogram of the word counts, i.e., $h_i$ is the fraction of the words in the document that are word $i$. Use vector notation to express $h$ in terms of $w$. (You can assume that the document contains at least one word.)
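A small sketch illustrating (a) and (c) on the example vector; the formulas used here (total count $\mathbf{1}^T w$ and histogram $w$ divided by that total) are stated as assumptions to test, not as the official solution:

In [ ]:
import numpy as np

w = np.array([25., 2., 0.])    # example word count vector
total = np.ones(len(w)) @ w    # 1^T w: total number of words (27 here)
h = w / total                  # candidate histogram; entries sum to 1
print(total, h)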


1.5 (Adapted from [VMLS]) Suppose $a$ and $b$ are vectors of the same size. The triangle inequality states that $\|a + b\|_2 \leq \|a\|_2 + \|b\|_2$. Show that we also have $\|a + b\|_2 \geq \|a\|_2 - \|b\|_2$. [Hint: Apply the triangle inequality to $(a + b) + (-b)$.] ⊲

1.6 (Adapted from [VMLS]) Verify that the following identities hold for any two vectors $a$ and $b$ of the same size.

(a) $(a + b)^T (a - b) = \|a\|_2^2 - \|b\|_2^2$.

(b) $\|a + b\|_2^2 + \|a - b\|_2^2 = 2(\|a\|_2^2 + \|b\|_2^2)$. This is called the Parallelogram Law.
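Before proving the identities, a quick numerical check on random vectors can catch a misremembered sign; this is only a sanity check, not the verification the exercise asks for:

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(5), rng.standard_normal(5)
# (a): difference of squared norms
print(np.isclose((a + b) @ (a - b), a @ a - b @ b))                  # True
# (b): Parallelogram Law
print(np.isclose(np.linalg.norm(a + b)**2 + np.linalg.norm(a - b)**2,
                 2*(np.linalg.norm(a)**2 + np.linalg.norm(b)**2)))   # True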

1.7 (Adapted from [VMLS]) The root-mean-square (RMS) value of an $n$-vector $x$ is defined as

$$\mathrm{rms}(x) = \sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}} = \frac{\|x\|_2}{\sqrt{n}}.$$

(a) Show that at least one entry of a vector has absolute value at least as large as the RMS value of the vector.

(b) For an $n$-vector $x$, let $\mathrm{avg}(x) = \mathbf{1}^T x / n$ and

$$\mathrm{std}(x) = \frac{\|x - \mathrm{avg}(x)\mathbf{1}\|_2}{\sqrt{n}}.$$

Establish the identity

$$\mathrm{rms}(x)^2 = \mathrm{avg}(x)^2 + \mathrm{std}(x)^2.$$

[Hint: Expand the expression for $\mathrm{std}(x)^2$.]

(c) Use (b) to show that $|\mathrm{avg}(x)| \leq \mathrm{rms}(x)$.

(d) Use (b) to show that $\mathrm{std}(x) \leq \mathrm{rms}(x)$.
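A numerical check of the identity in (b); passing on random inputs does not prove it, but it is a useful guard before attempting the algebra:

In [ ]:
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(7)
n = len(x)
rms = np.linalg.norm(x) / np.sqrt(n)
avg = np.ones(n) @ x / n
std = np.linalg.norm(x - avg * np.ones(n)) / np.sqrt(n)
print(np.isclose(rms**2, avg**2 + std**2))   # True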

1.8 (Adapted from [VMLS]) Suppose that the vectors $x_1, \ldots, x_N$ are clustered using $k$-means, with group representatives $z_1, \ldots, z_k$.

(a) Suppose the original vectors $x_i$ are nonnegative, i.e., their entries are nonnegative. Explain why the representatives $z_j$ are also nonnegative.

(b) Suppose the original vectors $x_i$ represent proportions, i.e., their entries are nonnegative and sum to one. Explain why the representatives $z_j$ also represent proportions, i.e., their entries are nonnegative and sum to one.

(c) Suppose the original vectors $x_i$ are Boolean, i.e., their entries are either $0$ or $1$. Give an interpretation of $(z_j)_i$, the $i$-th entry of the $j$-th group representative.

1.9 (Adapted from [VMLS]) Clustering a collection of vectors into $k = 2$ groups is called $2$-way partitioning, since we are partitioning the vectors into $2$ groups, with index sets $G_1$ and $G_2$. Suppose we run $k$-means with $k = 2$ on the $n$-vectors $x_1, \ldots, x_N$. Show that there is a nonzero vector $w$ and a scalar $v$ that satisfy

$$w^T x_i + v \geq 0, \ \forall i \in G_1, \qquad w^T x_i + v \leq 0, \ \forall i \in G_2.$$

In other words, the affine function $f(x) = w^T x + v$ is greater than or equal to zero on the first group, and less than or equal to zero on the second group. This is called linear separation of the two groups. [Hint: Consider the function $\|x - z_1\|_2^2 - \|x - z_2\|_2^2$, where $z_1$ and $z_2$ are the group representatives.]

1.10 (Adapted from [ASV]) Let $0 < p \leq 1$. A random variable $X$ has the geometric distribution with success parameter $p$ if the possible values of $X$ are $\{1, 2, 3, \ldots\}$ and $X$ satisfies $P(X = k) = (1 - p)^{k-1} p$ for positive integers $k$. Its mean is $1/p$ and its variance is $(1 - p)/p^2$.

Let $X$ be a geometric random variable with parameter $p = 1/6$.

(a) Use Markov's inequality to find an upper bound for $P(X \geq 16)$.

(b) Use Chebyshev's inequality to find an upper bound for $P(X \geq 16)$.

(c) Use the sum of a geometric series to explicitly compute the probability $P(X \geq 16)$ and compare with the upper bounds you derived.
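A numerical companion for comparing the three quantities; the particular manipulations coded below (Markov as $E[X]/16$, Chebyshev via $P(|X - E[X]| \geq 16 - E[X])$) are one natural route, and deriving them is still the point of the exercise:

In [ ]:
import numpy as np

p = 1/6
mean, var = 1/p, (1 - p)/p**2       # E[X] = 6, Var(X) = 30
markov = mean / 16                  # one Markov-style bound
chebyshev = var / (16 - mean)**2    # one Chebyshev-style bound
exact = (1 - p)**15                 # P(X >= 16) = (1-p)^15
print(markov, chebyshev, exact)     # 0.375, 0.3, ~0.065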


1.11 (Adapted from [ASV]) Let $0 < \lambda < +\infty$. A random variable $X$ has the exponential distribution with parameter $\lambda$ if $X$ has density function $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, and $0$ otherwise. Its mean is $1/\lambda$ and its variance is $1/\lambda^2$.

Let $X$ be an exponential random variable with parameter $\lambda = 1/2$.

(a) Use Markov's inequality to find an upper bound for $P(X > 6)$.

(b) Use Chebyshev's inequality to find an upper bound for $P(X > 6)$.

(c) Use an integral to explicitly compute the probability $P(X > 6)$ and compare with the upper bounds you derived.
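A simulation-based sanity check for (c): the closed form $P(X > 6) = e^{-6\lambda}$ used below is what the integral should give, stated here as an assumption to verify:

In [ ]:
import numpy as np

lam = 1/2
exact = np.exp(-lam * 6)            # candidate value of the integral
rng = np.random.default_rng(2)
sim = (rng.exponential(scale=1/lam, size=10**6) > 6).mean()
print(exact, sim)                   # both ~0.0498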


1.12 (Adapted from [ASV]) Suppose that $X$ is a nonnegative random variable with $E[X] = 10$.

(a) Use Markov to give an upper bound on the probability that $X$ is larger than $15$.

(b) Suppose that we also know that $\mathrm{Var}(X) = 3$. Use Chebyshev to give a better upper bound on $P(X > 15)$ than in part (a).

(c) Suppose that $Y_1, Y_2, \ldots, Y_{300}$ are i.i.d. random variables with the same distribution as $X$, so that, in particular, $E(Y_i) = 10$ and $\mathrm{Var}(Y_i) = 3$. Use Chebyshev to give an upper bound on the probability that $\sum_{i=1}^{300} Y_i$ is larger than $3060$.

1.13 (Adapted from [ASV]) Suppose that we have i.i.d. random variables $X_1, X_2, X_3, \ldots$ with finite mean $E[X_1] = \mu$ and variance $\mathrm{Var}(X_1) = \sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Prove that for any fixed $\varepsilon > 0$ and $1/2 < \alpha < 1$ we have

$$\lim_{n \to +\infty} P\left[\left|\frac{S_n - n\mu}{n^\alpha}\right| < \varepsilon\right] = 1.$$

1.14 (Adapted from [ASV]) By mimicking the proof of the Weak Law of Large Numbers, prove the following variant. Suppose that we have random variables $X_1, X_2, \ldots$ each with finite mean $E[X_i] = \mu$ and variance $\mathrm{Var}(X_i) = \sigma^2$. Suppose further that $\mathrm{Cov}(X_i, X_j) = 0$ whenever $|i - j| \geq 2$ and that there is a constant $c > 0$ so that $|\mathrm{Cov}(X_i, X_{i+1})| < c$ for all $i$. Let $S_n = X_1 + \cdots + X_n$. Then for any fixed $\varepsilon > 0$ we have

$$\lim_{n \to +\infty} P\left[\left|\frac{S_n}{n} - \mu\right| < \varepsilon\right] = 1.$$

1.15 (Adapted from [ASV]) By mimicking the proof of Chebyshev, prove the following variant. Let $X$ be a random variable with a finite mean $\mu$ and for which $E[\exp(s(X - \mu))] < +\infty$ for some $s > 0$. Then we have, for $c > 0$,

$$P(X \geq \mu + c) \leq \frac{E[\exp(s(X - \mu))]}{e^{sc}}.$$

Justify your answer carefully. ⊲

1.16 (Adapted from [ASV]) Recall that the cumulative distribution function (CDF) of a real-valued random variable $Z$ is the function $F_Z(z) = P[Z \leq z]$ for all $z \in \mathbb{R}$. Let $X_1, X_2, \ldots, X_n$ be independent random variables with the same cumulative distribution function $F$. Denote the minimum and the maximum by

$$Z = \min(X_1, X_2, \ldots, X_n) \quad \text{and} \quad W = \max(X_1, X_2, \ldots, X_n).$$

Find the cumulative distribution functions $F_Z$ and $F_W$ of $Z$ and $W$ respectively. ⊲

1.17 (Adapted from [ASV]) Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with Exponential distribution with parameter $\lambda = 1$ (see Exercise 1.11 above), and let $M_n = \max(X_1, \ldots, X_n)$. Show that for any $x \in \mathbb{R}$ we have

$$\lim_{n \to +\infty} P(M_n - \ln n \leq x) = \exp(-e^{-x}).$$

The right-hand side is the cumulative distribution function of the Gumbel distribution, an example of an extreme value distribution. [Hint: Use Exercise 1.16 to compute $P(M_n \leq \ln n + x)$ explicitly, and then evaluate the limit as $n \to +\infty$.] ⊲
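A simulation illustrating the limit; with $n$ moderately large the empirical frequency should already sit close to the Gumbel CDF value:

In [ ]:
import numpy as np

rng = np.random.default_rng(3)
n, reps, x = 1000, 5000, 0.5
Mn = rng.exponential(size=(reps, n)).max(axis=1)   # reps samples of M_n
empirical = (Mn - np.log(n) <= x).mean()
print(empirical, np.exp(-np.exp(-x)))              # both ~0.545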

1.18 (Adapted from [ASV]) Let $A$ and $B$ be two disjoint events. Under what condition are they independent? ⊲

1.19 (Adapted from [ASV]) We choose a number from the set $\{10, 11, 12, \ldots, 99\}$ uniformly at random.

(a) Let $X$ be the first digit and $Y$ the second digit of the chosen number. Show that $X$ and $Y$ are independent random variables.

(b) Let $X$ be the first digit of the chosen number and $Z$ the sum of the two digits. Show that $X$ and $Z$ are not independent.

1.20 (Adapted from [ASV]) Suppose that $X$ and $Y$ have joint probability density function $f(x, y) = 2e^{-(x+2y)}$ for $x > 0, y > 0$, and $0$ elsewhere. Show that $X$ and $Y$ are independent random variables and provide their marginal distributions. [Hint: Recall the exponential distribution from Exercise 1.11.] ⊲

1.21 (Adapted from [ASV]) Suppose that $X, Y$ have joint probability density function

$$f(x, y) = c \exp\left[-\frac{x^2}{2} - \frac{(x - y)^2}{2}\right],$$

for $(x, y) \in \mathbb{R}^2$, for some constant $c > 0$.

(a) Find the value of the constant $c$. [Hint: The order of integration matters. You can do this without doing complicated integrals. Specifically, recall that for any $\mu \in \mathbb{R}$ and $\sigma > 0$, it holds that

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right) dz = 1. \tag{1}$$

]

(b) Find the marginal density functions of $X$ and $Y$. [Hint: You can do this without doing complicated integrals. Complete the square (see Section 1.3.1.1).]

(c) Determine whether $X$ and $Y$ are independent. Justify your answer.

1.22 (Adapted from [ASV]) Let $p(x, y)$ be the joint probability mass function of $(X, Y)$. Assume that there are two functions $a$ and $b$ such that $p(x, y) = a(x)b(y)$ for all possible values $x$ of $X$ and $y$ of $Y$. Show that $X$ and $Y$ are independent. Do not assume that $a$ and $b$ are probability mass functions. ⊲

1.23 (Adapted from [BHK]) (a) Let $X$ and $Y$ be independent random variables with uniform distribution in $[-1/2, 1/2]$. Compute $E[X]$, $E[X^2]$, $\mathrm{Var}[X]$, $E[X - Y]$, $E[XY]$, and $E[(X - Y)^2]$.

(b) What is the expected squared Euclidean distance between two points generated at random inside the unit $d$-dimensional cube $C = [-1/2, 1/2]^d$?
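A Monte Carlo check for (b); the reference value $d \cdot E[(X - Y)^2] = d/6$ is my own computation from part (a), included as an assumption to test rather than as the answer key:

In [ ]:
import numpy as np

rng = np.random.default_rng(4)
d, N = 5, 10**5
U = rng.uniform(-0.5, 0.5, size=(N, d))
V = rng.uniform(-0.5, 0.5, size=(N, d))
print(np.mean(np.sum((U - V)**2, axis=1)), d/6)   # both ~0.833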

1.24 (Adapted from [BHK]) (a) For an arbitrary $a > 0$ give a probability distribution for a nonnegative random variable $X$ such that

$$P[X \geq a] = \frac{E[X]}{a}.$$

(b) Show that for any $c > 0$ there exists a distribution for which Chebyshev's inequality is tight, that is,

$$P[|X - E[X]| \geq c] = \frac{\mathrm{Var}[X]}{c^2}.$$

[Hint: Choose a distribution symmetric about $0$.]

1.25 (Adapted from [BHK]) Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. Let

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i,$$

be the sample mean. Suppose one estimates the variance using the sample mean as follows

$$S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2.$$

Calculate $E(S_n^2)$. [Hint: Replace $X_i - \bar{X}_n$ with $(X_i - \mu) - (\bar{X}_n - \mu)$.]

1.26 Compute the inner product of $u = (1, 2, 3, 4)$ and $v = (5, 4, 3, 2)$ without using the function np.dot. Hint: The product of two real numbers $a$ and $b$ is a * b.

In [ ]:
import numpy as np

u = np.array([1., 2., 3., 4.])

# EDIT THIS LINE: define v

# EDIT THIS LINE: compute the inner product between u and v
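One possible completion, as a sketch; summing the element-wise products is just one way to avoid np.dot:

In [ ]:
import numpy as np

u = np.array([1., 2., 3., 4.])
v = np.array([5., 4., 3., 2.])

# inner product as a sum of element-wise products, avoiding np.dot
inner = np.sum(u * v)
print(inner)   # 30.0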

1.27 Let $f$ and $g$ have derivatives at $x$ and let $\alpha$ and $\beta$ be constants. Prove from the definition of the derivative that

$$[\alpha f(x) + \beta g(x)]' = \alpha f'(x) + \beta g'(x).$$

1.28 Recall that the cumulative distribution function (CDF) of a random variable $X$ is defined as

$$F_X(z) = P[X \leq z], \quad \forall z \in \mathbb{R}.$$

(a) Let $\mathcal{Z}$ be the interval where $F_X(z) \in (0, 1)$ and assume that $F_X$ is strictly increasing on $\mathcal{Z}$. Let $U \sim U[0, 1]$. Show that

$$P[F_X^{-1}(U) \leq z] = F_X(z).$$

(b) Generate a sample from $U[a, b]$ for arbitrary $a, b$, using numpy.random and the observation in (a). This is called the inverse transform sampling method.


In [ ]:
import numpy as np
from numpy.random import default_rng

rng = default_rng(535)

a = -1
b = 1

X = rng.random()

# EDIT THIS LINE: transform X to obtain a random variable Y ~ U[a,b]
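One possible completion, as a sketch: for $U[a, b]$ the inverse CDF is $F^{-1}(u) = a + (b - a)u$, so the transform is a single affine map:

In [ ]:
import numpy as np
from numpy.random import default_rng

rng = default_rng(535)
a, b = -1, 1
X = rng.random()          # X ~ U[0,1]
Y = a + (b - a) * X       # inverse CDF of U[a,b]; Y ~ U[a,b]
print(Y)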

1.29 Let $Z_1 \sim N(\mu_1, \sigma_1^2)$ and $Z_2 \sim N(\mu_2, \sigma_2^2)$ be independent Normal variables with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively. Recall that $Z_1 + Z_2$ is still Normal.

(a) What are the mean and variance of $Z_1 + Z_2$?

(b) Compute $E[\|X_1 - X_2\|^2]$ and $E[\|Y_1 - Y_2\|^2]$ explicitly in the claim of Section 1.4.1.3.


1.30 Suppose we are given $n$ vectors $x_1, \ldots, x_n$ in $\mathbb{R}^d$ and a partition $C_1, \ldots, C_k \subseteq [n]$. Let $n_i = |C_i|$ be the size of cluster $C_i$ and let

$$\mu_i^* = \frac{1}{n_i} \sum_{j \in C_i} x_j$$

be the centroid of $C_i$, for $i = 1, \ldots, k$.

(a) Show that

$$\sum_{j \in C_i} \|x_j - \mu_i^*\|^2 = \left(\sum_{j \in C_i} \|x_j\|^2\right) - n_i \|\mu_i^*\|^2.$$

(b) Show that

$$\|\mu_i^*\|^2 = \frac{1}{n_i^2} \left(\sum_{j \in C_i} \|x_j\|^2 + \sum_{\substack{j, \ell \in C_i \\ j \neq \ell}} x_j^T x_\ell\right).$$

(c) Show that

$$\sum_{\substack{j, \ell \in C_i \\ j \neq \ell}} \|x_j - x_\ell\|^2 = 2(n_i - 1) \sum_{j \in C_i} \|x_j\|^2 - 2 \sum_{\substack{j, \ell \in C_i \\ j \neq \ell}} x_j^T x_\ell.$$

(d) Combine (a), (b), (c) to prove that minimizing the $k$-means objective $G(C_1, \ldots, C_k)$ over all partitions $C_1, \ldots, C_k$ of $[n]$ is equivalent to minimizing instead

$$\sum_{i=1}^k \frac{1}{2|C_i|} \sum_{j, \ell \in C_i} \|x_j - x_\ell\|^2.$$
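A numerical check of the equivalence in (d) for a single cluster; equality of the two quantities on random data is reassuring before working through (a)-(c):

In [ ]:
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 3))        # one cluster of 6 points in R^3
mu = X.mean(axis=0)
lhs = np.sum((X - mu)**2)              # within-cluster sum of squares
D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)   # pairwise squared distances
rhs = D2.sum() / (2 * len(X))          # pairwise form from (d)
print(np.isclose(lhs, rhs))            # True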

1.31 Modify mmids_kmeans to take a tolerance tol as input and stop when the improvement
in objective value G falls below the tolerance. ⊲
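Since mmids_kmeans itself is not reproduced in this excerpt, the following is only a generic Lloyd-style sketch showing where a tolerance-based stopping rule would go; the function name and structure are illustrative, not the course's actual implementation:

In [ ]:
import numpy as np

def kmeans_with_tol(X, k, maxiter=100, tol=1e-6, seed=535):
    rng = np.random.default_rng(seed)
    reps = X[rng.choice(len(X), k, replace=False)]     # initial representatives
    G_old = np.inf
    for _ in range(maxiter):
        # assignment step: nearest representative for each point
        dists = np.sum((X[:, None, :] - reps[None, :, :])**2, axis=2)
        labels = np.argmin(dists, axis=1)
        G = dists[np.arange(len(X)), labels].sum()     # objective value
        if G_old - G < tol:                            # stop: improvement below tol
            break
        G_old = G
        # update step: recompute centroids of nonempty clusters
        for i in range(k):
            if np.any(labels == i):
                reps[i] = X[labels == i].mean(axis=0)
    return labels, reps, G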

1.32 Suppose the rows of $A \in \mathbb{R}^{n \times m}$ are given by the transposes of $r_1, \ldots, r_n \in \mathbb{R}^m$ and the columns of $A$ are given by $c_1, \ldots, c_m \in \mathbb{R}^n$. Give expressions for the elements of $A^T A$ and $A A^T$ in terms of these vectors. ⊲

References for Chapter 1


Chapter 1 is based on the following references.
[Sol] Solomon, Numerical Algorithms: Methods for Computer Vision, Machine Learning, and
Graphics, CRC, 2015
[Bis] Bishop, Pattern Recognition and Machine Learning, Springer, 2006
[Carp] B. Carpenter, Typical Sets and the Curse of Dimensionality
[BHK] A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, Cambridge University
Press, 2020
[VMLS] S. Boyd and L. Vandenberghe. Introduction to Applied Linear Algebra: Vectors, Matrices,
and Least Squares. Cambridge University Press, 2018
[SSBD] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014
[ASV] D. Anderson, T. Seppalainen and B. Valko. Introduction to Probability (Cambridge
Mathematical Textbooks). Cambridge University Press, 2017
