Optimization in DS
Katholische Universität Eichstätt-Ingolstadt, winter semester 2023/2024
Thomas Jahn
Cheat sheet
Notation
∇f (x) gradient, ∇2 f (x) Hessian
A ⪰ B means A − B is positive semidefinite
A ≻ B means A − B is positive definite
x ≥ 0 means that all entries of the vector x are ≥ 0.
Special functions and sets
linear function f (x) = Ax with a matrix A
affine function f (x) = Ax + b with a matrix A and a vector b
quadratic function f (x) = x⊤ Ax + b⊤ x + r with a matrix A, a vector b, a number r
µ-strongly convex function (just “convex” if µ = 0)
f(λx + (1 − λ)y) + (µ/2)λ(1 − λ)∥x − y∥₂² ≤ λf(x) + (1 − λ)f(y) for all x, y ∈ Rd, λ ∈ (0, 1)
⇐⇒ f(x) − f(y) ≥ ∇f(y)⊤(x − y) + (µ/2)∥x − y∥₂² for all x, y ∈ Rd (if f cont.diff.)
⇐⇒ ∇2 f(x) ⪰ µI for all x ∈ Rd (if f twice cont.diff.)
L-smooth function
∥∇f(x) − ∇f(y)∥₂ ≤ L∥x − y∥₂ for all x, y ∈ Rd
⇐⇒ f(y) + ∇f(y)⊤(x − y) − (L/2)∥x − y∥₂² ≤ f(x) ≤ f(y) + ∇f(y)⊤(x − y) + (L/2)∥x − y∥₂² for all x, y ∈ Rd (if f cont.diff.)
⇐⇒ −LI ⪯ ∇2 f(x) ⪯ LI for all x ∈ Rd (if f twice cont.diff.)
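For a quadratic, the constants µ and L can be read off the eigenvalues of the (constant) Hessian. A minimal Python sketch; the matrix below is a hypothetical example:

import numpy as np

# f(x) = 0.5 * x^T A x with symmetric A has Hessian A everywhere, so
# mu*I ⪯ A ⪯ L*I holds with mu = smallest and L = largest eigenvalue of A.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigs = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
mu, L = eigs[0], eigs[-1]
print(mu, L)                   # f is mu-strongly convex and L-smooth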
convex set S ⊂ Rd : λx + (1 − λ)y ∈ S for all x, y ∈ S, all λ ∈ [0, 1].
Algorithms, rates of convergence
• x(k) → x∗ q-linearly ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂ ∈ (0, 1)
• x(k) → x∗ q-superlinearly ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂ = 0
• x(k) → x∗ q-quadratically ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂² < ∞
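The difference between the rates is visible on simple scalar sequences with x∗ = 0; a small sketch (the sequences are hypothetical examples):

# q-linear: x(k) = 0.5^k, the ratio |x(k+1)| / |x(k)| is constantly 0.5
lin = [0.5**k for k in range(1, 9)]
print([lin[k+1] / lin[k] for k in range(7)])        # -> 0.5, 0.5, ...

# q-quadratic: x(k) = 0.5^(2^k), the ratio |x(k+1)| / |x(k)|^2 stays bounded
quad = [0.5**(2**k) for k in range(1, 6)]
print([quad[k+1] / quad[k]**2 for k in range(4)])   # -> 1.0, 1.0, ...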
Unconstrained optimization
min f (x) over x ∈ Rd
• 1st-order necessary condition: x∗ ∈ Rd is a local minimizer of cont.diff. f : Rd → R =⇒ ∇f(x∗) = 0.
• 2nd-order necessary condition: x∗ ∈ Rd is a local minimizer of twice cont.diff. f : Rd → R =⇒ ∇2 f(x∗) ⪰ 0
• 2nd-order sufficient condition: f : Rd → R twice cont.diff., ∇f(x∗) = 0, ∇2 f(x∗) ≻ 0 =⇒ x∗ is a strict local minimizer of f (checked numerically in the sketch after this list)
• convex objective function: locally optimal solutions are optimal solutions, set of optimal solutions is convex
but may be empty
• strongly convex, cont.diff. objective function: there exists a unique optimal solution
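The 2nd-order sufficient condition can be verified at a candidate point; a minimal Python sketch on a hypothetical quadratic (the function and names are illustrative assumptions):

import numpy as np

# hypothetical example: f(x) = x1^2 + x1*x2 + 2*x2^2, stationary at x* = 0
grad = lambda x: np.array([2*x[0] + x[1], x[0] + 4*x[1]])
hessian = np.array([[2.0, 1.0],
                    [1.0, 4.0]])                   # constant Hessian of this quadratic
x_star = np.zeros(2)
stationary = np.allclose(grad(x_star), 0.0)        # 1st-order: gradient vanishes
pos_def = np.all(np.linalg.eigvalsh(hessian) > 0)  # 2nd-order: Hessian positive definite
print(stationary and pos_def)                      # True => strict local minimizer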
Line search methods
Setup:
• general form: x(k+1) = x(k) + αk p(k)
• search directions: gradient method p(k) = −∇f(x(k)), Newton p(k) = −(∇2 f(x(k)))⁻¹∇f(x(k))
• step sizes: constant αk = α, Armijo αk = max{ᾱρʲ : j ∈ N0, f(x(k) + ᾱρʲp(k)) ≤ f(x(k)) + c1 ᾱρʲ∇f(x(k))⊤p(k)} (see the sketch after this list)
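A minimal Python sketch of the gradient method with Armijo backtracking, assuming f and grad are callables for the objective and its gradient (function names, defaults, and the test problem are illustrative assumptions):

import numpy as np

def gradient_descent_armijo(f, grad, x0, alpha_bar=1.0, rho=0.5, c1=1e-4,
                            tol=1e-8, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # stop at (approximate) stationarity
            break
        p = -g                             # steepest descent direction
        alpha = alpha_bar
        # backtracking: largest alpha_bar * rho**j satisfying the Armijo condition
        while f(x + alpha * p) > f(x) + c1 * alpha * (g @ p):
            alpha *= rho
        x = x + alpha * p
    return x

# example: strongly convex quadratic with minimizer at the origin
A = np.array([[2.0, 0.0], [0.0, 10.0]])
f = lambda x: x @ A @ x
grad = lambda x: 2 * A @ x
print(gradient_descent_armijo(f, grad, x0=[1.0, 1.0]))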
Theorems:
• f cont.diff., gradient method with Armijo step size =⇒ every accumulation point of (x(k) )k∈N0 is a stationary
point of f
• f cont.diff., µ-strongly convex, L-smooth, gradient method with αk = 1/L =⇒ (x(k))k∈N0 and (f(x(k)))k∈N0 converge q-linearly
• f twice cont.diff., ∇f (x∗ ) = 0, ∇2 f (x∗ ) invertible =⇒ for x(0) ≈ x∗ , αk = 1 and Newton directions p(k) ,
convergence x(k) → x∗ is q-superlinear (q-quadratic when ∇2 f is Lipschitz)
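A minimal sketch of the local Newton method with full steps αk = 1; the Newton system is solved rather than the Hessian inverted, and the Rosenbrock test problem below is an illustrative assumption:

import numpy as np

def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        p = np.linalg.solve(hess(x), -g)   # Newton direction from a linear solve
        x = x + p                          # full step: alpha_k = 1
    return x

# example: Rosenbrock function, minimizer x* = (1, 1), invertible Hessian there
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2 - 400*x[1] + 1200*x[0]**2, -400*x[0]],
                           [-400*x[0], 200.0]])
print(newton_method(grad, hess, x0=[1.2, 1.2]))   # x(0) close to x*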
Linear optimization min c⊤ x over x ∈ Rd s.t. constraints of the form a⊤ x ≤ b, a⊤ x ≥ b, or a⊤ x = b for given
a, c ∈ Rd and b ∈ R.
• c is called the cost vector.
• standard form: min c⊤ x s.t. Ax = b and x ≥ 0. We require b ≥ 0.
• Locally optimal solutions are global ones. (Linearity implies convexity.)
• Set of optimal solutions = a face of the feasible set (a polyhedral set), i.e., a vertex, an edge, . . . , or the whole
feasible set.
• If there are optimal solutions, there is a solution among the vertices of the feasible set.
• simplex algorithm: cleverly traversing the vertices of the feasible set (its revised form is sketched in code after this list)
• reduced costs: solve AB⊤y = cB for y, then zN = cN − AN⊤y
• Optimality test: zN ≥ 0 ⇒ STOP
• Choose k ∈ N with zk < 0 and solve AB w = Ak . Unboundedness test: w ≤ 0 ⇒ STOP.
• Ratio test: Compute xj/wj for j ∈ B with wj > 0. Call the smallest value t and the corresponding index r.
• Update xB ← xB − tw, xk ← t, k enters B, r leaves B.
• Phase 1 problem: min v1 + . . . + vm over (x, v) s.t. Ax + v = b, x ≥ 0, v ≥ 0; start the simplex algorithm with the basic feasible solution (x, v) = (0, b)
• dual problem: max b⊤ y over (y, z) s.t. A⊤ y + z = c, z ≥ 0
• dual variables y, z are computed in the simplex algorithm
• weak duality: b⊤y ≤ c⊤x for feasible x and (y, z) (“dual values are less than or equal to primal values”)
• no duality gap: If b⊤y = c⊤x for feasible x and (y, z), then x is optimal for the primal problem and (y, z) is optimal for the dual problem
• strong duality: If one problem is solvable, then both of them are, and the optimal values coincide. If one problem is unbounded, then the other is infeasible.
• Lagrangian: L(x, y) = c⊤ x + (b − Ax)⊤ y
• KKT conditions: Ax = b, x ≥ 0, A⊤ y + z = c, z ≥ 0, x⊤ z = 0
• KKT conditions have a solution if and only if the primal problem has an optimal solution if and only if the
dual problem has an optimal solution.
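A compact Python sketch of the revised simplex loop following the steps above (no degeneracy or cycling safeguards; the tolerances and the test problem are illustrative assumptions):

import numpy as np

def revised_simplex(A, b, c, basis):
    # min c^T x  s.t.  Ax = b, x >= 0, started from a feasible basis (column indices)
    m, n = A.shape
    B = list(basis)
    while True:
        AB = A[:, B]
        xB = np.linalg.solve(AB, b)              # current basic feasible solution
        y = np.linalg.solve(AB.T, c[B])          # dual variables: AB^T y = cB
        N = [j for j in range(n) if j not in B]
        zN = c[N] - A[:, N].T @ y                # reduced costs
        if np.all(zN >= -1e-12):                 # optimality test
            x = np.zeros(n)
            x[B] = xB
            return x, y
        k = N[int(np.argmin(zN))]                # entering index with zk < 0
        w = np.linalg.solve(AB, A[:, k])         # solve AB w = Ak
        if np.all(w <= 1e-12):                   # unboundedness test
            raise ValueError("problem is unbounded")
        # ratio test: smallest xB[i]/w[i] over w[i] > 0 determines who leaves
        t, r = min((xB[i] / w[i], i) for i in range(m) if w[i] > 1e-12)
        B[r] = k                                 # k enters, B[r] leaves

# example: min -x1 - 2x2  s.t.  x1 + x2 + x3 = 4, x1 + 3x2 + x4 = 6, x >= 0
A = np.array([[1.0, 1.0, 1.0, 0.0], [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])
x, y = revised_simplex(A, b, c, basis=[2, 3])    # slack basis is feasible since b >= 0
print(x, c @ x, b @ y)                           # x = (3, 1, 0, 0), b^T y = c^T x = -5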
Nonlinear constrained optimization min f (x) over x ∈ Rd s.t. gj (x) ≤ 0, hi (x) = 0 with f, gj , hi continuously
differentiable, let C be the feasible set
• tangent cone TC(x) = {y ∈ Rd : ∃ sequences (x(k))k∈N ⊆ C, (αk)k∈N ⊆ (0, ∞) with x(k) → x, αk → 0, (x(k) − x)/αk → y}
• first-order necessary optimality condition: x∗ is a locally optimal solution ⇒ ∇f (x∗ )⊤ y ≤ 0 for all y ∈ TC (x∗ )
• polar cone: K◦ = {y : y⊤x ≤ 0 ∀ x ∈ K}
• first-order necessary optimality condition again: x∗ is a locally optimal solution ⇒ −∇f(x∗) ∈ TC(x∗)◦
• linearized tangent cone F(x) = {y ∈ Rd : ∇hi(x)⊤y = 0 ∀ i, ∇gj(x)⊤y ≤ 0 ∀ j with gj(x) = 0}
• always TC (x) ⊆ F (x) and F (x)◦ ⊆ TC (x)◦
• F(x) depends on the constraint functions, TC(x) only on the feasible set
• F(x)◦ = {∑j:gj(x)=0 µj∇gj(x) + ∑i λi∇hi(x) : µj ≥ 0, λi ∈ R}
• F (x)◦ = TC (x)◦ is true when constraint qualifications are satisfied
• LICQ is satisfied at x: All the vectors ∇gj (x) and ∇hi (x) are linearly independent (all i, only those j for
which gj (x) = 0)
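A small numeric check of −∇f(x∗) ∈ F(x∗)◦ on a hypothetical example, min x1 + x2 s.t. g(x) = x1² + x2² − 1 ≤ 0, whose minimizer is x∗ = (−1/√2, −1/√2):

import numpy as np

x_star = np.array([-1.0, -1.0]) / np.sqrt(2)   # candidate minimizer, g(x*) = 0
grad_f = np.array([1.0, 1.0])                  # gradient of f(x) = x1 + x2
grad_g = 2 * x_star                            # gradient of the active constraint
# F(x*)° = {mu * grad_g : mu >= 0}; recover mu from -grad_f = mu * grad_g
mu = -(grad_f @ grad_g) / (grad_g @ grad_g)
print(mu >= 0 and np.allclose(-grad_f, mu * grad_g))   # True, with mu = 1/sqrt(2)

LICQ holds at x∗ because ∇g(x∗) ≠ 0 is the only active constraint gradient, so F(x∗)◦ = TC(x∗)◦ and the check realizes the first-order necessary condition.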