Optimization Models
Optimization Models
Alberto Bemporad
http://cse.lab.imtlucca.it/~bemporad/teaching/numopt
Application domains:
• Optimization theory
        – Optimality conditions, sensitivity analysis
        – Duality
• Optimization algorithms
        – Basics of numerical linear algebra
        – Convex programming
        – Nonlinear programming
     fuel
  [ℓ/km]
                                                    velocity
           0       30     60     90     120   160
                                                     [km/h]
                                                    velocity
           0       30     60     90     120   160
                                                     [km/h]
                    min f (x)
                         x                       f(x*)
                                                                                           x
                                                                            x*
    ∗
  f = minx f (x) = optimal value
  x∗ = arg minx f (x) = optimizer                         x ∈ Rn , f : R n → R
                                                          
                                                     x1
                    maxx f (x)                        . 
                                                 x =  .. 
                                                     
                                                          ,   f (x) = f (x1 , x2 , . . . , xn )
                                                       xn
                                             min f (x)
                                              x
     fuel
  [ℓ/km]
                                                    velocity
           0       30     60     90     120   160
                                                     [km/h]
                     best fuel
     fuel         consumption
  [ℓ/km]
                                                    optimal
                                                    velocity
                                                    velocity
           0       30     60     90     120   160
                                                     [km/h]
                  s.t. g(x) ≤ 0
                       h(x) = 0
                                                  g(x)  0
                                                                                         x
• Scaling f (x) to αf (x) and/or gi (x) to βi gi (x) does not change the solution, for
  all α, βi > 0. Same if hj (x) is scaled to γj hj (x), ∀γj ̸= 0
    with respect to x =              [ ab ]
                           a∗ 
• The problem               b∗
                                   = arg min f ([ ab ]) is a least-squares problem: ŷ = a∗ u + b∗
In MATLAB:                                                      1.5
In Python: y 0
-0.5
import numpy as np -1
   A=np.hstack((u,np.ones(u.shape)))                            -1.5
                                                                    -1    -0.5   0   0.5    1
x=np.linalg.lstsq(A,y,rcond=0)[0] u
                                   
                                                                       2.5
                               1
                             u 
                              1
                                                                       2
                      ϕ(u) =  u21 
                              3
                              u1 
                                                                        1.5
                               u41
                                                                         1
                                                                              0    0.5     1     1.5   2
-2
                                                                           -3
                                                                             -2   -1     0   1   2      3
2 http://www.utc.fr/~mottelet/mt94/leastSquares.pdf
                               convex set
                                                                       nonconvex set
                                      S                                 x1        x2
                         x1                     x2
Jensen’s inequality
                                                                      mλ(1 − λ)
         f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) −                     ∥x1 − x2 ∥22
                                                                         2
• Every local solution is also a global one (we will see this later)
                   P = {x ∈ R : Ax ≤ b} n
                                                                                                        v3
                                                                            v1                            A
                                                                                                             1 x=
                                                                                                                 b1
                                                                        A3x=b3               A3
• Vertex (V-)representation:
                                        X
                                        q               X
                                                        p               Convex hull = transformation
      P = {x ∈ Rn : x =                       αi vi +         βj rj }   from V- to H-representation
                                        i=1             i=1
                                                                        Vertex enumeration =
                      ∑
                      q
    αi , βj ≥ 0,            αi = 1, vi , rj ∈ R    n
                                                                        transformation from H- to
                      i=1
                                                                        V-representation
    when q = 0 the polyhedron is a cone
                                                                        vi = vertex, rj = extreme ray
©2020 A. Bemporad - Numerical Optimization                                                                     23/101
Linear programming
• Linear programming (LP) problem:
                                                                                                      -c
             min c′ x
                                                                                                     x*
             s.t. Ax ≤ b, x ∈ Rn
                  Ex = f                                                                                   George Dantzig
                                                                                                            (1914–2005)
                                                               ′
• LP in standard form:                                  min    cx
                                                        s.t.   Ax = b
                                                               x ≥ 0, x ∈ Rn
• Conversion to standard form:
       1. introduce slack variables
                                             ∑
                                             n                     ∑
                                                                   n
                                                   aij xj ≤ bi ⇒         aij xj + si = bi , si ≥ 0
                                             j=1                   j=1
Ex = f 2
         2 x Qx         =      2x ( 2 ′ +  2 )x      = 12 x′ ( Q+Q     1 ′      1   ′ ′ ′
                                                                2 )x + 4 x Qx − 4 (x Q x)
                               1 ′ Q+Q
                        =      2 x ( 2 )x
• In some problems the optimization variables can only take integer values.
  We call x ∈ Z an integrality constraint
• When all variables are integer (or binary) the problem is an integer
  programming problem (a special case of discrete optimization)
• In a mixed integer programming (MIP) problem some of the variables are real
  (xi ∈ R), some are discrete/binary (xi ∈ Z or xi ∈ {0, 1})
• Many good solvers are available (CPLEX, Gurobi, GLPK, Xpress-MP, CBC, ...)
  For comparisons see http://plato.la.asu.edu/bench.html
        – Robustness = perform well on a wide variety of problems in their class, for all
          reasonable values of the initial guess x0
        – Accuracy = find a solution close to the optimal one, without being affected by
          rounding errors due to the finite precision arithmetic of the CPU
Optimization
                                                                                                                      Integer          Combinatorial
                                             Unconstrained                                        Constrained
                                                                                                                   Programming         Optimization
  Nonlinear                                Nondifferentiable
                 Nonlinear Equations                               Global Optimization      Nonlinear Programming           Network Optimization       Bound Constrained    Linearly Constrained
 Least Squares                              Optimization
                                            Second-Order                                       Complementarity
                                          Cone Programming                                        Problems
                                       Quadratically-Constrained
                                        Quadratic Programming
   https://neos-guide.org/content/optimization-taxonomy
 Optimization Guide
   Introduction
 ©2020 A. Bemporad - Numerical Optimization                                                                                                                                                         31/101
Optimization software
• Comparison on benchmark problems:
                                             http://plato.la.asu.edu/bench.html
http://www.neos-server.org
http://www.coin-or.org/
2. Single out the optimization variables xi (what are we able to decide?) and their
   domain (real, binary, integer)
3. Treat the remaining variables as parameters (=data that affect the problem but
   are not part of the decision process)
5. Are there constraints on the decision variables ? If yes, translate them into
   (in)equalities involving x
                                                                                                    c x0
                                                                                                 +
                                                                                             Ax Qx
                                                                                                  b
                                                                                              2 x0
                                                                                            1
                                                                                          s.t n
                                                                                            mi
                                                                                             .
                                                              analysis of the solution
• It may take several iterations to formulate the optimization model properly, as:
– The solution does not make sense (is any constraint missing or wrong?)
– The optimal value does not make sense (is the cost function properly defined?)
– It takes too long to find the solution (can we simplify the model?)
         A small joinery makes two different sizes of boxwood chess sets. The small set requires 3 hours of machining on a
         lathe, and the large set requires 2 hours. There are four lathes with skilled operators who each work a 40 hour week,
         so we have 160 lathe-hours per week. The small chess set requires 1 kg of boxwood, and the large set requires 3 kg.
         Unfortunately, boxwood is scarce and only 200 kg per week can be obtained. When sold, each of the large chess sets
         yields a profit of $20, and one of the small chess set has a profit of $5.
The problem is to decide how many sets of each kind should be made each week so as to maximize profit.
     • A small joinery makes two different sizes of boxwood chess sets. The small set requires 3 hours of machining on a
         lathe, and the large set requires 2 hours.
     • There are four lathes with skilled operators who each work a 40 hour week, so we have 160 lathe-hours per week.
     • The small chess set requires 1 kg of boxwood, and the large set requires 3 kg. Unfortunately, boxwood is scarce and
         only 200 kg per week can be obtained.
     • When sold, each of the large chess sets yields a profit of $20, and one of the small chess set has a profit of $5.
• The problem is to decide how many sets of each kind should be made each week so as to maximize profit.
• Constraints:
        – 3xs + 2xℓ ≤ 4 · 40 (maximum lathe-hours)
2     Linear Programming
    • What is the best solution ? A numerical solver provides the following solution
        We have just built a model for the decision process that the joinery owner has to make. We have isolated
                                x∗ =(how
        the decisions he has to make
                                       0, x  ∗ of each type of chess set∗to manufacture), and taken his objective
                                           many
                                                 = 66.6666 ⇒ f (x ) = 1333.3 $
        of maximizing profit. Thes constraintsℓ acting on the decision variables have been analyzed. We have given
        names to his variables and then written down the constraints and the objective function in terms of these
        variable names.
        At the same time as doing this we have made, explicitly or implicitly, various assumptions. The explicit
    ©2020 A. Bemporad - Numerical
        assumptions     that weOptimization
                                  noted were:                                                                39/101
Optimization models
• Optimization models, as all mathematical models, are never an exact
  representation of reality but a good approximation of it
• There are usually many different models for the same real problem
                                               ADE OFF
                                             TR
        – Simple enough to be able to solve the resulting optimization problem
• Even if the solution is not really applied, it provides a good suggestion to the
  decision maker (you should only make large chess sets)
• Making the model and analyzing the solution allows a better understanding of
  the problem at hand (small chess sets are not profitable)
• We can analyze the robustness of the solution with respect to variations in the
  data (big changes of solution for small changes of prices?)
• GNU MathProg a subset of AMPL associated with the free package GLPK
  (GNU Linear Programming Kit)
        xs = sdpvar(1,1);
        xl = sdpvar(1,1);
optimize(Constraints,-Profit)
value(xs),value(xl),value(Profit)
        cvx_clear
        cvx_begin
        variable xs(1)
        variable xl(1)
Profit = 5*xs+20*xl;
maximize Profit
        subject to
        3*xs+2*xl <= 4*40; % maximum lathe-hours
        1*xs+3*xl <= 200; % available kg of boxwood
        xs>=0;
        xl>=0;
        cvx_end
xs,xl,Profit
        import casadi.*
        xs=SX.sym('xs');
        xl=SX.sym('xl');
        Profit = 5*xs+20*xl;
        Constraints = [3*xs+2*xl-4*40; 1*xs+3*xl-200];
        prob=struct('x',[xs;xl],'f',-Profit,'g',Constraints);
        solver = nlpsol('solver','ipopt', prob);
        res = solver('lbx',[0;0],'ubg',[0;0]);
        Profit = -res.f;
        xs = res.x(1);
        xl = res.x(2);
• Model and solve the problem using Optimization Toolbox (The Mathworks, Inc.)
        xs=optimvar('xs','LowerBound',0);
        xl=optimvar('xl','LowerBound',0);
        Profit = 5*xs+20*xl;
        C1 = 3*xs+2*xl-4*40<=0;
        C2= 1*xs+3*xl-200<=0;
        prob=optimproblem('Objective',Profit,'ObjectiveSense','max');
        prob.Constraints.C1=C1;
        prob.Constraints.C2=C2;
[sol,Profit] = solve(prob);
        xs=sol.xs;
        xl=sol.xl;
• The Hybrid Toolbox for MATLAB contains interfaces to various solvers for LP,
  QP, MILP, MIQP (http://cse.lab.imtlucca.it/~bemporad/hybrid/toolbox) (Bemporad, 2003-today)
• However, when there are many variables and constraints forming the problem
  matrices manually can be very time-consuming and error-prone
optimization
    variables
                                  cost function
        B6:C6
                                   =SUMPRODUCT(B6:C6;B2:C2)
Reference:
                                                         0   ℓ1                u1   x1
• Example: ``We cannot sell more than 100 units of Product A''
• Pay attention: some solvers assume nonnegative variables by default!
• When ℓi = ui the constraint becomes xi = ℓi and variable xi becomes
  redundant. Still it may be worthwhile keeping in the model
• Example: ``I can get water from 3 suppliers, S1, S2 and S3. I
                                             x1
  want to have at least 1000     liters available.'' x1 + x2 + x3 ≥ 1000
                          total flow
                                             x2
• Example: ``I have 50 trucks available to xrent
                                              3
                                                     to 3 customers C1,
    C2 and C3'' x1 + x2 + x3 ≤ 50
• Losses can be included as well: ``2% water I get from suppliers gets
    lost.'' 0.98x1 + 0.98x2 + 0.98x3 ≥ 1000
• The technological coefficients Rji denote the amount of resource j used per
  unit of activity i
• Example:
  ``Small chess sets require 1 kg                            ``Small chess sets require 3
  boxwood, the large ones 3 kg,                              lathe hours, the large ones 2 h,
  total available is 200 kg.''                               total time is 4×40 h.''
  x1 + 3x2 ≤ 200                                             3x1 + 2x2 ≤ 160
                                             R = [ 23 32 ] , Rmax = [ 200
                                                                      160 ]
• Balance constraints model the fact that “what goes out must in total equal
  what comes in”
                                                              L
                                                    x1in               x1out
                X
                N                 X
                                  M
                      xout
                       i      =          xin
                                          i    +L   x2in
                i=1                i=1
                                                    x3in              x2out
• Example: ``I have 100 tons steel and can buy more from
    suppliers 1,2,3 to serve customers A,B.'' xA + xB = 100 + x1 + x2 + x3
• Example: ``The cash I'll have tomorrow is what I have now plus
    what I receive minus what I spend today.'' xt+1 = xt + ut − yt
                                             PN
• Of course we can replace y with i=1 xi everywhere in the model (condensed
  form), but this would make it less readable
• We call the new variable ϵj panic variable: it should be normally zero but can
  assume a positive value in case there is no way to fulfill the constraint set
         The company Steel has received an order for 500 tonnes of steel to be used in shipbuilding. This steel must have the
         certain characteristics (``grades''). The company has seven different raw materials in stock that may be used for the
         production of this steel. The objective is to determine the composition of the steel that minimizes the production cost.
     • The company Steel has received an order for 500 tonnes of steel to be used in shipbuilding.
     • This steel must have the certain characteristics (``grades'').
     • The company has seven different raw materials in stock that may be used for the production of this steel.
     • The objective is to determine the composition of the steel that minimizes the production cost.
• “The company Steel has received an order for 500 tonnes of steel''
                                             ∑7
        – flow constraint y =                 j=1   xj (produced metal)
         1         Carbon (C)                      2                         3
         2         Copper (Cu)                    0.4                       0.6
         3         Manganese (Mn)                 1.2                      1.65
x∗1 = 400, x∗3 = 39.7763, x∗5 = 2.7613, x∗6 = 57.4624, x∗2 = x∗4 = x∗7 = 0 [t]
        x=sdpvar(7,1);
        y=sum(x);
cost = sum(c.*x);
optimize(Constraints,cost)
value(x),value(cost)
        cvx_clear
        cvx_begin
        variable x(7)
cost = sum(c.*x);
minimize cost
        subject to
        y=sum(x); y>=500;
        x>=0; x<=R;
        P'*x<=B(:,2)*y;
        P'*x>=B(:,1)*y;
cvx_end
x,cost
   Result
   Every convex piecewise affine function
                                                                                                     a04x + b4
   ℓ : Rn → R can be represented as the                           `(x)
                                                                             a01x + b1
   max of affine functions, and vice versa
                                             (Schechter, 1987)                           a03x + b3
  Example:
  ℓ(x) = max {a′1 x + b1 , . . . , a′4 x + b4 }                                                              x
                                                                 a02x + b2
                        ϵ                                         minϵ,x   ϵ
                                                                            
                                                                             ϵ ≥ a′1 x + b1
                                                                            
                                                                             ϵ ≥ a′ x + b
                                                                                    2       2
                                                                      s.t.          ′
                                                                            
                                                                             ϵ ≥ a   x + b
                                                                            
                                                                            
                                                                                    3       3
                                                  x                           ϵ ≥ a′4 x + b4
∥u∥2 ≤ r∗ } contained in P
                         sup Ai (x + u) = Ai x + r∥Ai ∥2 ≤ bi , ∀i = 1, . . . , m,
                       ∥u∥2 ≤r
                                  maxx,r r
                                     s.t. Ai x + r∥Ai ∥2 ≤ bi , i = 1, . . . , m
References:
• The convex hull of N points x̄1 , . . . , x̄N is the set of all their convex
  combinations
                                                                                          x1            x2
                                 P
     S = {x ∈ Rn : ∃λ ∈ RN : x =    λi x̄i ,
                                    PN                                                    x7
                             λi ≥ 0, i=1 λi = 1}                                                    xN        x3
                                                                                     x6
                                                                                x5
                                                                                               x4
• A convex cone of N points x̄1 , . . . , x̄N is the set
                                                                                               x1
                                                        X
      S = {x ∈ R : ∃λ ∈ R n                   N
                                                  :x=       λi x̄i , λi ≥ 0}                        x
x2
                                                                      a’x ≤ b
• The set {x : a′ x ≤ b}, a ̸= 0 is called halfspace
• Hyperplanes, halfspaces, polyhedra, balls, and ellipsoids are all convex sets
• Sublevel sets Cα of convex functions are convex sets (but not vice versa)
Cα = {x ∈ dom f : f (x) ≤ α}
∇2 f (x) ≽ 0, ∀x ∈ dom f
      General composition rule: h(f1 (x), . . . , fk (x)) is convex when h is convex and
      for each i = 1, . . . , k
          h is increasing in argument i, and fi convex, or
          h is decreasing in argument i, and fi concave, or
          fi is affine
                                                                                        x*
                                      1             1
                                        ∥Ax − b∥22 = x′ A′ Ax − b′ Ax + b′ b                 1 x0 Qx + c0 x = constant
                                                                                             2
                                      2             2
• A generalization of constrained least squares is quadratic programming (QP)
                                              1 ′
                                       min      x Qx + c′ x
                                              2
                                         s.t. Ax ≤ b            Q = Q′ ≽ 0
                                              Ex = f
                              1 ′            1                     1
                                x Qx + c′ x = ∥Lx − (−L−1 )′ c∥22 − c′ Q−1 c
                              2              2                     2
           min c′ x
                                             E[c] = c̄, Var[c] = E[(c − c̄)(c − c̄)′ ] = Σ
           s.t. Ax ≤ b, Ex = f
• We want to trade off the expectation of c′ x with its variance (=risk) with a risk
  aversion coefficient γ ≥ 0
                              1
                           min ∥Ax − b∥22 + λ∥x∥1             A ∈ Rm×n , b ∈ Rm
                            x 2
               1
            min ∥Ax − b∥22 + λ∥x∥1
             x 2                               50
          A ∈ R1000×3000 , b ∈ R3000
                                               45
                                                10-4   10-3   10-2   10-1   100   101    102
• A, B = random matrices
                                             1000
• A sparse with 3000 nonzero entries
                                              800
400
                                    min c′ x
                                    s.t. prob(a′i x ≤ bi ) ≥ η, i = 1, . . . , m
                              min c′ x
                              s.t. ā′i x + Φ−1 (η)∥LΣ x∥2 ≤ bi , i = 1, . . . , m
• Problem to solve:
                          Qn
          maxx,y            i=1 yi
                                                                  nonlinear, nonconvex,
             s.t.         A(x + diag(v)y) ≤ b, ∀v ∈ {0, 1}n
                                                                  many constraints!
                          y≥0
• Reformulate as maximize log(volume), remove redundant constraints:
                                 X
                                 n
                                                                  convex problem
              minx,y         −          log(yi )
                                  i=1
                    s.t. Ax + A+ y ≤ b,            y≥0            A+
                                                                   ij = max{Aij , 0}
                          min c′ x           min c′ x
                          s.t. Ax ≤ b        s.t. diag(Ax − b) ≼ 0
• Good SDP packages exist (SeDuMi, SDPT3, Mathworks LMI Toolbox, ...)
• A monomial function f : Rn
                           ++ → R++ , where R++ = {x ∈ R : x > 0}, has
  the form
                                     f (x) = cxa1 1 xa2 2 . . . xann , c > 0, ai ∈ R
• A posynomial function f : Rn
                             ++ → R++ is the sum of monomials
                                             X
                                             K
                             f (x) =               ck xa1 1k xa2 2k . . . xannk , ck > 0, aik ∈ R
                                             k=1
• One can prove that F (y) = log fP (ey ) is convex and so it is the program
                                         P                
                                            K     a′k y+bk
                              min log           e
                                         Pk=1 ′            
                                            K
                                s.t. log    k=1 e
                                                  cik y+dik
                                                              ≤ 0, i = 1, . . . , m
                                         Ey + f = 0
• Constraints:
        – total wall area 2(hw + hd) ≤ Awall
        – floor area wd ≤ Aflr
        – upper and lower bounds on aspect ratios α ≤ h/w ≤ β , γ ≤ w/d ≤ δ
  CVX                                        YALMIP
     cvx_begin gp quiet                        sdpvar h w d
     variables h w d
     % obj. function = box volume              C = [alpha <= h/w <= beta,
     maximize(h*w*d)                           gamma <= d/w <= delta, h>=0,
     subject to                                w>=0];
     2*(h*w + h*d) <= Awall;                   C = [C, 2*(h*w+h*d) <= Awall,
     w*d <= Afloor;                            w*d <= Afloor];
     alpha <= h/w <= beta;
     gamma <= d/w <= delta;                    optimize(C,-(h*w*d))
     cvx_end
                                             yalmip.github.io/tutorial/geometricprogramming
     opt_volume = cvx_optval;
  CVXPY
     import cvxpy as cp                      constraints = [
                                             2*(h*w + h*d) <= Awall,
     alpha = 0.5                             w*d <= Afloor,
     beta = 2.0                              alpha <= h/w, h/w <= beta,
     gamma = 0.5                             gamma <= d/w, d/w <= delta]
     delta = 2.0
     Awall = 1000.0                          problem = cp.Problem(cp.Maximize
     Afloor = 500.0                                      (obj), constraints)
                                             problem.solve(gp=True)
     h = cp.Variable(pos=True)
     w = cp.Variable(pos=True)               print("h: ", h.value)
     d = cp.Variable(pos=True)               print("w: ", w.value)
                                             print("d: ", d.value)
     obj = h * w * d                         print("volume: ", problem.value)
                                 √
        – Example: min x with x ≥ 0, is a nonconvex problem, but we can minimize
           √
          ( x)2 = x instead
                                             ∏n
        – Example: max f (x) =                i=1 xi is a nonconvex problem, but the function
                                ∑n
            log(f (x)) =            i=1 log(x 1 ) is concave