
4. Convex optimization problems


Optimization problem in standard form

minimize f0(x)\text{minimize } f_0(x)

subject to

fi(x)0,  i=1,,mhi(x)=0,  i=1,,pf_i(x) \le 0, \; i = 1, \dots, m \\ h_i(x) = 0, \; i = 1, \dots, p
  • xRnx\in \R^n is the optimization variable
  • f0:RnRf_0:\R^n\to \R is the objective or cost function
  • fi:RnR,  i=1,,mf_i:\R^n\to \R, \; i = 1, \dots, m are the inequality constraint functions
  • hi:RnR,  i=1,,ph_i:\R^n\to \R, \; i = 1, \dots, p are the equality constraint functions

Optimal value

p^* = \inf\{f_0(x) \;|\; f_i(x)\le 0, \; i = 1, \dots, m, \; h_i(x) = 0, \; i = 1, \dots, p\}
  1. p=p^* = \infty if a problem is infeasible (no xx satisfies the constraints)
  1. p=p^* = -\infty if a problem is unbounded below

Optimal and locally optimal points

  • xx is feasible if xdom f0x\in \text{dom }f_0 and it satisfies all constraints
  • a feasible xx is optimal if f0(x)=pf_0(x) = p^*; XoptX_{opt} is the set of optimal points
  • xx is locally optimal if there is an R>0R>0 such that xx is optimal for
    minimize f0(z)\text{minimize } f_0(z)

    subject to

    fi(z)0,  i=1,,mhi(z)=0,  i=1,,pzx2Rf_i(z) \le 0, \; i = 1, \dots, m \\ h_i(z) = 0, \; i = 1, \dots, p \\ \|z -x\|_2\le R
    💡
    Local optimality can be interpreted as optimality of f_0 restricted to a ball of radius R around x.

Examples

  • f0(x)=1/x,dom f0=R++f_0(x) = 1/x, \text{dom }f_0 = \R_{++} : p=0p^* = 0, but it has no optimal point.
    💡
    We have to distinguish between optimal point and optimal value
  • f0(x)=logx,dom f0=R++f_0(x) = -\log x, \text{dom }f_0 = \R_{++}: p=p^* = -\infty

Implicit constraints

the standard form optimization problem has an implicit constraint

xD=i=0mdom fii=1pdom hix\in \mathcal D = \bigcap_{i = 0}^m\text{dom }f_i \cap \bigcap_{i = 1}^p\text{dom }h_i

we call D\mathcal D the domain of the problem.

Feasibility problem

find x\text{find }x

subject to

fi(x)0,  i=1,,mhi(x)=0,  i=1,,pf_i(x) \le 0, \; i = 1, \dots, m \\ h_i(x) = 0, \; i = 1, \dots, p

can be considered a special case of the general problem with f0(x)=0f_0(x) = 0

Convex optimization problem

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)0,  i=1,,mhi(x)=0,  i=1,,pf_i(x) \le 0, \; i = 1, \dots, m \\ h_i(x) = 0, \; i = 1, \dots, p
  • f0,,fmf_0, \dots, f_m are convex, h1,,hph_1, \dots, h_p are affine
  • problem is quasi-convex if f0f_0 is quasi-convex and f1,,fmf_1, \dots, f_m convex

often written as

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)0,  i=1,,mAx=bf_i(x) \le 0, \; i = 1, \dots, m \\ Ax = b
💡
feasible set of a convex optimization problem is a convex set
💡
Convexity is an attribute of the problem description, not of the feasible set: the same feasible set can be described by constraint functions that are convex or non-convex.
  • equivalent problem : by solving one, we can construct the other solution
  • identical problem : objective and constraints are identical

Local and global optima

any locally optimal point of a convex problem is globally optimal

Optimality criterion for differentiable objective function

Suppose that the objective f0f_0 in a convex optimization problem is differentiable, so that for all x,ydom f0x, y\in \text{dom }f_0,

f0(y)f0(x)+f0(x)T(yx)f_0(y) \ge f_0(x) + \nabla f_0(x)^T(y-x)

Then xx is optimal if and only if it is feasible and

f0(x)T(yx)0\nabla f_0(x)^T(y-x)\ge 0

for all feasible yy

  • Proof

    We already know that f0(y)f0(x)+f0(x)T(yx)f_0(y) \ge f_0(x) + \nabla f_0(x)^T(y-x)

    If f0(x)T(yx)0\nabla f_0(x)^T(y- x) \ge 0 for all feasible yy, then f0(y)f0(x)f_0(y) \ge f_0(x) for all feasible yy. Therefore, xx is optimal.

    Conversely, suppose x is optimal but there exists a feasible y such that

    \nabla f_0(x)^T(y-x) < 0

    Consider z(t) = x + t(y - x) for t \in [0, 1]. Since the feasible set is convex, z(t) is feasible, and \frac{d}{dt}f_0(z(t))\big|_{t=0} = \nabla f_0(x)^T(y-x) < 0, so f_0(z(t)) < f_0(x) for small t > 0. This contradicts the optimality of x.

    Therefore, \nabla f_0(x)^T(y-x) \ge 0 for all y in the feasible region.

💡
if non-zero, f0(x)\nabla f_0(x) defines a supporting hyperplane to a feasible set XX at xx
  • unconstrained problem : xx is optimal if and only if
    xdom f0f0(x)=0x\in \text{dom }f _0\quad \nabla f_0(x) = 0
    💡
    In the unconstrained case, y can be any point near x, so the condition can hold for all y only if \nabla f_0(x) = 0.
  • equality constrained problem
    minimize f0(x)\text{minimize }f_0(x)

    subject to

    Ax=bAx = b

    (Assume the feasible set is non-empty; otherwise the problem is infeasible.)

    xx is optimal if and only if there exists a ν\nu such that

    xdom f0Ax=bf0(x)+ATν=0x\in \text{dom }f_0 \quad Ax = b \quad \nabla f_0(x) + A^T\nu = 0
    • Proof

      Since x and y are both feasible,

      Ax=b and Ay=bAx = b \text{ and }Ay = b

      That means

      xyN(A) x- y \in \mathcal N(A)

      Therefore, the first-order optimality condition can be written as

      f0(x)Tz0,zN(A)\nabla f_0(x)^Tz\ge 0, \forall z\in \mathcal N(A)

      Since N(A)\mathcal N(A) is a subspace,

      \nabla f_0(x)^T(-z) \ge 0, \quad \forall z\in \mathcal N(A)

      That means

      f0(x)Tz=0,zN(A)\nabla f_0(x)^Tz = 0 , \forall z\in \mathcal N(A)

      So,

      f0(x)N(A)\nabla f_0(x) \perp \mathcal N(A)

      Since the orthogonal complement of \mathcal N(A) is \mathcal R(A^T), the row space of A

      (the fundamental theorem of linear algebra; see Linear Algebra and Its Applications by Gilbert Strang),

      \nabla f_0(x) \in \mathcal R(A^T) \Rightarrow \exists \nu, \; \nabla f_0(x) = A^T\nu

      Since we may replace \nu by -\nu, this is equivalent to

      \exists \nu, \; \nabla f_0(x) + A^T\nu = 0
    💡
    This is the classical Lagrange multiplier optimality condition
  • minimization over nonnegative orthant
    minimize f0(x)\text{minimize }f_0(x)

    subject to

    x0x\succeq 0

    xx is optimal if and only if

    xdom f0x0{f0(x)i0xi=0f0(x)i=0xi>0x\in \text{dom }f_0 \quad x\succeq 0 \quad \begin{cases}\nabla f_0(x)_i \ge 0 & x_i = 0 \\ \nabla f_0(x)_i = 0 & x_i > 0 \end{cases}
    • Proof

      The first order optimality condition is

      f0(x)T(yx)0\nabla f_0(x)^T(y-x)\ge 0

      for all feasible yy

      The condition \nabla f_0(x)^T(y - x) \ge 0 must hold for every y\succeq 0. If \nabla f_0(x) had a negative component, the linear term \nabla f_0(x)^Ty would be unbounded below over y\succeq 0 and the condition would fail for some feasible y; hence \nabla f_0(x) \succeq 0.

      Taking y = 0 then gives

      -\nabla f_0(x)^Tx \ge 0

      Since \nabla f_0(x) \succeq 0 and x\succeq 0, we also have \nabla f_0(x)^Tx \ge 0, so \nabla f_0(x)^Tx = 0, i.e. (\nabla f_0(x))_i\, x_i = 0 for every i. Componentwise this is exactly

      \begin{cases}\nabla f_0(x)_i \ge 0 & x_i = 0 \\ \nabla f_0(x)_i = 0 & x_i > 0 \end{cases}

Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

some common transformations that preserve convexity

Eliminating equality constraints

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)0,i=1,,mAx=bf_i(x) \le 0, \quad i = 1, \dots, m \\ Ax = b

is equivalent to

minimize zf0(Fz+x0)\text{minimize }_z f_0(Fz+x_0)

subject to

fi(Fz+x0)0i=1,,mf_i(Fz + x_0) \le 0 \quad i = 1, \dots, m

where FF and x0x_0 are such that

Ax=bx=Fz+x0 for some zAx = b\Leftrightarrow x = Fz + x_0\text{ for some }z

Indeed, suppose x and y are both feasible (i.e. Ax = b, Ay = b and f_i \le 0). Then

x - y\in \mathcal N(A)

so if F is a matrix whose columns form a basis of \mathcal N(A) and x_0 is any particular solution of Ax = b, every feasible point can be written as x = Fz + x_0 for some z.

💡
Note, however, that this transformation requires knowing at least one particular solution x_0 of Ax = b.
💡
If the original problem is convex, the transformed problem is also convex, since it is just a composition of the original functions with an affine map. A sketch of the null-space parametrization is given below.
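
As a concrete illustration (not from the original notes), here is a small numpy/scipy sketch of the parametrization x = Fz + x_0: the columns of F span N(A) and x_0 is one particular solution of Ax = b.

```python
# Sketch: parametrize {x : Ax = b} as x = F z + x0,
# where the columns of F span the null space of A.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

x0 = np.linalg.lstsq(A, b, rcond=None)[0]  # one particular solution of Ax = b
F = null_space(A)                          # columns form a basis of N(A)

z = np.random.randn(F.shape[1])            # any z gives a feasible x
x = F @ z + x0
print(np.allclose(A @ x, b))               # True: the equality constraint is automatic
```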

Introducing equality constraints

minimize f0(A0x+b0)\text{minimize }f_0(A_0x + b_0)

subject to

fi(Aix+bi)0,i=1,,mf_i(A_ix + b_i) \le 0, \quad i = 1, \dots, m

is equivalent to

minimize x,yif0(y0)\text{minimize }_{x, y_i}f_0(y_0)

subject to

fi(yi)0,i=1,,myi=Aix+bi,i=1,,mf_i(y_i)\le 0, \quad i = 1, \dots, m\\y_i = A_ix + b_i, \quad i = 1, \dots, m
💡
At first glance this makes no progress, since it only increases the number of variables and constraints. However, it can expose structure (for example, separability) that algorithms can exploit, so it often helps in practice.

Introducing slack variables for linear inequalities

minimize f0(x)\text{minimize }f_0(x)

subject to

aiTxbii=1,,ma_i^Tx \le b_i \quad i = 1, \dots, m

is equivalent to

minimize x,sf0(x)\text{minimize }_{x, s}f_0(x)

subject to

aiTx+si=bi,i=1,,msi0i=1,,ma_i^Tx + s_i = b_i, \quad i = 1, \dots, m \\ s_i \ge 0 \quad i = 1, \dots, m
💡
This is the standard trick for converting inequality-form linear programs into standard (equality) form.

Epigraph form

standard form convex problem is equivalent to

minimize x,tt\text{minimize }_{x, t}t

subject to

f0(x)t0fi(x)0,i=1,,mAx=bf_0(x) - t\le 0\\f_i(x)\le 0, \quad i = 1, \dots, m\\ Ax = b
💡
By using this, we can replace the objective function with a linear one.

Minimizing over some variables

minimize f0(x1,x2)\text{minimize }f_0(x_1, x_2)

subject to

fi(x1)0i=1,,mf_i(x_1) \le 0 \quad i = 1, \dots, m

is equivalent to

minimize fˉ0(x1)\text{minimize }\bar f_0(x_1)

subject to

fi(x1)0i=1,,mf_i(x_1) \le 0 \quad i = 1, \dots, m

where fˉ0(x1)=infx2f0(x1,x2)\bar f_0(x_1) = \inf_{x_2}f_0(x_1, x_2)

💡
We already know that fˉ0\bar f_0 is a convex function.
💡
Partial minimization over some of the variables (the same idea used in dynamic programming) preserves the convexity of the problem.

Quasi-convex optimization

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)0,i=1,,mAx=bf_i(x) \le 0, \quad i = 1, \dots, m\\Ax = b

with f0:RnRf_0:\R^n\to \R quasi-convex, f1,,fmf_1, \dots, f_m convex.

Unlike the convex optimization case, it can have locally optimal points that are not globally optimal.

Nevertheless, a variation of the first-order optimality condition does hold for quasi-convex optimization problems with differentiable objective functions.

In a quasi-convex optimization problem with feasible set X, x is optimal if

xX,f0(x)T(yx)>0,yX{x}x\in X, \quad \nabla f_0(x)^T(y- x) > 0, \forall y\in X \setminus \{x\}
💡
Unlike the convex optimization case, it is only sufficient for optimality.

Representation via a family of convex functions

It will be convenient to represent the sublevel sets of a quasi-convex function ff via inequalities of convex functions. We seek a family of convex functions ϕt\phi_t such that

f0(x)tϕt(x)0f_0(x) \le t\Leftrightarrow \phi_t(x) \le 0

Evidently ϕt\phi_t must satisfy the property that for all xRnx\in \R^n,

ϕs(x)0ϕt(x)0\phi_s(x) \le 0 \Rightarrow \phi_t(x) \le 0

for t\ge s. This is satisfied if, for each x, \phi_t(x) is a non-increasing function of t (i.e. \phi_s(x) \le \phi_t(x) whenever s\ge t).

If we take

ϕt(x)={0f(x)totherwise\phi_t(x) = \begin{cases}0 & f(x) \le t \\ \infty &\text{otherwise}\end{cases}

it satisfies the above conditions.

💡
We call it the indicator function of the t-sublevel of ff

Note that this representation is not unique, for example, if the sublevel sets of ff are closed, we can take

\begin{aligned}\phi_t(x) &= \text{dist}(x, \{z \mid f(z) \le t\}) \\ &= \inf_z\{\|z - x\|_2 \mid\; f(z) \le t\}\end{aligned}

Example

f0(x)=p(x)q(x)f_0(x) = \frac{p(x)}{q(x)}

with pp convex, qq concave, and p(x)0,q(x)>0p(x) \ge 0, q(x) > 0 on dom f0\text{dom }f_0 can take ϕt(x)=p(x)tq(x)\phi_t(x) = p(x) - tq(x)

  1. for t0,ϕtt\ge 0, \phi_t convex in xx
  1. p(x)/q(x)tp(x)/q(x) \le t if and only if ϕt(x)0\phi_t(x) \le 0

Quasi-convex optimization via convex feasibility problems

Let ϕt:RnR,tR\phi_t:\R^n\to \R, t\in \R be a family of convex functions that satisfy

f0(x)tϕt(x)0f_0(x) \le t \Leftrightarrow \phi_t(x) \le 0

and also, for each xx, ϕt(x)\phi_t(x) is a non-increasing function of tt, i.e.

stϕs(x)ϕt(x)s \ge t \Rightarrow \phi_s(x) \le \phi_t(x)

Let pp^* be the optimal value of the quasi-convex optimization problem.

If the feasibility problem

find x\text{find }x

subject to

ϕt(x)0fi(x)0,i=1,,mAx=b\phi_t(x) \le 0 \\ f_i(x) \le 0, \quad i = 1, \dots, m \\ Ax = b

is feasible, then we have ptp^*\le t. Conversely, if the problem is infeasible, then we can conclude ptp^*\ge t.

💡
Note that we want to find the smallest t for which the above problem is feasible. Since \phi_t(x) is non-increasing in t, if the problem is feasible for some t, it is also feasible for every s \ge t.
💡
Note that the above problem is a convex feasibility problem, since the inequality constraint functions are all convex, and the equality constraints are linear.

This can be used as the basis of a simple algorithm for solving the quasi-convex optimization problem using bisection (solving a convex feasibility problem at each step)

Assume that the problem is feasible, and start with an interval [l, u] known to contain the optimal value p^*. We then solve the convex feasibility problem at its midpoint t = (l+u)/2 to determine whether the optimal value is in the lower or upper half of the interval, and update the interval accordingly. This produces a new interval that still contains the optimal value but has half the width of the initial interval.

The idea is essentially the same as parametric search. A sketch of this bisection procedure is given below.
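
Below is a minimal sketch of the bisection procedure in Python, assuming the cvxpy modeling library (not part of the original notes); the quasi-convex objective is the linear-fractional function f_0(x) = (c^Tx + d)/(e^Tx + f) over a box, with made-up data for illustration.

```python
# Bisection for a quasi-convex problem, assuming cvxpy (toy data).
import numpy as np
import cvxpy as cp

np.random.seed(0)
n = 3
c, d = np.random.randn(n), 1.0
e, f = 0.1 * np.ones(n), 1.0          # denominator e^T x + f stays positive on the box

def feasible(t):
    # convex feasibility problem: is there x in the box with phi_t(x) = c^Tx + d - t(e^Tx + f) <= 0?
    x = cp.Variable(n)
    cons = [c @ x + d - t * (e @ x + f) <= 0, cp.abs(x) <= 1]
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status == cp.OPTIMAL

l, u = -20.0, 20.0                    # assumed to bracket the optimal value p*
while u - l > 1e-4:
    t = (l + u) / 2.0
    if feasible(t):
        u = t                         # p* <= t
    else:
        l = t                         # p* >= t
print("p* is approximately", u)
```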


Canonical Problems

Depending on the types of objective function and constraints, optimization problems can be categorized. We will delve into the following six subcategories

  • Linear Programming (LP)
  • Quadratic Programming (QP)
  • Quadratically Constrained Quadratic Programming (QCQP)
  • Second-Order Cone Programming (SOCP)
  • Semidefinite Programming (SDP)
  • Conic Programming (CP)

The above problems have the following inclusion relationships, and moving to the right, they can be seen as more generalized forms.

LPQPQCQPSOCPSDPCPLP\subset QP\subset QCQP\subset SOCP\subset SDP\subset CP

Linear program (LP)

minimize cTx+d\text{minimize }c^Tx + d

subject to

GxhAx=bGx\preceq h \\ Ax = b
  • Convex problem with affine objective and constraint functions (every function in a linear program is affine)
  • feasible set is a polyhedron
💡
Note that the level set of the objective function is just a hyperplane.

But be cautious: LPs are not as simple as they might seem at first glance, because the number of vertices of the feasible polyhedron can grow exponentially with the problem size.

Examples

Piecewise-linear minimization

\text{minimize}_x\; \max_{i = 1, \dots, m}(a_i^Tx + b_i)

equivalent to an LP

minimize t\text{minimize }t

subject to

aiTx+biti=1,,ma_i^Tx + b_i \le t \quad i = 1, \dots, m
💡
Actually, this is just the epigraph form.
Piecewise linear function
In mathematics and statistics, a piecewise linear (PL, or segmented) function is a real-valued function of a real variable whose graph is composed of straight-line segments.
https://en.wikipedia.org/wiki/Piecewise_linear_function
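
A minimal sketch of the LP (epigraph) reformulation above, assuming cvxpy and random data for illustration:

```python
# Piecewise-linear minimization via its epigraph LP, assuming cvxpy (toy data).
import numpy as np
import cvxpy as cp

np.random.seed(1)
m, n = 10, 4
a = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
t = cp.Variable()
prob = cp.Problem(cp.Minimize(t), [a @ x + b <= t])  # a_i^T x + b_i <= t for every i
prob.solve()
print(prob.value, cp.max(a @ x + b).value)           # the two values agree at the optimum
```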

Chebyshev center of a polyhedron

Chebyshev center of

P={x  aiTxbi,  i=1,,m}\mathcal P = \{x|\; a_i^Tx\le b_i, \; i = 1, \dots, m\}

is center of largest inscribed ball

B={xc+u  u2r}\mathcal B = \{x_c + u|\; \|u\|_2 \le r\}

aiTxbia_i^Tx\le b_i for all xBx\in \mathcal B if and only if

\sup\{a_i^T(x_c + u)\mid\; \|u\|_2\le r\} = a_i^Tx_c + r\|a_i\|_2 \le b_i
💡
It is linear in x_c and r.

Hence xc,rx_c, r can be determined by solving the LP

maximize r\text{maximize }r

subject to

aiTxc+rai2bii=1,,ma_i^Tx_c + r\|a_i\|_2\le b_i \quad i = 1, \dots , m
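
A small sketch of this Chebyshev-center LP, assuming cvxpy; the polyhedron here is just the unit square, so the answer is easy to verify by hand:

```python
# Chebyshev center of a polyhedron via the LP above, assuming cvxpy.
import numpy as np
import cvxpy as cp

# polyhedron {x : a_i^T x <= b_i} -- here the unit square [0, 1]^2
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

xc = cp.Variable(2)   # center of the inscribed ball
r = cp.Variable()     # radius of the inscribed ball
cons = [a[i] @ xc + r * np.linalg.norm(a[i]) <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(r), cons)
prob.solve()
print(xc.value, r.value)   # roughly (0.5, 0.5) and 0.5
```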

Linear-fractional program

minimize f0(x)\text{minimize }f_0(x)

subject to

GxhAx=bGx\preceq h \\ Ax = b

where

f0(x)=cTx+deTx+fdom f0(x)={x  eTx+f>0}f_0(x) = \frac{c^Tx + d}{e^Tx + f} \quad \text{dom }f_0(x) = \{x| \; e^Tx + f > 0\}
💡
We already know that linear fractional function is a quasi-convex function

It can be solved by bisection like this

f0(x)tcTx+dt(eTx+f)f_0(x) \le t \Leftrightarrow c^Tx + d \le t(e^Tx + f)

If the feasible set is non-empty, the linear-fractional program can be transformed to an equivalent linear program

minimize cTy+dz\text{minimize } c^Ty + dz

subject to

GyhzAy=bzeTy+fz=1z0Gy\preceq hz \\ Ay = bz \\ e^Ty + fz = 1 \\ z \ge 0

with variables y,zy, z

  • Proof

    If xx is feasible in our original problem, then

    y=xeTx+f,z=1eTx+fy = \frac{x}{e^Tx +f}, \quad z = \frac{1}{e^Tx + f}

    is feasible in the LP problem, with the same objective value c^Ty + dz = f_0(x). In particular, taking x to be an optimal point of the original problem shows that the optimal value of the LP is less than or equal to the optimal value of the original problem.

    Conversely,

    1. if (y,z)(y, z) is feasible in the LP problem with z0z\ne 0, then x=y/zx = y/z is feasible in the original problem.
    1. If (y, z) is feasible in the LP problem with z = 0, and x_0 is feasible for the original problem, then x = x_0 + ty is feasible in the original problem for all t\ge 0. Moreover, \lim_{t\to \infty}f_0(x_0 + ty) = c^Ty + dz, so we can find feasible points of the original problem with objective values arbitrarily close to the objective value of (y, z).
    💡
    The basic idea is that the additional variable z normalizes the denominator to 1, which removes it from the objective. When z is positive, (y, z) is just a scaled (normalized) copy of a feasible x.
    💡
    If z = 0, we can add arbitrarily large multiples of y to x_0; this is the main idea of the second case above.
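
A sketch of the LP reformulation of the linear-fractional program, assuming cvxpy; the recovery x = y/z follows the proof above, and the box [-1, 1]^n plays the role of Gx ⪯ h:

```python
# Linear-fractional program solved through its LP reformulation, assuming cvxpy.
import numpy as np
import cvxpy as cp

n = 3
G = np.vstack([np.eye(n), -np.eye(n)])   # original feasible set: the box -1 <= x <= 1
h = np.ones(2 * n)
np.random.seed(2)
c, d = np.random.randn(n), 0.5
e, f = 0.1 * np.ones(n), 1.0             # denominator e^T x + f stays positive on the box

y = cp.Variable(n)
z = cp.Variable(nonneg=True)
prob = cp.Problem(cp.Minimize(c @ y + d * z),
                  [G @ y <= h * z, e @ y + f * z == 1])
prob.solve()
x_opt = y.value / z.value if z.value > 1e-9 else None   # recover x = y / z when z > 0
print("optimal value:", prob.value, "recovered x:", x_opt)
```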

Generalized linear-fractional program

f_0(x) = \max_{i = 1, \dots, r} \frac{c_i^Tx + d_i}{e_i^Tx + f_i}

where dom f0(x)={x  eiTx+fi>0,  i=1,,r}\text{dom }f_0(x) = \{x|\; e_i^Tx + f_i > 0, \; i = 1, \dots, r\}

This is a quasi-convex optimization problem. Unlike the linear-fractional program, there is no direct way to convert it to an LP. However, since it is quasi-convex, it can still be solved by bisection.

💡
Each sublevel set of the objective is the intersection of the sublevel sets of the individual ratios, which is convex. Therefore the objective is quasi-convex.

Quadratic program (QP)

The convex optimization problem is called a quadratic program(QP) if the objective function is convex quadratic, and the constraint functions are affine.

minimize 12xTPx+qTx+r\text{minimize }\frac{1}{2}x^TPx + q^Tx + r

subject to

GxhAx=bGx\preceq h \\ Ax = b

where PS+n,GRm×nP\in S_{+}^n, G\in \R^{m\times n} and ARp×nA \in \R^{p\times n}

In a quadratic program, we minimize a convex quadratic function over a polyhedron.

Note that if PS++nP\in S_{++}^n, each sublevel set is an ellipsoid.

  • Proof

    Since P is a symmetric positive definite matrix, it has an orthonormal basis of eigenvectors \{v_1, \dots, v_n\} with eigenvalues \{\lambda_1, \dots, \lambda_n\}, all positive.

    \{x \mid\; f(x) \le \alpha\} = \{x \mid\; \tfrac{1}{2}x^TPx + q^Tx + r\le \alpha\}

    Writing x = x_1 v_1 + \cdots + x_nv_n and q_i = q^Tv_i,

    \{x \mid\; f(x) \le \alpha\} = \{x \mid\; \tfrac{1}{2}(\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2) + q_1x_1 + \cdots + q_nx_n \le \alpha - r\}

    Completing the square in each coordinate shows that this set, if non-empty, is an ellipsoid.

💡
If PP is not positive semi-definite (i.e. it has a negative eigenvalue), the problem is NP-hard in general.

Quadratically constrained quadratic program (QCQP)

minimize 12xTP0x+q0Tx+r0\text{minimize }\frac{1}{2}x^TP_0x + q_0^Tx + r_0

subject to

12xTPix+qiTx+ri0,i=1,,mAx=b\frac{1}{2}x^TP_i x + q_i^Tx + r_i \le 0, \quad i = 1, \dots, m \\ Ax = b

where PiS+nP_i\in S_{+}^n.

If P1,,PmS++nP_1, \dots, P_m\in S_{++}^n, the feasible region is the intersection of mm ellipsoids and an affine set.

Examples

Least squares

We already know that

Axb22=xTATAx2bTAx+bTb\|Ax - b\|_2^2 = x^TA^TAx - 2b^TAx + b^Tb

and since

\begin{aligned}x^TA^TAx &= (Ax)^T(Ax) \\ &= \langle Ax, Ax\rangle \\ &= \|Ax\|_2^2 \ge 0\end{aligned}

for all xx, ATAA^TA is a positive semi-definite matrix.

Therefore

minimize Axb22\text{minimize }\|Ax - b\|_2^2

this is a quadratic program.

Actually, this problem is simple enough to have the well-known analytical solution x^* = A^\dagger b, where A^\dagger is the pseudo-inverse of A; a quick numerical check follows.
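
```python
# Numerical check of x* = pinv(A) @ b for the unconstrained least-squares problem (numpy only).
import numpy as np

np.random.seed(3)
A = np.random.randn(8, 3)
b = np.random.randn(8)

x_star = np.linalg.pinv(A) @ b                  # pseudo-inverse solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]  # numpy's least-squares solver
print(np.allclose(x_star, x_lstsq))             # True: both minimize ||Ax - b||_2^2
```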

When linear inequality constraints are added, the problem is called constrained regression or constrained least-squares

💡
In general, there is no analytical solution in these cases.

As an example we can consider regression with lower and upper bounds on the variables

minimize Axb22\text{minimize } \|Ax - b\|_2^2

subject to

lixiuii=1,,nl_i\le x_i \le u_i \quad i = 1, \dots, n

There is another example like this

minimize Axb22\text{minimize } \|Ax - b\|_2^2

subject to x1x2xnx_1 \le x_2 \le \cdots \le x_n

Actually, this just adds a set of linear inequalities, so it is still a quadratic program. A sketch of both variants is given below.
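
A sketch of both constrained least-squares variants above, assuming cvxpy and made-up data:

```python
# Constrained least squares, assuming cvxpy: box constraints and monotonicity constraints.
import numpy as np
import cvxpy as cp

np.random.seed(4)
A = np.random.randn(20, 5)
b = np.random.randn(20)

# (1) lower and upper bounds on the variables
x = cp.Variable(5)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= -1, x <= 1]).solve()

# (2) monotonicity x_1 <= x_2 <= ... <= x_n, i.e. a set of linear inequalities
x2 = cp.Variable(5)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x2 - b)), [cp.diff(x2) >= 0]).solve()

print(x.value)
print(x2.value)
```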

Second-order cone programming (SOCP)

minimize fTx\text{minimize }f^Tx

subject to

Aix+bi2ciTx+di,i=1,,mFx=g\|A_ix + b_i\|_2 \le c_i^Tx + d_i, \quad i = 1, \dots, m \\ Fx = g

where xRn,AiRni×n,FRp×nx\in \R^n, A_i\in \R^{n_i\times n}, F\in \R^{p\times n}.

We call a constraint of the form

Ax+b2cTx+d\|Ax + b\|_2 \le c^Tx + d

where ARk×nA\in \R^{k\times n}, a second-order cone constraint because

(Aix+bi,ciTx+di)second-order cone in Rni+1(A_ix + b_i, c_i^Tx + d_i) \in \text{second-order cone in }\R^{n_i + 1}
  1. If ci=0c_i = 0 for all i=1,,mi = 1, \dots, m, then the SOCP is equivalent to a QCQP.
  1. If Ai=0A_i = 0 for all i=1,,m i = 1, \dots, m, then the SOCP reduces to a general LP
💡
It is more general than QCQP and LP

Let f_i(x) = \|A_ix + b_i\|_2 - c_i^Tx - d_i. This function is not differentiable at points where A_i x + b_i = 0. It might seem safe to ignore those points at first glance, but we cannot: the solution very often lies exactly at such points.

💡
Currently, we can solve SOCP as fast as LP.

Examples

Robust linear programming

the parameters in optimization problems are often uncertain

minimize cTx\text{minimize }c^Tx

subject to

aiTxbii=1,,ma_i^Tx \le b_i\quad i = 1, \dots, m

there can be uncertainty in c,ai,bic, a_i, b_i

two common approaches to handling uncertainty (for simplicity, assume only a_i is uncertain)

  1. deterministic model: constraints must hold for all aiEia_i\in \mathcal E_i
    minimize cTx\text{minimize }c^Tx

    subject to

    aiTxbi for all aiEii=1,,ma_i^Tx \le b_i \text{ for all }a_i\in \mathcal E_i \quad i = 1, \dots, m
    💡
    Since the constraints must hold over the entire uncertainty region, this approach is called the worst-case model.

    Take \mathcal E_i to be an ellipsoid:

    Ei={aˉi+Piu  u21}\mathcal E_i = \{\bar a_i + P_i u |\; \|u\|_2 \le 1\}

    where aˉiRn,PiS+n\bar a_i\in \R^n, P_i\in S_+^n

    Then our constraints can be expressed as

    \sup\{a_i^Tx \mid\; a_i\in \mathcal E_i\}\le b_i \\ \Leftrightarrow \sup\{(\bar a_i + P_iu)^Tx \mid\; \|u\|_2\le 1\} \le b_i \\ \Leftrightarrow \bar a_i^Tx + \sup\{u^TP_i^Tx \mid\; \|u\|_2 \le 1\} \le b_i \\ \Leftrightarrow \bar a_i^Tx + \|P_i^Tx\|_2 \le b_i

    Therefore it can be represented as an SOCP (a code sketch appears after this list)

    minimize cTx\text{minimize }c^Tx

    subject to

    aˉiTx+PiTx2bii=1,,m\bar a_i^Tx + \|P_i^Tx \|_2 \le b_i \quad i = 1, \dots, m
    💡
    Note that the additional norm terms act as regularization terms ; they prevent xx from being large in directions with considerable uncertainty in the parameters aia_i.
    💡
    Since SOCPs can be solved about as fast as LPs, this robust formulation is practical for applications in finance.
  1. stochastic model: aia_i is random variable; constraints must hold with probability η\eta
    minimize cTx\text{minimize }c^Tx

    subject to

    prob(aiTxbi)ηi=1,,m\text{prob}(a_i^Tx\le b_i)\ge \eta \quad i = 1, \dots, m
    💡
    The hard constraint is relaxed to hold only with probability \eta.

    Assume aia_i is Gaussian with mean aˉi\bar a_i, covariance Σi\Sigma_i (i.e. aiN(aˉi,Σi)a_i\sim \mathcal N(\bar a_i, \Sigma_i))

    Then aiTxN(aˉiTx,xTΣix)a_i^Tx \sim \mathcal N(\bar a_i^Tx, x^T\Sigma_ix). So,

    prob(aiTxbi)=Φ(biaˉiTxΣi1/2x2)\text{prob}(a_i^Tx\le b_i) = \Phi\bigg(\frac{b_i - \bar a_i^Tx}{\|\Sigma_i^{1/2}x\|_2}\bigg)

    where Φ(x)\Phi(x) is CDF of N(0,1)\mathcal N(0, 1)

    Then it can be written as

    minimize cTx\text{minimize }c^Tx

    subject to

    aˉiTx+Φ1(η)Σi1/2x2bii=1,,m\bar a_i^Tx + \Phi^{-1}(\eta) \|\Sigma_i^{1/2}x\|_2 \le b_i \quad i = 1, \dots, m

    If Φ1(η)0\Phi^{-1}(\eta) \ge 0  (i.e. η1/2\eta \ge 1/2), it is equivalent to the SOCP.
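
Here is the promised sketch of the deterministic (worst-case) robust LP written as an SOCP, assuming cvxpy; the data and the extra box constraint are made up and only keep the toy problem bounded:

```python
# Worst-case robust LP with ellipsoidal uncertainty, written as an SOCP (cvxpy assumed).
import numpy as np
import cvxpy as cp

np.random.seed(5)
n, m = 3, 8
a_bar = np.random.randn(m, n)                 # nominal constraint vectors
P = [0.1 * np.eye(n) for _ in range(m)]       # a_i = a_bar_i + P_i u with ||u||_2 <= 1
b = np.random.rand(m) + 1.0
c = np.random.randn(n)

x = cp.Variable(n)
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
cons += [cp.norm(x, "inf") <= 5]              # only to keep this toy problem bounded
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print("robust optimal value:", prob.value)
```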

Geometric program (GP)

Monomial function

f(x)=cx1a1x2a2xnandom f=R++nf(x) = cx_1^{a_1}x_2^{a_2}\cdots x_n^{a_n} \quad \text{dom }f = \R_{++}^n

with c>0c>0 and aiRa_i\in \R, is called a monomial function

💡
In mathematics, a monomial has non-negative integer exponents, so the definition above is not the standard one.

Posynomial function

it is just a sum of monomials

f(x)=k=1Kckx1a1kx2a2kxnankdom f=R++nf(x) = \sum_{k = 1}^K c_k x_1^{a_{1k}}x_2^{a_{2k}}\cdots x_n^{a_{nk}} \quad \text{dom }f = \R_{++}^n

Geometric program (GP)

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)1,i=1,,mhi(x)=1i=1,,pf_i(x) \le 1, \quad i = 1, \dots, m\\ h_i(x) = 1 \quad i = 1, \dots, p

where f0f_0 is a posynomial function, hih_i is a monomial function.

Geometric program in convex form

Geometric programs are not generally convex optimization problem, but they can be transformed to convex problems by a change of variables and a transformation of the objective and constraint functions.

  1. monomial f(x)=cx1a1xnanf(x) = cx_1^{a_1}\cdots x_n^{a_n} transforms to
    logf(ey1,,eyn)=aTy+b\log f(e^{y_1}, \dots, e^{y_n}) = a^Ty + b

    where b=logcb = \log c

  1. posynomial f(x)=k=1Kckx1a1kxnankf(x) = \sum_{k = 1}^K c_k x_1^{a_{1k}}\cdots x_n^{a_{nk}} transforms to
    \log f(e^{y_1}, \dots, e^{y_n}) = \log \bigg (\sum_{k = 1}^K e^{a_k^Ty + b_k}\bigg)

    where bk=logckb_k = \log c_k

    💡
    We already know that log-sum-exp is a convex function in yy

Therefore, geometric program transforms to a convex problem

minimize log(k=1Kexp(a0kTy+b0k))\text{minimize }\log\bigg (\sum_{k = 1}^K \exp(a_{0k}^Ty + b_{0k})\bigg)

subject to

\log\bigg(\sum_{k = 1}^K \exp(a_{ik}^Ty + b_{ik})\bigg) \le 0, \quad i = 1, \dots, m \\ g_i^Ty + h_i = 0, \quad i = 1, \dots, p
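
A sketch of this convex (log-sum-exp) form, assuming cvxpy; the GP here minimizes 1/(x_1 x_2) subject to x_1 + x_2 ≤ 1, so the optimum is x_1 = x_2 = 1/2:

```python
# Geometric program in convex (log-sum-exp) form, assuming cvxpy.
# GP: minimize 1/(x1*x2) subject to x1 + x2 <= 1, with the change of variables x = exp(y).
import numpy as np
import cvxpy as cp

y = cp.Variable(2)                                 # y_i = log x_i

A0 = np.array([[-1.0, -1.0]]); b0 = np.zeros(1)    # objective monomial x1^-1 * x2^-1
A1 = np.eye(2); b1 = np.zeros(2)                   # constraint posynomial x1 + x2 <= 1

prob = cp.Problem(cp.Minimize(cp.log_sum_exp(A0 @ y + b0)),
                  [cp.log_sum_exp(A1 @ y + b1) <= 0])
prob.solve()
print(np.exp(y.value), np.exp(prob.value))         # roughly [0.5, 0.5] and 4
```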

Generalized inequality constraints

One very useful generalization of the standard form convex optimization problem is obtained by allowing the inequality constraint functions to be vector valued , and using generalized inequalities in the constraints.

minimize f0(x)\text{minimize }f_0(x)

subject to

fi(x)Ki0i=1,,mAx=bf_i(x) \preceq_{K_i} 0 \quad i = 1, \dots, m \\ Ax = b

where f_0:\R^n\to \R, the K_i\subseteq \R^{k_i} are proper cones, and f_i:\R^n\to\R^{k_i} are K_i-convex.

💡
Later, we will see that convex optimization problems with generalized inequality constraints can often be solved as easily as ordinary convex optimization problems.

Conic form problems

special case with affine objectives and constraints

minimize cTx\text{minimize }c^Tx

subject to

Fx+gK0Ax=bFx + g \preceq_K 0 \\ Ax = b

extends linear programming (K=R+mK= \R_{+}^m) to non-polyhedral cones

Semidefinite programming

minimize cTx\text{minimize }c^Tx

subject to

x1F1++xnFn+G0Ax=bx_1 F_1 + \cdots + x_nF_n + G\preceq0 \\ Ax = b

where G,F1,,FnSkG, F_1, \dots, F_n\in S^k, and ARp×nA\in \R^{p \times n}

💡
The inequality constraint is called a linear matrix inequality (LMI); it requires the left-hand side to be negative semi-definite.

Note that it includes problems with multiple LMI constraints

x1Fˉ1++xnFˉn+Gˉ0x1F~1++xnF~n+G~0x_1\bar F_1+ \cdots + x_n\bar F_n + \bar G \preceq 0 \\ x_1\tilde F_1+ \cdots + x_n\tilde F_n + \tilde G \preceq 0

is equivalent to single LMI

x1[Fˉ100F~1]+x2[Fˉ200F~2]++xn[Fˉn00F~n]+[Gˉ00G~]0x_1\begin{bmatrix}\bar F_1 & 0 \\ 0 & \tilde F_1\end{bmatrix} + x_2\begin{bmatrix}\bar F_2 & 0 \\ 0 & \tilde F_2\end{bmatrix} + \cdots + x_n \begin{bmatrix}\bar F_n & 0 \\ 0 & \tilde F_n\end{bmatrix} + \begin{bmatrix}\bar G & 0 \\ 0 & \tilde G\end{bmatrix} \preceq 0
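
A minimal SDP sketch with a single LMI constraint, assuming cvxpy and an SDP-capable solver (e.g. SCS); the data are made up, and a box constraint keeps the toy problem bounded:

```python
# Minimal SDP with a single LMI constraint x1*F1 + x2*F2 + G <= 0, assuming cvxpy.
import numpy as np
import cvxpy as cp

np.random.seed(7)
k, n = 4, 2
F = [np.random.randn(k, k) for _ in range(n)]
F = [0.5 * (Fi + Fi.T) for Fi in F]        # symmetrize the coefficient matrices
G = -np.eye(k)                             # makes x = 0 strictly feasible
c = np.random.randn(n)

x = cp.Variable(n)
lmi = sum(x[i] * F[i] for i in range(n)) + G
prob = cp.Problem(cp.Minimize(c @ x),
                  [lmi << 0, cp.norm(x, "inf") <= 1])   # box keeps the toy problem bounded
prob.solve()
print(prob.value, x.value)
```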

Example

  1. LP and equivalent SDP
    • LP
      minimize cTx\text{minimize }c^Tx

      subject to

      AxbAx\preceq b
    • SDP
      minimize cTx\text{minimize }c^Tx

      subject to

      diag(Axb)0\text{diag}(Ax - b)\preceq0
    💡
    Use the fact that if the given diagonal matrix has only non-positive elements, it is a negative semi-definite matrix.
  1. SOCP and equivalent SDP
    • SOCP
      minimize fTx\text{minimize }f^Tx

      subject to

      Aix+bi2ciTx+dii=1,,m\|A_ix + b_i\|_2 \le c_i^T x +d_i \quad i = 1, \dots, m
    • SDP
      minimize fTx\text{minimize } f^Tx

      subject to

      [(ciTx+di)IAix+bi(Aix+bi)TciTx+di]0i=1,,m\begin{bmatrix}(c_i^Tx + d_i)I & A_ix + b_i \\ (A_ix + b_i)^T & c_i^Tx + d_i\end{bmatrix}\succeq 0 \quad i = 1, \dots, m
    • Proof

      Before we start, we have to know the following fact (Schur complement of positive semidefinite matrix)

      X = \begin{bmatrix}A & B \\ B^T & D\end{bmatrix}, \qquad S = D - B^TA^{-1}B

      Then, if A\succ 0,

      X \succeq 0 \text{ iff } S \succeq 0

      For the SOC constraint we have c_i^Tx + d_i \ge \|A_ix + b_i\|_2 \ge 0; assume c_i^Tx + d_i > 0 (the boundary case c_i^Tx + d_i = 0 can be handled separately). Applying the Schur complement with the (1,1) block (c_i^Tx + d_i)I \succ 0,

      \begin{bmatrix}(c_i^Tx + d_i)I & A_ix + b_i \\ (A_ix + b_i)^T & c_i^Tx + d_i\end{bmatrix}\succeq 0 \text{ iff } c_i^Tx + d_i - \frac{(A_ix+b_i)^T(A_ix + b_i)}{c_i^Tx + d_i} \ge 0 \text{ iff } \|A_ix+b_i\|_2^2 \le (c_i^Tx + d_i)^2

      Since ciTx+di0c_i^Tx + d_i \ge 0,

      [(ciTx+di)IAix+bi(Aix+bi)TciTx+di]0 iff Aix+bi2ciTx+di\begin{bmatrix}(c_i^Tx + d_i)I & A_ix + b_i \\ (A_ix + b_i)^T & c_i^Tx + d_i\end{bmatrix}\succeq 0 \text{ iff } \|A_ix+b_i\|_2 \le c_i^Tx + d_i
  1. Eigenvalue minimization
    minimize λmax(A(x))\text{minimize }\lambda_{\max}(A(x))

    where A(x)=A0+x1A1++xnAnA(x) = A_0 + x_1 A_1 + \cdots + x_nA_n with given AiSkA_i\in S^k

    we can express it as a SDP form

    minimize t\text{minimize }t

    subject to

    A(x)tIA(x) \preceq tI

    where xRn,tRx\in \R^n, t\in \R

    it follows from the fact that

    λmax(A)tAtI\lambda_{\max}(A) \le t \Leftrightarrow A\preceq tI
    • Proof

      λ\lambda is an eigenvalue of A(x)A(x) if and only if λt\lambda - t is an eigenvalue of A(x)tIA(x) - tI.

      Therefore, λmax(A(x))t\lambda_{\max} (A(x)) \le t if and only if all eigenvalues of A(x)tIA(x) -tI are non-positive. (i.e. A(x)tI0A(x) -tI \preceq 0)

  1. Matrix norm
    minimize A(x)2:=(λmax(A(x)TA(x)))1/2\text{minimize }\|A(x)\|_2 := (\lambda_{\max}(A(x)^TA(x)))^{1/2}

    where xRn,tR,A(x)=A0+x1A1++xnAnx\in \R^n, t\in \R , A(x) = A_0 + x_1A_1 + \cdots + x_nA_n with given AiRp×qA_i\in \R^{p\times q}

    💡
    We already know that matrix norm is a convex function.

    we can express it as a SDP form

    minimize t\text{minimize }t

    subject to

    [tIA(x)A(x)TtI]0\begin{bmatrix}tI & A(x) \\ A(x)^T & tI\end{bmatrix}\succeq 0

    By the Schur complement (for t > 0), the above matrix is positive semi-definite if and only if

    tI - \frac{1}{t}A(x)^TA(x) \succeq 0

    Multiplying through by t > 0,

    t2IA(x)TA(x)t^2I \succeq A(x)^TA(x)

    Therefore, all eigenvalues of A(x)TA(x)A(x)^TA(x) are less than equal to t2t^2

Vector optimization

General vector optimization problem

minimize f0(x) (with respect to K)\text{minimize }f_0(x) \text{ (with respect to K)}

subject to

fi(x)0i=1,,mhi(x)=0i=1,,pf_i(x) \le 0 \quad i = 1, \dots, m\\ h_i(x) = 0 \quad i = 1, \dots, p

vector objective f_0:\R^n\to \R^q is minimized with respect to a proper cone K\subseteq \R^q

💡
How do we compare vector values? There are many possible orderings (e.g. the lexicographic order); here the comparison is the generalized inequality induced by the cone K.

Convex vector optimization problem

minimize f0(x) (with respect to K)\text{minimize }f_0(x) \text{ (with respect to K)}

subject to

fi(x)0i=1,,mAx=bf_i(x) \le 0 \quad i = 1, \dots, m\\ Ax = b

with f0f_0 KK-convex, f1,,fmf_1, \dots, f_m convex.

Optimal and Pareto optimal points

set of achievable objective values

O={f0(x)  x feasible}\mathcal O = \{f_0(x) |\; x \text{ feasible} \}
  • feasible xx is optimal if f0(x)f_0(x) is a minimum value of O\mathcal O

    x^* is optimal when f_0(x^*) is comparable to every other point of \mathcal O and less than or equal to all of them.

  • feasible xx is Pareto optimal if f0(x)f_0(x)  is a minimal value of O\mathcal O

    x^{po} is Pareto optimal when no other point of \mathcal O is smaller than f_0(x^{po}); points of \mathcal O that are incomparable to f_0(x^{po}) may still exist.

Multi-criterion optimization

vector optimization problem with K=R+qK = \R_{+}^q

f0(x)=(F1(x),,Fq(x))f_0(x) = (F_1(x), \dots, F_q(x))

where we want all FiF_i’s to be small

  1. feasible xx^* is optimal if
    y feasiblef0(x)f0(y)y\text{ feasible} \Rightarrow f_0(x^*) \preceq f_0(y)

    if there exists an optimal point, the objectives are non-competing

  1. feasible xpox^{po} is Pareto optimal if
    y feasible,f0(y)f0(xpo)f0(xpo)=f0(y)y\text{ feasible}, f_0(y) \preceq f_0(x^{po}) \Rightarrow f_0(x^{po}) = f_0(y)

    if there are multiple Pareto optimal values, there is a trade-off between the objectives.

Example : Regularized least-squares

minimize (Axb22,x22) (w.r.t R+2)\text{minimize }(\|Ax - b\|_2^2, \|x\|_2^2) \text{ (w.r.t }\R_+^2)
💡
In deep learning terms, the second objective \|x\|_2^2 plays the role of a regularizer.

Scalarization

Scalarization is a standard technique for finding Pareto optimal points for a vector optimization problem. To find Pareto optimal points: choose λK0\lambda \succ_{K^*}0 and solve scalar problem

minimize λTf0(x)\text{minimize }\lambda^T f_0(x)

subject to

fi(x)0,i=1,,mhi(x)=0i=1,,pf_i(x) \le 0, \quad i = 1, \dots,m \\ h_i(x) = 0 \quad i = 1, \dots, p
💡
One characterization: f_0 is K-convex if and only if \lambda^T f_0(x) is a convex function of x for every \lambda \succeq_{K^*} 0.
💡
Any optimal point of this scalarized problem is a Pareto optimal point of the original problem.
💡
for convex vector optimization problems, we can find (almost) all Pareto optimal points by varying \lambda \succ_{K^*}0

Scalarization for multicriterion problems

to find Pareto optimal points, minimize positive weighted sum

λTf0(x)=λ1F1(x)++λqFq(x)\lambda^T f_0(x) = \lambda_1 F_1(x) + \cdots + \lambda_qF_q(x)

Example

Regularized least-squares problem

Take λ=(1,γ)\lambda = (1, \gamma) with γ>0\gamma > 0

minimize Axb22+γx22\text{minimize }\|Ax - b\|_2^2 + \gamma \|x\|_2^2

for fixed γ\gamma

💡
From a deep-learning perspective, the term \gamma\|x\|_2^2 is the familiar regularization (weight-decay) term. A small sweep over \gamma is sketched below.
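
A small sweep over \gamma, using the closed-form ridge solution x(\gamma) = (A^TA + \gamma I)^{-1}A^Tb (numpy only, made-up data); each \gamma > 0 traces out one Pareto optimal point:

```python
# Scalarization sweep for regularized least squares via the closed form
# x(gamma) = (A^T A + gamma I)^{-1} A^T b.
import numpy as np

np.random.seed(6)
A = np.random.randn(30, 5)
b = np.random.randn(30)

for gamma in [0.01, 0.1, 1.0, 10.0]:
    x = np.linalg.solve(A.T @ A + gamma * np.eye(5), A.T @ b)
    print(gamma, np.linalg.norm(A @ x - b) ** 2, np.linalg.norm(x) ** 2)
# each gamma > 0 gives one Pareto optimal point of the bi-criterion problem
```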