
2. Convex set


Affine set

  • Line : the set of all points on the line through x_1 and x_2

x = \theta x_1 + (1 - \theta) x_2

where \theta\in \R

This idea can be generalized to more than two points.

\theta_1x_1 + \theta_2 x_2 + \cdots + \theta_kx_k

where \theta_1 + \cdots + \theta_k = 1

A point that can be expressed in this form is called an affine combination of the points x_1, \dots, x_k

  • affine set : contains every affine combination of its points

If C is an affine set and x_0\in C, then the set

V = C - x_0 := \{x - x_0 | x\in C\}

is a subspace, i.e., closed under vector addition and scalar multiplication.

  • Proof

    Let v_1, v_2\in V and \alpha, \beta \in \R. Then v_1 + x_0, v_2 + x_0 \in C

    Moreover,

    \alpha v_1 + \beta v_2 + x_0 = \alpha(v_1 + x_0) + \beta(v_2 + x_0) + (1 - \alpha - \beta)x_0

    Since the coefficients sum to \alpha + \beta + (1 - \alpha - \beta) = 1, this is an affine combination of points in C, so \alpha v_1 + \beta v_2 + x_0 \in C

    That means \alpha v_1 + \beta v_2 \in V

    Therefore V is closed under vector addition and scalar multiplication.

Thus, the affine set C can be expressed as

C = V + x_0 := \{v + x_0 | v\in V\}

The set of all affine combinations of points in some set C\subset \R^n is called the affine hull of C, and denoted \text{aff }C

\text{aff } C = \{\theta_1x_1 + \cdots + \theta_kx_k | x_1, \dots, x_k\in C,\ \theta_1 + \cdots + \theta_k = 1\}
💡
The affine hull is the smallest affine set that contains C
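As a small numerical sketch of the difference between affine and convex combinations (the helper name is illustrative, not from the text): an affine combination \theta x_1 + (1-\theta)x_2 allows any real \theta, so points outside the segment between x_1 and x_2 still lie in the affine hull.

```python
def affine_combination(x1, x2, theta):
    """Return theta*x1 + (1-theta)*x2, component-wise."""
    return tuple(theta * a + (1 - theta) * b for a, b in zip(x1, x2))

x1, x2 = (0.0, 0.0), (1.0, 2.0)

# theta = 0.5 gives the midpoint (also a convex combination) ...
mid = affine_combination(x1, x2, 0.5)

# ... while theta = 2 gives a point on the same line but outside the segment.
outside = affine_combination(x1, x2, 2.0)

print(mid)      # (0.5, 1.0)
print(outside)  # (-1.0, -2.0)
```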

Affine dimension and relative interior

We define the dimension of an affine set C as the dimension of the subspace V = C - x_0, where x_0 is any element of C. But this is not always consistent with other definitions of dimension.

Consider the unit circle in \R^2. Its affine hull is all of \R^2, so its affine dimension is two. By most other definitions of dimension, however, the unit circle has dimension one.

We define the relative interior of the set C, denoted \text{relint } C, as its interior relative to \text{aff }C

\text{relint } C = \{x\in C | \exists r > 0,\ B_r(x) \cap \text{aff }C\subset C\}
💡
Topologically, we consider sets that are open relative to \text{aff }C rather than to \R^n

Convex set

  • line segment : all points between x_1 and x_2
    x = \theta x_1 + (1 - \theta) x_2

    where 0\le \theta \le 1

  • convex set : contains the line segment between any two points in the set
    \forall x_1, x_2\in C, \theta x_1 + (1-\theta)x_2 \in C

    where 0 \le \theta\le 1
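The segment condition can be sanity-checked numerically. The sketch below (illustrative helper names) tests whether grid points on the segment between pairs of sample points stay inside a set given by a membership predicate; such a check can only refute convexity, never prove it.

```python
def segment_stays_inside(points, contains, steps=10):
    """Check theta*x1 + (1-theta)*x2 in the set for all pairs and a theta grid."""
    thetas = [k / steps for k in range(steps + 1)]
    for x1 in points:
        for x2 in points:
            for theta in thetas:
                p = tuple(theta * a + (1 - theta) * b for a, b in zip(x1, x2))
                if not contains(p):
                    return False
    return True

disk = lambda p: p[0]**2 + p[1]**2 <= 1.0            # unit disk: convex
annulus = lambda p: 0.5 <= p[0]**2 + p[1]**2 <= 1.0  # annulus: not convex

on_circle = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(segment_stays_inside(on_circle, disk))     # True
print(segment_stays_inside(on_circle, annulus))  # False: segment through 0
```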

💡
Note that every affine set is convex, since it contains the whole line (hence the segment) through any two of its points. The converse does not hold.

List of convex sets

  1. convex hull
  2. convex cone / norm cone
  3. half-space
  4. Euclidean ball / norm ball
  5. ellipsoid
  6. polyhedra
  7. simplex

Convex combination and convex hull

  • convex combination : any point x of the form
    x = \theta_1x_1 + \cdots + \theta_k x_k

    where \theta_1 + \cdots + \theta_k = 1 and \theta_i \ge 0

    💡
    A convex combination generalizes the line segment to more than two points
  • convex hull : set of all convex combinations

    The convex hull of a set C, denoted \text{conv } C, is the set of all convex combinations of points in C

    \text{conv } C = \{\theta_1x_1 + \cdots + \theta_k x_k | x_i \in C,\ \theta_i \ge 0,\ i = 1, \dots, k,\ \theta_1 + \cdots + \theta_k = 1\}
    💡
    The convex hull \text{conv }C is always convex. Moreover, it is the smallest convex set that contains C
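For three affinely independent points in \R^2, convex hull membership can be decided by solving for the coefficients \theta_0, \theta_1, \theta_2 (barycentric coordinates) and checking they are nonnegative; they sum to 1 by construction. A sketch (illustrative helper names):

```python
def barycentric(p, v0, v1, v2):
    """Solve p = t0*v0 + t1*v1 + t2*v2 with t0+t1+t2 = 1 by Cramer's rule."""
    d1 = (v1[0] - v0[0], v1[1] - v0[1])
    d2 = (v2[0] - v0[0], v2[1] - v0[1])
    rhs = (p[0] - v0[0], p[1] - v0[1])
    det = d1[0] * d2[1] - d2[0] * d1[1]  # nonzero by affine independence
    t1 = (rhs[0] * d2[1] - d2[0] * rhs[1]) / det
    t2 = (d1[0] * rhs[1] - rhs[0] * d1[1]) / det
    return (1 - t1 - t2, t1, t2)

def in_convex_hull(p, v0, v1, v2):
    return all(t >= 0 for t in barycentric(p, v0, v1, v2))

tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(in_convex_hull((0.25, 0.25), *tri))  # True: theta = (0.5, 0.25, 0.25)
print(in_convex_hull((1.0, 1.0), *tri))    # False: theta_0 = -1 < 0
```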

Convex cone

A set C is called a cone if for every x\in C and \theta \ge 0, we have \theta x\in C.

A set C is a convex cone if it is convex and a cone, which means that

\forall x_1, x_2\in C \text{ and } \theta_1, \theta_2\ge 0,\quad \theta_1 x_1 + \theta_2 x_2 \in C
💡
This condition is stronger than the condition of convexity, so a convex cone is trivially a convex set.

A point of the form

\theta_1x_1 + \cdots + \theta_k x_k

where \theta_1, \dots, \theta_k \ge 0

is called a conic combination of x_1, \dots, x_k.

Hyperplane

A hyperplane is a set of the form

\{x | a^Tx = b\}

where a\in \R^n, a\ne 0, b\in \R

Let

a^\perp = \{v | a^Tv = 0\}

Then,

\{x | a^Tx = b\} = x_0 + a^\perp

where x_0 is any point with a^Tx_0 = b

Halfspace

A hyperplane divides \R^n into two halfspaces. A (closed) halfspace is a set of the form

\{x | a^Tx\le b\}

where a\ne 0

💡
Trivially, halfspaces are convex, but not affine.

Euclidean balls and ellipsoids

A Euclidean ball in \R^n has the form

B_r(x_c) = \{x | \|x- x_c\|_2 \le r\}

where r > 0

💡
From a mathematical perspective, one often drops the boundary to obtain an open ball instead.

Equivalently,

B_r(x_c) = \{x_c + ru | \|u\|_2 \le 1\}

A Euclidean ball is a convex set

  • Proof

    Let x_1, x_2 \in B_r(x_c) and 0 \le \theta \le 1

    \begin{aligned}\|\theta x_1 + (1 - \theta)x_2 - x_c\|_2 &= \|\theta(x_1 - x_c) + (1 - \theta)(x_2 - x_c) \|_2 \\ &\le \theta\|x_1 - x_c\|_2 + (1 - \theta)\|x_2 - x_c\|_2 \\ & \le \theta r + (1- \theta) r = r\end{aligned}

    So, \theta x_1 + (1 - \theta)x_2 \in B_r(x_c).

    Therefore, a Euclidean ball is a convex set.

An ellipsoid in \R^n has the form

\mathcal E = \{x | (x- x_c)^TP^{-1}(x - x_c) \le 1\}

where P = P^T \succ 0 (i.e. P is symmetric and positive definite)

💡
From a mathematical perspective, the left-hand side is a quadratic form whose matrix is positive definite.

Equivalently,

\{x_c + Au | \|u \|_2 \le 1\}

where A is square and nonsingular (e.g. A = P^{1/2}).

Norm balls and norm cones

A norm ball is the generalization of a Euclidean ball

\{x | \|x- x_c\| \le r\}

A norm cone associated with the norm \|\cdot \| is the set

C = \{(x, t) | \|x\|\le t \} \subset \R^{n + 1}
💡
Note that a norm cone is a convex cone. The converse fails: not every convex cone is a norm cone (e.g. \R^2 itself).

As we proved in the Euclidean ball case, a norm ball is also a convex set. Moreover, a norm cone is a convex set

  • Proof

    Let (x_1, t_1) , (x_2, t_2) \in C and 0 \le \theta \le 1, i.e.

    \|x_1\| \le t_1, \|x_2 \| \le t_2

    We want to show that

    \theta(x_1, t_1) + (1 - \theta)(x_2, t_2 ) \in C

    By the triangle inequality and homogeneity of the norm,

    \begin{aligned}\|\theta x_1 + (1 - \theta)x_2\| &\le \theta\|x_1\| + (1 - \theta)\|x_2\| \\ &\le \theta t_1 + (1- \theta) t_2\end{aligned}

    Then, \theta(x_1, t_1) + (1 - \theta)(x_2, t_2) \in C

    Therefore, C is a convex set.

Polyhedra

A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities

\mathcal P = \{x | a_j^Tx \le b_j,\ j = 1, \dots, m,\ c_j^Tx = d_j,\ j = 1, \dots, p\}

It will be convenient to use the compact notation

\mathcal P = \{x | Ax \preceq b, Cx = d\}

where the symbol \preceq denotes vector (component-wise) inequality.

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes.
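Membership in a polyhedron is just a finite list of component-wise checks, one per row. A small sketch (the matrices below are made up for illustration):

```python
def in_polyhedron(x, A, b, C, d, tol=1e-9):
    """Check Ax <= b component-wise and Cx = d up to a tolerance."""
    dot = lambda row, v: sum(r * vi for r, vi in zip(row, v))
    ok_ineq = all(dot(row, x) <= bi + tol for row, bi in zip(A, b))
    ok_eq = all(abs(dot(row, x) - di) <= tol for row, di in zip(C, d))
    return ok_ineq and ok_eq

# The unit box in R^2 intersected with the line x1 = x2:
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # |x1| <= 1, |x2| <= 1
b = [1, 1, 1, 1]
C = [(1, -1)]                            # x1 - x2 = 0
d = [0]

print(in_polyhedron((0.5, 0.5), A, b, C, d))  # True
print(in_polyhedron((0.5, 0.2), A, b, C, d))  # False (violates x1 = x2)
print(in_polyhedron((2.0, 2.0), A, b, C, d))  # False (outside the box)
```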

A polyhedron is a convex set

  • Proof

    Since halfspaces and hyperplanes are convex sets, it is enough to show that the intersection of convex sets is also a convex set.

    Let A, B be convex sets, x_1, x_2 \in A\cap B, and 0 \le \theta \le 1

    Since x_1, x_2 \in A

    \theta x_1 + (1 - \theta)x_2 \in A

    For the same reason,

    \theta x_1 + (1 - \theta)x_2 \in B

    So, \theta x_1 + (1 - \theta)x_2 \in A\cap B

    Therefore, the intersection of convex sets is also a convex set.

    By using this conclusion, we can easily show that a polyhedron is a convex set.

Simplex

Suppose the k + 1 points v_0, \dots, v_k\in \R^n are affinely independent, which means v_1 - v_0, \dots, v_k - v_0 are linearly independent

💡
From a mathematical perspective, we can view v_i - v_0 \in T_{v_0}\R^n for all i. Affine independence means these tangent vectors are linearly independent.

The simplex determined by them is given by

\begin{aligned}C &= \text{conv} \{v_0, \dots ,v_k\} \\ &= \{\theta_0v_0 + \cdots + \theta_kv_k | \theta\succeq0,\ 1^T\theta = 1\}\end{aligned}

The affine dimension of this simplex is k, so it is sometimes referred to as a k-dimensional simplex in \R^n

The four simplexes that can be fully represented in 3D space.

The simplex is so-named because it represents the simplest possible polytope in any given dimension. For example,

  • a 0-dimensional simplex is a point
  • a 1-dimensional simplex is a line segment
  • a 2-dimensional simplex is a triangle
  • a 3-dimensional simplex is a tetrahedron

Specifically, a k-simplex is a k-dimensional polytope that is the convex hull of its k + 1 vertices.


To describe the simplex as a polyhedron, we proceed as follows.

By definition, x\in C iff x = \theta_0 v_0 + \cdots + \theta_k v_k for some \theta \succeq 0 with 1^T\theta = 1. Define y = (\theta_1, \dots, \theta_k) and

B = \begin{bmatrix}v_1 - v_0 & \cdots & v_k - v_0\end{bmatrix} \in \R^{n \times k}

Then we can say that x\in C iff

x = v_0 + By

for some y \succeq 0 with 1^Ty \le 1
💡
It is just a matrix representation of x

Since the columns of B are linearly independent, the rank of B is k. Therefore, there exists a nonsingular matrix A = \begin{bmatrix}A_1 \\ A_2\end{bmatrix} \in \R^{n\times n}, with A_1 \in \R^{k \times n} and A_2 \in \R^{(n-k) \times n}, such that

AB = \begin{bmatrix}A_1\\A_2\end{bmatrix}B = \begin{bmatrix}I \\0 \end{bmatrix}
💡
From a mathematical perspective, we can view A as the transition matrix used in the Gaussian elimination process.

Therefore,

Ax = Av_0 + ABy \Rightarrow A_1x = A_1 v_0 + y,\ A_2x = A_2v_0

From this we can see that x\in C if and only if

A_2 x = A_2v_0,\quad y = A_1x - A_1v_0,\quad y\succeq 0,\quad 1^Ty \le 1

In other words, we have x\in C iff

A_2x = A_2v_0,\ A_1x \succeq A_1v_0,\ 1^TA_1x \le 1 + 1^TA_1v_0

which is a set of linear equalities and inequalities in x. Therefore, we can express a simplex as a polyhedron.

💡
Re-checking the definition of a simplex, it describes points in terms of \theta, not x. The reformulation above expresses membership in the simplex directly as conditions on x.
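A concrete sketch for the standard 2-simplex with vertices v_0 = (0,0), v_1 = (1,0), v_2 = (0,1): here B = I, so y = (\theta_1, \theta_2) can be read off directly, and x is in the simplex iff y \succeq 0 and 1^Ty \le 1 (helper name is illustrative).

```python
def in_standard_simplex(x):
    """Membership test for conv{(0,0), (1,0), (0,1)} as a polyhedron."""
    y = x  # y = A_1 x - A_1 v_0 with A_1 = I and v_0 = 0
    return all(yi >= 0 for yi in y) and sum(y) <= 1

print(in_standard_simplex((0.2, 0.3)))   # True:  theta = (0.5, 0.2, 0.3)
print(in_standard_simplex((0.7, 0.7)))   # False: 1^T y = 1.4 > 1
print(in_standard_simplex((-0.1, 0.5)))  # False: y_1 < 0
```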

Positive semidefinite cone

We use the notation S^n to denote the set of symmetric n\times n matrices

S^n = \{X \in \R^{n\times n} | X = X^T\}

Note that S^n is a convex set.

  • Proof

    Let X, Y \in S^n and 0 \le \theta \le 1

    \theta X + (1 - \theta)Y is also symmetric, since transposition is linear.

    So, \theta X + (1 - \theta)Y\in S^n

    Therefore, S^n is a convex set.

We use the notation S_+^n to denote the set of symmetric positive semi-definite matrices

S_+^n = \{X\in S^n | X\succeq0\}

Equivalently,

X\in S_+^n \Leftrightarrow z^TXz \ge 0, \forall z\in \R^n
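In the 2x2 case the condition z^TXz \ge 0 has a closed-form test: the symmetric matrix [[a, b], [b, c]] is positive semidefinite iff a \ge 0, c \ge 0, and ac - b^2 \ge 0. A sketch (this criterion is specific to 2x2, not a general algorithm):

```python
def is_psd_2x2(a, b, c):
    """PSD test for the symmetric matrix [[a, b], [b, c]]."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

def quad_form_2x2(a, b, c, z):
    """Evaluate z^T X z for X = [[a, b], [b, c]]."""
    return a * z[0]**2 + 2 * b * z[0] * z[1] + c * z[1]**2

print(is_psd_2x2(2, 1, 2))   # True:  det = 3 >= 0
print(is_psd_2x2(1, 2, 1))   # False: det = -3 < 0
print(quad_form_2x2(1, 2, 1, (1, -1)))  # -2, a witness z with z^T X z < 0
```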

Note that S_+^n is a convex cone.

  • Proof

    Let X, Y \in S_+^n and \alpha, \beta \ge 0

    For any given z,

    z^T(\alpha X + \beta Y )z = \langle z, (\alpha X + \beta Y)z\rangle

    By the properties of the inner product,

    \begin{aligned}\langle z, (\alpha X + \beta Y)z\rangle &= \langle z, \alpha Xz + \beta Yz\rangle \\ &= \langle z, \alpha Xz\rangle + \langle z, \beta Yz\rangle \\ &= \alpha\langle z, Xz\rangle + \beta \langle z, Yz\rangle \\ &\ge 0\end{aligned}

    That means \alpha X + \beta Y \in S_+^n

    Therefore, S_+^n is a convex cone.

We use the notation S_{++}^n to denote the set of symmetric positive definite matrices

S_{++}^n = \{X \in S^n | X \succ 0\}

Boundary of the positive semidefinite cone

Operations that preserve convexity

List of convexity-preserving operations

  1. Intersection
  2. Affine functions
     a. scaling and translation
     b. projection
     c. sum of two sets
     d. partial sum
  3. Cartesian product
  4. Perspective function
  5. Linear-fractional function

Intersection

Convexity is preserved under intersection: if S_1 and S_2 are convex, then S_1\cap S_2 is convex.

  • Proof

    Let A, B be convex sets, x_1, x_2 \in A\cap B, and 0 \le \theta \le 1

    Since x_1, x_2 \in A

    \theta x_1 + (1 - \theta)x_2 \in A

    For the same reason,

    \theta x_1 + (1 - \theta)x_2 \in B

    So, \theta x_1 + (1 - \theta)x_2 \in A\cap B

    Therefore, the intersection of convex sets is also a convex set.

This property extends to the intersection of an arbitrary family of sets: if S_\alpha is convex for every \alpha \in \mathcal A, then \cap_{\alpha \in \mathcal A} S_\alpha is convex.

💡
The same argument works even when the index set \mathcal A is uncountable.

Affine functions

A function f : \R^n \to \R^m is affine if it is a sum of a linear function and a constant, i.e. if it has the form

f(x) = Ax +b

where A \in \R^{m \times n}, b \in \R^m

Suppose S\sub \R^n is convex and f : \R^n \to \R^m is an affine function. Then the image of S under f,

f(S) = \{f(x) | x\in S\}

is convex.

💡
The graph of an affine function is an affine set
  • Proof

    Let y_1, y_2\in f(S) and 0 \le \theta \le 1. Then there exist x_1, x_2\in S such that

    f(x_1) = y_1, \quad f(x_2) = y_2
    \begin{aligned}\theta y_1 + (1 - \theta)y_2 &= \theta f(x_1) + (1 - \theta) f(x_2) \\ &= f(\theta x_1 + (1-\theta)x_2)\end{aligned}

    using affinity of f and \theta + (1 - \theta) = 1. Since S is a convex set, \theta x_1 + (1 - \theta) x_2 \in S. So, \theta y_1 + (1 - \theta) y_2 \in f(S).

    Therefore, f(S) is a convex set.

Similarly, if f: \R^k \to \R^n is an affine function, the inverse image of S under f,

f^{-1}(S) = \{x | f(x) \in S\}

is convex.

  • Proof

    Let x_1, x_2\in f^{-1}(S) and 0 \le \theta \le 1. Then f(x_1), f(x_2) \in S, and

    \begin{aligned}f(\theta x_1 + (1 - \theta)x_2) &= \theta f(x_1) + (1-\theta) f(x_2) \\ &\in S\end{aligned}

    since S is convex. So, \theta x_1 + ( 1- \theta) x_2\in f^{-1}(S).

    Therefore, f^{-1}(S) is a convex set.

Example : scaling and translation

If S \subset \R^n is convex, \alpha \in \R and a\in \R^n, then the sets \alpha S, S + a are convex, where

\alpha S = \{\alpha x | x\in S\}, \quad S + a = \{x + a | x\in S\}
  • Proof

    If \alpha = 0, \alpha S is a single point (hence trivially convex). Therefore, we may assume \alpha \ne 0.

    Let x, y\in \alpha S and 0 \le \theta \le 1. Then, x/\alpha , y/\alpha \in S

    Since S is a convex set,

    \theta x/\alpha + (1 - \theta)y/\alpha\in S

    So, \alpha(\theta x/\alpha + (1 - \theta)y/\alpha) = \theta x + (1-\theta)y\in \alpha S.

    Therefore, \alpha S is a convex set.

    Similarly, let x, y\in S + a and 0 \le \theta \le 1. Then x - a, y-a \in S

    Since S is a convex set

    \theta(x -a ) + (1 - \theta)(y - a)\in S

    So, \theta(x -a ) + (1 - \theta)(y - a) + a = \theta x + (1 - \theta)y \in S + a

    Therefore, S + a is a convex set.

Example : projection

The projection of a convex set onto some of its coordinates is convex. If S \sub \R^m \times \R^n is convex, then

T = \{x_1\in \R^m | (x_1, x_2)\in S\text{ for some }x_2\in \R^n\}

is convex.

  • Proof

    Let x_1, x_2 \in T and 0 \le \theta \le 1. Then there exist y_1, y_2\in \R^n such that

    (x_1, y_1) \in S \quad (x_2, y_2) \in S

    Since S is a convex set,

    \theta(x_1, y_1) + (1-\theta)(x_2, y_2) = (\theta x_1 + (1 - \theta)x_2, \theta y_1 + (1 - \theta)y_2) \in S

    So, \theta x_1 + (1 - \theta)x_2\in T.

    Therefore, T is a convex set.

Example : sum of two sets

The sum of two sets is defined as

S_1 + S_2 = \{x + y | x\in S_1, y\in S_2\}

If S_1 and S_2 are convex, then S_1 + S_2 is convex.

  • Proof

    Let z_1, z_2\in S_1 + S_2 and 0 \le \theta \le 1. Then there exist x_1, x_2\in S_1 and y_1, y_2\in S_2 such that

    z_1 = x_1 + y_1 \quad z_2 = x_2 + y_2
    \begin{aligned}\theta z_1 + (1 - \theta)z_2 &= \theta(x_1 + y_1) + (1-\theta) (x_2 + y_2) \\ &= (\theta x_1 + (1 -\theta) x_2) + (\theta y_1 + (1 -\theta) y_2)\end{aligned}

    Since \theta x_1 + (1 -\theta) x_2 \in S_1 and \theta y_1 + (1 -\theta) y_2 \in S_2, we get \theta z_1 + (1-\theta)z_2\in S_1 + S_2.

    Therefore, S_1 + S_2 is a convex set.

Example : partial sum

The partial sum of S_1, S_2\sub \R^n\times \R^m is defined as

S = \{(x, y_1+y_2) | (x, y_1)\in S_1, (x, y_2)\in S_2\}

where x\in \R^n, y_i\in\R^m. If S_1 and S_2 are convex, then S is convex.

  • Proof

    Let z_1, z_2 \in S and 0 \le \theta \le 1. Then there exist x_1, x_2 \in \R^n and y_{11}, y_{12}, y_{21}, y_{22}\in \R^m with (x_1, y_{11}), (x_2, y_{21})\in S_1 and (x_1, y_{12}), (x_2, y_{22})\in S_2 such that

    z_1 = (x_1, y_{11} + y_{12}) \quad z_2 = (x_2, y_{21} + y_{22})
    \begin{aligned}\theta z_1 + (1 - \theta)z_2 &= \theta(x_1, y_{11} + y_{12}) + (1 -\theta) (x_2, y_{21} + y_{22}) \\ &= (\theta x_1 + (1-\theta) x_2, (\theta y_{11} + (1-\theta)y_{21}) + (\theta y_{12} + (1-\theta)y_{22}))\end{aligned}

    Since (\theta x_1 + (1- \theta) x_2, \theta y_{11} + (1-\theta)y_{21}) \in S_1 and (\theta x_1 + (1- \theta) x_2, \theta y_{12} + (1-\theta)y_{22}) \in S_2 by convexity, \theta z_1 + (1- \theta) z_2\in S.

    Therefore, S is a convex set.

Cartesian product

The Cartesian product of two convex sets,

S_1 \times S_2 =\{(x_1, x_2) | x_1\in S_1, x_2\in S_2\}

is a convex set.

  • Proof

    Let z_1, z_2\in S_1 \times S_2 and 0 \le \theta \le 1. Then there exist x_1, x_2\in S_1 and y_1, y_2\in S_2 such that

    z_1 = (x_1, y_1) \quad z_2 = (x_2, y_2)
    \begin{aligned}\theta z_1 + (1 - \theta)z_2 &= \theta(x_1, y_1) + (1-\theta)(x_2, y_2) \\ &= (\theta x_1 + (1 -\theta) x_2, \theta y_1 + (1 -\theta) y_2) \end{aligned}

    Since \theta x_1 + (1-\theta) x_2 \in S_1 and \theta y_1 + (1-\theta) y_2 \in S_2, \theta z_1 + (1 - \theta)z_2 \in S_1 \times S_2.

    Therefore, S_1\times S_2 is a convex set.

Perspective function

Define the perspective function P : \R^{n + 1}\to \R^n, with domain \text{dom }P = \R^n \times \R_{++}, as

P(z, t) = z/t

where \R_{++} denotes the set of positive reals. The perspective function scales (normalizes) a vector so that its last component is one, and then drops the last component.

💡
Since the domain excludes t = 0, division by zero never occurs.
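A small sketch of the perspective map as a pin-hole camera (the helper name is illustrative): all points on one ray through the origin map to the same image point, which is how the camera collapses depth.

```python
def perspective(z, t):
    """P(z, t) = z / t, defined only for t > 0."""
    if t <= 0:
        raise ValueError("perspective is only defined for t > 0")
    return tuple(zi / t for zi in z)

# (2, 4, 2) and (1, 2, 1) lie on the same ray, so they have the same image.
print(perspective((2.0, 4.0), 2.0))  # (1.0, 2.0)
print(perspective((1.0, 2.0), 1.0))  # (1.0, 2.0)
```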

If C\sub \text{dom }P is convex, then its image

P(C) = \{P(x) | x\in C\}

is convex.

  • Proof

    Let x, y\in P(C) and 0 \le \theta \le 1. Then there exist (z_1, t_1), (z_2, t_2) \in C with t_1, t_2 > 0 such that

    x = P(z_1, t_1) = z_1/t_1 \quad y = P(z_2, t_2)=z_2/t_2

    We want to find \mu \in [0, 1] such that

    P(\mu(z_1, t_1) + (1- \mu)(z_2, t_2)) = \theta P(z_1, t_1) + (1 - \theta)P(z_2, t_2) \\ \Leftrightarrow \frac{\mu z_1 + (1 - \mu)z_2}{\mu t_1 + (1 - \mu)t_2} = \theta z_1 / t_1 + (1 - \theta)z_2/t_2

    Taking

    \mu = \frac{\theta t_2}{\theta t_2 + (1-\theta)t_1}

    gives \mu t_1 + (1-\mu)t_2 = \frac{t_1 t_2}{\theta t_2 + (1-\theta)t_1} and \mu z_1 + (1-\mu)z_2 = \frac{\theta t_2 z_1 + (1-\theta)t_1 z_2}{\theta t_2 + (1-\theta)t_1}, so the quotient equals \theta z_1/t_1 + (1 - \theta)z_2/t_2, as required.

    Since 0 \le \mu \le 1 and C is a convex set,

    \begin{aligned}\mu(z_1, t_1) + (1 - \mu)(z_2, t_2)\in C &\Rightarrow P(\mu(z_1, t_1) + (1 - \mu)(z_2, t_2)) \in P(C) \\ &\Rightarrow \theta P(z_1, t_1) + (1- \theta) P(z_2, t_2)\in P(C) \\ & \Rightarrow \theta x + (1 - \theta)y \in P(C)\end{aligned}

    Therefore, P(C) is a convex set.

    We can interpret the weights geometrically. Going in the other direction, P maps the \theta-combination of the points to the \mu'-combination of their images, where by similar triangles \mu' : (1-\mu') = \theta t_1 : (1-\theta)t_2, i.e.

    \mu' = \frac{\theta t_1}{\theta t_1 + (1-\theta)t_2}

We can also show that line segments are mapped to line segments under the perspective function. This result can be interpreted through the model of a pin-hole camera.

The inverse image of a convex set under the perspective function is also convex.

If C\sub \R^n is convex, then

P^{-1}(C) = \{(x, t) \in \R^{n + 1}| x/t\in C, t> 0\}

is convex.

  • Proof

    Let (x_1, t_1), (x_2, t_2) \in P^{-1}(C) and 0 \le \theta \le 1.

    We want to show that \theta(x_1, t_1) + (1-\theta)(x_2, t_2)\in P^{-1}(C), i.e.

    (\theta x_1 + (1-\theta)x_2, \theta t_1 + (1 - \theta)t_2)\in P^{-1}(C) \\ \Leftrightarrow \frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1 - \theta)t_2} \in C

    We want to find \mu \in [0, 1] such that

    \mu(x_1/t_1) + (1 - \mu)(x_2/t_2) = \frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1 - \theta)t_2}

    Take

    \mu = \frac{\theta t_1}{\theta t_1 + (1-\theta)t_2}

    Since x_1/t_1, x_2/t_2\in C and C is a convex set,

    \mu(x_1/t_1) + (1 - \mu)(x_2/t_2)\in C \\ \Leftrightarrow \theta(x_1, t_1) + (1-\theta)(x_2, t_2)\in P^{-1}(C)

    Therefore, P^{-1}(C) is a convex set.

Linear-fractional functions

A linear-fractional function is formed by composing the perspective function with an affine function. Suppose g: \R^n \to \R^{m + 1} is affine, i.e.

g(x) = \begin{bmatrix}A \\ c^T\end{bmatrix} x + \begin{bmatrix}b \\ d\end{bmatrix}

where A\in \R^{m \times n}, b\in \R^m, c\in \R^n, and d\in \R. The function f : \R^n\to \R^m given by f = P\circ g, i.e.

f(x) = \frac{Ax + b}{ c^Tx + d}

where \text{dom } f = \{x | c^Tx + d > 0\}

is called a linear-fractional (or projective) function. If c = 0 and d > 0, the domain of f is \R^n, and f is an affine function.

💡
Images and inverse images of convex sets under linear-fractional functions are convex.

Example

f(x) = \frac{1}{x_1 + x_2 + 1}x
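A sketch of this example in code: f(x) = x / (x_1 + x_2 + 1) is the linear-fractional function with A = I, b = 0, c = (1, 1), d = 1, defined where x_1 + x_2 + 1 > 0 (the helper name is illustrative).

```python
def linear_fractional(x):
    """f(x) = x / (x1 + x2 + 1), with A = I, b = 0, c = (1, 1), d = 1."""
    denom = x[0] + x[1] + 1.0  # c^T x + d
    if denom <= 0:
        raise ValueError("outside dom f: need c^T x + d > 0")
    return (x[0] / denom, x[1] / denom)

print(linear_fractional((1.0, 2.0)))  # (0.25, 0.5)
print(linear_fractional((0.0, 0.0)))  # (0.0, 0.0): f fixes the origin here
```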

Generalized inequalities

Proper cones and generalized inequalities

A cone K\sub \R^n is called a proper cone if it satisfies the following

  1. K is convex
  2. K is closed
  3. K is solid (it has nonempty interior)
  4. K is pointed, which means that it contains no line (or equivalently, x\in K, -x\in K \Rightarrow x = 0)

Examples

  1. nonnegative orthant K = \R_+^n = \{x\in \R^n | x_i \ge 0, i = 1, \dots, n\}
  2. positive semidefinite cone K = S_+^n
  3. nonnegative polynomials on [0, 1] :
     K = \{x \in \R^n | x_1 + x_2 t+ x_3 t^2 + \cdots + x_n t^{n - 1}\ge 0 \text{ for all } t\in [0, 1]\}

A proper cone K can be used to define a generalized inequality, a partial ordering on \R^n that has many of the properties of the standard ordering on \R, defined by

x \preceq_K y \Leftrightarrow y - x \in K \\ x \prec_K y \Leftrightarrow y - x \in \text{int }K

We should check that this definition actually satisfies the following conditions.

For all a, b, c \in \R^n,

Partial orders

  1. Reflexivity : a\le a (every element is related to itself)
  2. Antisymmetry : if a\le b and b\le a then a = b (no two distinct elements precede each other)
  3. Transitivity : if a\le b and b\le c then a\le c

Strict partial orders

  1. Irreflexivity : not a < a (no element is related to itself)
  2. Asymmetry : if a < b then not b < a
  3. Transitivity : if a < b and b < c then a < c

Examples

  1. component-wise inequality (K = \R_+^n)
     x\preceq_{\R_+^n}y \Leftrightarrow x_i \le y_i, i = 1, \dots, n
  2. matrix inequality (K = S_{+}^n)
     X\preceq_{S_+^n}Y \Leftrightarrow Y - X \text{ positive semi-definite}
💡
These two orderings are so common that we normally drop the subscript in \preceq_K
💡
These are not total orderings (i.e. some pairs of elements are not comparable under these relations)
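A sketch of the component-wise generalized inequality for K = \R_+^n (the helper name is illustrative): x \preceq_K y iff y - x lies in the nonnegative orthant, i.e. x_i \le y_i for every i. The last two checks show a pair that is not comparable, which is what makes the ordering partial.

```python
def leq_K(x, y):
    """x <=_K y for K = R_+^n (component-wise inequality)."""
    return all(xi <= yi for xi, yi in zip(x, y))

a, b, c = (0, 0), (1, 2), (2, 1)
print(leq_K(a, b))  # True: (1, 2) - (0, 0) is in R_+^2
print(leq_K(b, c))  # False ...
print(leq_K(c, b))  # False ... b and c are not comparable
```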

Properties

Many properties of \preceq_K are similar to those of \le on \R

  1. \preceq_K is preserved under addition
     x\preceq_K y, u\preceq_K v \Rightarrow x+u \preceq_K y+ v
     • Proof

       y - x\in K and v - u\in K. Since K is a convex cone, it is closed under addition, so

       (y - x) + (v - u)\in K \\ \Rightarrow (y + v) - (x + u) \in K \\ \Rightarrow x + u\preceq_K y + v
  2. \preceq_K is preserved under non-negative scaling
     x\preceq_K y, \alpha \ge 0 \Rightarrow \alpha x \preceq_K \alpha y
     • Proof

       y - x \in K. Since \alpha \ge 0 and K is a cone,

       \alpha(y - x) \in K \\ \Rightarrow \alpha y - \alpha x \in K \\ \Rightarrow \alpha x \preceq_K \alpha y
  3. \preceq_K is preserved under limits

     If x_i\preceq_K y_i for i = 1, 2, \dots and x_i \to x and y_i \to y as i \to \infty, then x\preceq_K y

     • Proof

       y_i - x_i \in K, \forall i and y_i - x_i \to y - x as i \to \infty

       Since K is closed (it contains all of its limit points),

       y- x\in K

       Therefore, x\preceq_K y

Minimum and minimal elements

A point x\in S is the minimum element of S with respect to \preceq_K if

\forall y\in S, x\preceq_K y

If a set has a minimum element, then it is unique.

  • Proof

    Let x and y be minimum elements of S. Then,

    \forall z\in S, x\preceq_K z \\ \forall z\in S, y \preceq_K z

    In particular,

    x \preceq_K y \text{ and } y\preceq_K x

    By antisymmetry,

    x = y

    Therefore, the minimum element is unique.

A point x\in S is a minimal element of S if

\nexists y\in S, y \preceq_K x \text{ and } x\ne y
💡
If a set X has a minimum element x, then every element of X is comparable with x under \preceq_K.

Example

When K = \R_+^2,

x_1 is the minimum element of S_1, and

x_2 is a minimal element of S_2, since the points in the yellow regions are not comparable to x_2.

Separating hyperplane

If C and D are nonempty disjoint convex sets, then there exist a\ne 0 and b such that

\forall x\in C, a^Tx \le b \\ \forall x\in D, a^Tx \ge b

The hyperplane \{x | a^Tx = b\} separates C and D.

Strict separation requires additional assumptions (e.g. C is closed and D is a singleton)

Supporting hyperplane theorem

A supporting hyperplane to a set C at a boundary point x_0 is

\{x | a^Tx = a^T x_0\}

where a\ne 0 and \forall x\in C, a^Tx \le a^Tx_0

Supporting hyperplane theorem : if C is convex, then there exists a supporting hyperplane at every boundary point of C

💡
Note that the normal vector a points away from C

Dual cones and generalized inequalities

Let K be a cone. The set

K^* = \{y | y^Tx \ge 0, \forall x\in K\}

is called the dual cone of K. As the name suggests, K^* is a cone, and it is always convex, even when the original cone K is not.

  • Proof

    Let y_1, y_2\in K^* and 0 \le \theta \le 1.

    Then y_1^T x\ge 0 and y_2^Tx \ge 0 for all x\in K

    We want to show that \theta y_1 + (1 - \theta)y_2\in K^*, i.e.

    (\theta y_1 + (1 - \theta) y_2)^Tx \ge 0, \forall x\in K \\ \Leftrightarrow \theta y_1^Tx + (1-\theta)y_2^Tx \ge 0, \forall x\in K

    Since y_1^T x\ge 0 and y_2^Tx \ge 0 for all x\in K,

    \theta y_1^Tx + (1 - \theta)y_2^Tx \ge 0

    So, \theta y_1 + (1 - \theta)y_2\in K^*

    Therefore, K^* is a convex set.

    💡
    Note that no convexity of K is required.

Geometrically, y\in K^* if and only if -y is the normal of a hyperplane that supports K at the origin.

Examples

  1. K = \R_+^n \Rightarrow K^* = \R_+^n (the nonnegative orthant is self-dual)
  2. Dual of a norm cone. Let \|\cdot\| be a norm on \R^n. The dual of the associated cone K = \{(x, t)\in \R^{n + 1} | \|x\|\le t\} is the cone defined by the dual norm

     K^* = \{(u, v)\in \R^{n + 1} | \|u\|_* \le v\}

     where the dual norm is given by \|u\|_* = \sup \{u^Tx | \|x\|\le 1\}

     • Proof

       We want to show that

       \forall (x, t)\in K,\ x^Tu + tv \ge 0 \Leftrightarrow \|u\|_* \le v

       TBC

     Note that

     \sup\{u^Tx | \|x\|_2\le 1\} = \|u\|_2 \\ \sup\{u^Tx | \|x\|_1\le 1\} = \|u\|_\infty \\ \sup\{u^Tx | \|x\|_\infty\le 1\} = \|u\|_1
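The last identity can be checked numerically. Since the objective is linear, the supremum over the \ell_\infty unit ball is attained at a vertex, i.e. a sign vector x_i = \text{sign}(u_i), giving the \ell_1 norm of u. A sketch (illustrative helper names):

```python
from itertools import product

def sup_over_linf_ball(u):
    """Maximize u^T x over the vertices {-1, +1}^n of the l_inf unit ball."""
    n = len(u)
    return max(sum(ui * xi for ui, xi in zip(u, x))
               for x in product((-1.0, 1.0), repeat=n))

def l1_norm(u):
    return sum(abs(ui) for ui in u)

u = (3.0, -2.0, 0.5)
print(sup_over_linf_ball(u))  # 5.5
print(l1_norm(u))             # 5.5
```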

Properties

  1. K^* is closed and convex
  2. K_1\sub K_2 \Rightarrow K_2^* \sub K_1^*
  3. If K has nonempty interior, then K^* is pointed
  4. If the closure of K is pointed, then K^* has nonempty interior
  5. K^{**} is the closure of the convex hull of K (hence if K is convex and closed, K^{**} = K)
💡
These properties show that if K is a proper cone, then so is its dual K^*, and moreover, K^{**} = K

Dual generalized inequalities

Suppose the convex cone K is proper, so it induces a generalized inequality \preceq_K. Then its dual cone K^* is also proper, and therefore induces a generalized inequality, with

y\succeq_{K^*}0 \Leftrightarrow y^Tx\ge 0, \forall x\succeq_K 0

Example : positive semi-definite cone

On the set of symmetric n\times n matrices S^n, we use the standard inner product \text{tr}(XY) = \sum_{i, j}X_{ij}Y_{ij}. The positive semi-definite cone S_{+}^n is self-dual, i.e. for X, Y\in S^n

\text{tr}(XY)\ge 0, \forall X\succeq 0 \Leftrightarrow Y\succeq 0
  • Proof

    Suppose Y\notin S_{+}^n. Then there exists q\in \R^n with

    q^TYq = \text{tr}(qq^TY) < 0

    Since X= qq^T is a positive semi-definite matrix, Y\notin (S_{+}^n)^*

    Now suppose X, Y\in S_+^n. We can express X as a nonnegative combination of rank-one matrices by using the eigenvalue decomposition

    X = \sum_{i = 1}^n \lambda_i q_iq_i^T, \quad \lambda_i \ge 0

    Then,

    \begin{aligned}\text{tr}(YX) &= \text{tr}\left(Y\sum_{i = 1}^n \lambda_iq_iq_i^T\right) \\ &= \sum_{i = 1}^n\lambda_iq_i^TYq_i \\ & \ge 0\end{aligned}

    Therefore, Y \in (S_+^n)^*
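Both directions of the argument can be illustrated numerically in the 2x2 case (the matrices below are made up for illustration): for PSD X and Y the trace inner product is nonnegative, while a witness X = qq^T exposes any non-PSD Y.

```python
def trace_inner(X, Y):
    """tr(XY) = sum_ij X_ij * Y_ij for symmetric matrices as nested tuples."""
    return sum(X[i][j] * Y[i][j] for i in range(len(X)) for j in range(len(X)))

X = ((2.0, 1.0), (1.0, 2.0))   # PSD: eigenvalues 1 and 3
Y = ((1.0, 0.0), (0.0, 3.0))   # PSD: diagonal with nonnegative entries
print(trace_inner(X, Y))       # 8.0 >= 0

Z = ((1.0, 2.0), (2.0, 1.0))   # not PSD: eigenvalues 3 and -1
q = (1.0, -1.0)                # q^T Z q = -2 < 0
W = tuple(tuple(a * b for b in q) for a in q)  # W = q q^T, PSD and rank one
print(trace_inner(W, Z))       # -2.0 < 0, so Z is outside the dual cone
```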

Minimum and minimal elements via dual inequalities

We can use dual generalized inequalities to characterize minimum and minimal elements of a set S\sub \R^n with respect to the generalized inequality induced by a proper cone K

Minimum element w.r.t. \preceq_K

x is the minimum element of S if and only if for all \lambda\succ_{K^{*}}0, x is the unique minimizer of \lambda^Tz over z\in S. Geometrically, this means that for any \lambda \succ_{K^*}0, the hyperplane

\{z | \lambda^T(z - x) = 0\}

is a strict supporting hyperplane to S at x, i.e. the hyperplane intersects S only at the point x

💡
We can think of the level surface of \lambda^Tz as having normal vector \lambda

Minimal element w.r.t. \preceq_K

If x minimizes \lambda^Tz over S for some \lambda \succ_{K^*}0, then x is minimal

💡
Here one specific \lambda \succ_{K^*} 0 suffices; the condition is again related to a supporting hyperplane.

Conversely, if x is a minimal element of a convex set S, then there exists a nonzero \lambda\succeq_{K^*}0 such that x minimizes \lambda^Tz over S.

💡
This is justified by the existence of a supporting hyperplane at every boundary point of a convex set.

If S is not convex, we cannot guarantee that such a \lambda exists.

In the figure, x is a minimal point of S with respect to \R_{+}^2, yet there is no \lambda for which x minimizes \lambda^Tz over z\in S
