
2. Optimality Conditions, Convexity, Newton’s Method for Equations


Optimality Conditions: Preliminaries

  • Local minima and maxima have one thing in common:
    $\nabla f(x) = 0$

First-order Necessary Condition

From now on, assume that $f$ is differentiable and that its first and second derivatives are continuous for every $x \in X$, where $X$ is the domain of $f$.

If $x^*$ is a local minimum, then

$\nabla f(x^*) = 0$
  • Not a sufficient condition, since $x^*$ could be a saddle point (see the sketch below)
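For instance (a minimal one-dimensional sketch; the function choice is ours, not from the lecture):

```python
# f(x) = x^3 has f'(0) = 0, yet 0 is not a local minimum:
# f takes smaller values arbitrarily close to 0.
f = lambda x: x**3
df = lambda x: 3 * x**2

print(df(0.0))            # 0.0  -> first-order necessary condition holds
print(f(-1e-3) < f(0.0))  # True -> 0 is not a local minimum
```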

Second-order Necessary Condition

If $x^*$ is a local minimum, then $\nabla^2 f(x^*)$ is positive semi-definite

Note, however, that even if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \ge 0$, we cannot conclude that $x^*$ is a local minimum.

Consider

$f(x_1, x_2) = x_1^3 + x_2^3$

At the origin, $\nabla f = 0$ and $\nabla^2 f$ is the zero matrix (hence positive semi-definite), yet the origin is not a local minimum: $f(-t, -t) < 0$ for any $t > 0$.

This means there is no condition that is both necessary and sufficient in general.

💡
The first-order and second-order conditions hold whenever $x^*$ is a local minimum, but satisfying them does not guarantee that $x^*$ is a local minimum.
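A quick numerical check of the example above (a sketch; the gradient and Hessian are written out analytically):

```python
import numpy as np

def f(x):
    return x[0]**3 + x[1]**3

x0 = np.zeros(2)

# grad f = (3*x1^2, 3*x2^2), Hessian = diag(6*x1, 6*x2)
grad = np.array([3 * x0[0]**2, 3 * x0[1]**2])
hess = np.diag([6 * x0[0], 6 * x0[1]])

print(grad)                       # [0. 0.] -> first-order condition holds
print(np.linalg.eigvalsh(hess))   # [0. 0.] -> positive semi-definite
print(f(np.array([-0.1, -0.1])))  # -0.002 < f(0) = 0 -> still not a local min
```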

Second-order Sufficient Condition

The following condition is sufficient to guarantee that $x^*$ is a local minimizer. If

$\nabla f(x^*) = 0 \text{ and } \nabla^2 f(x^*) \text{ is positive definite,}$

then $x^*$ is a strict local minimizer of $f$.

  • Proof

    Since $\nabla f(x^*) = 0$, Taylor's theorem gives

    $f(x^* + p) = f(x^*) + \frac{1}{2}p^T\nabla^2 f(\zeta)p$

    where $\zeta \in N_{\|p\|}(x^*)$, i.e. $\zeta$ lies on the segment between $x^*$ and $x^* + p$.

    Since $\nabla^2 f(x^*)$ is positive definite,

    $v^T\nabla^2 f(x^*)v > 0, \ \forall v \ne 0$

    so its smallest eigenvalue $\lambda$ is positive, and $p^T\nabla^2 f(x^*)p \ge \lambda\|p\|^2$ for every $p$. Since $\nabla^2 f$ is continuous, $\exists \delta > 0$ such that

    $\|\zeta - x^*\| < \delta \Rightarrow \|\nabla^2 f(\zeta) - \nabla^2 f(x^*)\| < \frac{\lambda}{2}$

    Now take any $p$ with $0 < \|p\| < \delta$.

    💡
    Note that $\zeta \in N_{\|p\|}(x^*) \subset N_\delta(x^*)$, so the continuity bound applies to $\zeta$.

    Then

    $\begin{align}p^T\nabla^2 f(x^*)p - p^T\nabla^2 f(\zeta)p &= p^T(\nabla^2 f(x^*) - \nabla^2 f(\zeta))p \\ &\le \|p\|\|(\nabla^2 f(x^*) - \nabla^2 f(\zeta))p\| \\ &\le \|p\|\|\nabla^2 f(x^*) - \nabla^2 f(\zeta)\|\|p\| \\ &< \frac{\lambda}{2}\|p\|^2 \\ &\le \frac{1}{2}p^T\nabla^2 f(x^*)p\end{align}$

    💡
    1) bilinearity of the inner product

    2) Cauchy–Schwarz inequality

    3) definition of the operator norm

    4) the continuity bound, then $p^T\nabla^2 f(x^*)p \ge \lambda\|p\|^2$

    Therefore,

    $0 < \frac{1}{2}p^T\nabla^2 f(x^*)p \le p^T\nabla^2 f(\zeta)p$

    so $f(x^* + p) = f(x^*) + \frac{1}{2}p^T\nabla^2 f(\zeta)p > f(x^*)$ for every $0 < \|p\| < \delta$. Hence $x^*$ is a strict local minimizer.

    💡
    Since $p$ is restricted to a neighborhood of $x^*$, this argument shows only that $x^*$ is a local minimizer, not a global one.
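In practice, the sufficient condition is easy to test numerically. Below is a minimal sketch (the function and candidate point are our own example, not from the lecture): verify $\nabla f(x^*) = 0$ and check positive definiteness via the eigenvalues of the Hessian.

```python
import numpy as np

# f(x) = x1^2 + 2*x2^2, candidate point x* = (0, 0)
x_star = np.zeros(2)

grad = np.array([2 * x_star[0], 4 * x_star[1]])  # analytic gradient
hess = np.diag([2.0, 4.0])                       # analytic Hessian (constant)

print(np.allclose(grad, 0.0))       # True  -> first-order condition holds
eigvals = np.linalg.eigvalsh(hess)
print(eigvals)                      # [2. 4.]
print(np.all(eigvals > 0))          # True  -> positive definite, so x* is
                                    # a strict local minimizer
```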

Convexity

  • A set $S$ is convex if for any $x \in S$ and $y \in S$,
    $\alpha x + (1-\alpha)y \in S, \ \forall \alpha \in [0, 1]$
  • A function $f$ is convex on a convex set $S$ if for any $x \in S$ and $y \in S$ (see the numeric check below),
    $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha)f(y), \ \forall \alpha \in [0, 1]$
  • A minimization problem is convex if $f$ is convex and the feasible set $S$ is convex
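The defining inequality of a convex function can be spot-checked numerically; a violation refutes convexity, though no number of samples proves it. A minimal sketch (the function and sampling scheme are our choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sum(x**2)  # a convex function on R^n

# Test f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on random samples.
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12

print("no violations found")
```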

Implication of Convexity

For a convex optimization problem, a local optimum $x^*$ is also a global optimum

If the feasible region is convex, then every point on the line segment between the current solution and any other feasible solution is also feasible.

Therefore, any smaller step size still lets us move along the same direction from the current solution while remaining within the feasible region.

This property is valuable in optimization algorithms: once a feasible solution is found, it can be refined by taking smaller steps while maintaining feasibility, which gives flexibility in adjusting the step size and allows a more precise exploration of the feasible region.
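A small sketch of this feasibility argument (the feasible set, a closed unit ball, is our example):

```python
import numpy as np

def feasible(x):
    return np.linalg.norm(x) <= 1.0  # S = closed unit ball, a convex set

x = np.array([0.5, 0.0, 0.0])  # current feasible solution
y = np.array([0.0, 0.9, 0.0])  # new feasible solution

# Every point on the segment between x and y stays feasible,
# so any smaller step along the direction y - x is still feasible.
for alpha in np.linspace(0.0, 1.0, 101):
    assert feasible(alpha * y + (1 - alpha) * x)

print("entire segment is feasible")
```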

How to recognize convexity?

  1. One-dimensional case: a function $f$ is convex if and only if $f''(x) \ge 0$ for all $x \in X$, where $X$ is the domain of $f$
  2. Multi-dimensional case
    • $f$ is convex if and only if the Hessian $\nabla^2 f(x) \ge 0$ (positive semi-definite) for all $x \in X$, where $X$ is the domain of $f$
    • If $\nabla^2 f(x)$ is positive definite for all $x \in X$ (i.e. $\nabla^2 f(x) > 0, \ \forall x \in X$), then $f$ is strictly convex

      → Note that the converse does not hold (e.g. $f(x) = x_1^4 + x_2^4$ is strictly convex, but its Hessian at the origin is the zero matrix)

    • $f$ is convex if and only if it lies above its tangent planes (spot-checked below):
      $f(y) \ge f(x) + \nabla f(x)^T(y - x), \ \forall x, y \in X$
💡
With constraints, we care more about the function's convexity over the feasible set; the statements above remain applicable with minor rephrasing.
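The tangent-plane characterization is likewise easy to spot-check (a sketch; the convex function is our example):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.exp(x[0]) + x[1]**2  # convex

def grad_f(x):
    return np.array([np.exp(x[0]), 2 * x[1]])

# Check f(y) >= f(x) + grad_f(x)^T (y - x) on random pairs.
for _ in range(10_000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9

print("f lies above all sampled tangent planes")
```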

Convexity and Optimality Condition

In fact, if $f$ is convex, a local optimum and a global optimum are equivalent. In addition,

$\nabla f(x^*) = 0 \Leftrightarrow x^* \text{ is a global minimum}$

💡
The second-order condition is automatically satisfied if $f$ is convex.
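For a differentiable convex function, then, solving $\nabla f(x) = 0$ directly yields the global minimum. A minimal sketch with a convex quadratic (our example):

```python
import numpy as np

# f(x) = 1/2 x^T A x - b^T x with A positive definite, hence f is convex
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

# grad f(x) = A x - b = 0  <=>  x* = A^{-1} b
x_star = np.linalg.solve(A, b)

rng = np.random.default_rng(2)
samples = rng.normal(size=(1000, 2))
print(np.all(f(x_star) <= np.array([f(x) for x in samples])))  # True
```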

Newton’s Method for Nonlinear Equations

  • Generates a sequence of points as approximate solutions
  • Convergent or not: depends on the function and the initial point, as the sketch below illustrates
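Both behaviors are visible in one dimension (a sketch; the cubic is a standard example where Newton's method can cycle):

```python
def newton_1d(f, df, x0, iters=20):
    """Plain Newton iteration x <- x - f(x)/f'(x)."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

f  = lambda x: x**3 - 2 * x + 2
df = lambda x: 3 * x**2 - 2

print(newton_1d(f, df, x0=2.0))  # ~ -1.7693, the real root of f
print(newton_1d(f, df, x0=0.0))  # stuck in the 2-cycle 0 -> 1 -> 0 -> ...
```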

Newton’s Method: Multiple Dimensions

Let $g : \R^n \to \R^n$. Linearizing around $x_k$,

$g(x_k + p) \approx g(x_k) + \nabla g(x_k)p$

Setting this linearization to zero,

$g(x_k) + \nabla g(x_k)p = 0 \Leftrightarrow p = -\nabla g(x_k)^{-1}g(x_k)$

if $\nabla g(x_k)$ is invertible.

Therefore,

$x_{k+1} = x_k - \nabla g(x_k)^{-1}g(x_k)$
💡
If $\nabla g(x_k)$ is not invertible, this method cannot be applied.
💡
If we interpret $\nabla g(x_k)^{-1}$ as $\frac{1}{g'}$, this is just a generalization of Newton's method in the one-dimensional case.
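A minimal multi-dimensional sketch (the system $g$ is our example). Numerically it is preferable to solve the linear system $\nabla g(x_k)p = -g(x_k)$ rather than to form the inverse explicitly:

```python
import numpy as np

def g(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,   # unit circle
                     x[0] - x[1]])              # diagonal line

def jac_g(x):
    # Jacobian of g (the "nabla g" above)
    return np.array([[2 * x[0], 2 * x[1]],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])  # initial point
for _ in range(10):
    p = np.linalg.solve(jac_g(x), -g(x))  # Newton step
    x = x + p

print(x)     # ~ [0.7071, 0.7071], the intersection point
print(g(x))  # ~ [0, 0]
```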