The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions.
Standard form problem (not necessarily convex)

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$

where x∈Rn is the variable, with domain D and optimal value p∗
Lagrangian

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)
$$

where L:Rn×Rm×Rp→R with dom L=D×Rm×Rp; λi is the Lagrange multiplier associated with fi(x)≤0 and νi with hi(x)=0
Lagrange dual function

$$
g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \left( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \right)
$$

where g:Rm×Rp→R
Since the dual function is the point-wise infimum of a family of affine functions of (λ,ν), it is a concave function, regardless of whether the primal problem is convex. Equivalently, −g is the point-wise supremum of a family of affine functions of (λ,ν): for each fixed x, L(x,λ,ν) is affine in (λ,ν).
Proof
We already know the following convexity-preserving operation: if f(x,y) is convex in x for every y∈A, then

$$
g(x) = \sup_{y \in A} f(x, y)
$$

is convex. Since L(x,λ,ν) is affine, hence concave, in (λ,ν) for every x∈D,

$$
g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)
$$

is concave.
lower bound property
If λ⪰0, then g(λ,ν)≤p∗
Proof
If xˉ is feasible and λ⪰0, then

$$
f_0(\bar{x}) \ge f_0(\bar{x}) + \sum_{i=1}^{m} \lambda_i f_i(\bar{x}) + \sum_{i=1}^{p} \nu_i h_i(\bar{x}) = L(\bar{x}, \lambda, \nu) \ge \inf_{x \in D} L(x, \lambda, \nu) = g(\lambda, \nu)
$$

since fi(xˉ)≤0, λi≥0, and hi(xˉ)=0. Minimizing over all feasible xˉ gives p∗≥g(λ,ν).
Example
Least-norm solution of linear equations

$$
\begin{aligned}
\text{minimize} \quad & x^T x \\
\text{subject to} \quad & Ax = b
\end{aligned}
$$
Dual function
Lagrangian is L(x,ν)=xTx+νT(Ax−b)
To minimize L over x, set the gradient equal to zero:

$$
\nabla_x L(x, \nu) = 2x + A^T \nu = 0
$$

Therefore, x=−(1/2)ATν. Plug this into L to obtain g:

$$
g(\nu) = L\left(-\tfrac{1}{2} A^T \nu, \nu\right) = -\tfrac{1}{4} \nu^T A A^T \nu - b^T \nu
$$
a concave function of ν
lower bound property

$$
p^* \ge -\tfrac{1}{4} \nu^T A A^T \nu - b^T \nu \quad \text{for all } \nu
$$
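To make the bound concrete, here is a minimal numpy sketch with made-up data. The closed forms used, x∗=AT(AAT)−1b and the maximizer ν∗=−2(AAT)−1b, are the standard ones for this problem, assuming A has full row rank.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # full row rank (almost surely)
b = rng.standard_normal(3)

# Least-norm solution x* = A^T (A A^T)^{-1} b and optimal value p* = x*^T x*
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# Dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu: a lower bound for any nu
def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of g
print(p_star, g(nu_star))                    # equal: the bound is tight here
```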
Standard form LP

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax = b, \quad x \succeq 0
\end{aligned}
$$

The Lagrangian is

$$
L(x, \lambda, \nu) = c^T x + \nu^T (Ax - b) - \lambda^T x = -b^T \nu + (c + A^T \nu - \lambda)^T x
$$

Since it is affine in x,

$$
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) =
\begin{cases}
-b^T \nu & A^T \nu - \lambda + c = 0 \\
-\infty & \text{otherwise}
\end{cases}
$$

Therefore p∗≥−bTν if ATν+c⪰0
Equality constrained norm minimization
$$
\begin{aligned}
\text{minimize} \quad & \|x\| \\
\text{subject to} \quad & Ax = b
\end{aligned}
$$

The dual function is

$$
g(\nu) = \inf_x \left( \|x\| - \nu^T A x + b^T \nu \right) =
\begin{cases}
b^T \nu & \|A^T \nu\|_* \le 1 \\
-\infty & \text{otherwise}
\end{cases}
$$

where ∥v∥∗=sup∥u∥≤1 uTv is the dual norm of ∥⋅∥. (This follows since infx(∥x∥−yTx)=0 if ∥y∥∗≤1 and −∞ otherwise.)

Therefore p∗≥bTν if ∥ATν∥∗≤1
Two-way partitioning
$$
\begin{aligned}
\text{minimize} \quad & x^T W x \\
\text{subject to} \quad & x_i^2 = 1, \quad i = 1, \dots, n
\end{aligned}
$$

We can express the objective as

$$
x^T W x = \sum_{i,j} W_{ij} x_i x_j
$$

Therefore, we can interpret Wij as a measure of hostility between i and j: if Wij is large, the objective encourages allocating opposite signs to xi and xj.

The dual function is

$$
g(\nu) = \inf_x \left( x^T (W + \operatorname{diag}(\nu)) x - \mathbf{1}^T \nu \right) =
\begin{cases}
-\mathbf{1}^T \nu & W + \operatorname{diag}(\nu) \succeq 0 \\
-\infty & \text{otherwise}
\end{cases}
$$

Therefore p∗≥−1Tν if W+diag(ν)⪰0
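One classical choice is ν=−λmin(W)1, which makes W+diag(ν) positive semidefinite and yields the non-trivial bound p∗≥nλmin(W). A small numpy sketch with made-up W; brute force is used only because n is tiny.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 6
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                        # symmetric cost matrix

# nu = -lambda_min(W) * 1 makes W + diag(nu) PSD, so g(nu) = -1^T nu = n * lambda_min(W)
bound = n * np.linalg.eigvalsh(W).min()

# Brute-force p* over all 2^n sign vectors (feasible only because n is tiny)
p_star = min(np.array(s) @ W @ np.array(s) for s in product([-1, 1], repeat=n))
print(bound, p_star)                     # bound <= p_star
```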
Lagrange dual and conjugate function
Consider a problem with linear inequality and equality constraints:

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & Ax \preceq b, \quad Cx = d
\end{aligned}
$$

The dual function is

$$
g(\lambda, \nu) = \inf_x \left( f_0(x) + \lambda^T (Ax - b) + \nu^T (Cx - d) \right) = -f_0^*(-A^T \lambda - C^T \nu) - b^T \lambda - d^T \nu
$$

where f∗(y)=supx∈dom f(yTx−f(x)) is the conjugate of f.

Example : entropy maximization. With f0(x)=Σi xi log xi (dom f0=R++n) and constraints Ax⪯b, 1Tx=1, the conjugate is f0∗(y)=Σi e^(yi−1), so the general formula above gives g(λ,ν)=−bTλ−ν−Σi e^(−aiTλ−ν−1), where ai is the ith column of A.
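As a quick numeric check of the conjugate used above, the sketch below maximizes yx−x log x over a grid and compares with the closed form e^(y−1); the grid and the test value of y are arbitrary choices.

```python
import numpy as np

# Conjugate of f(x) = x log x: f*(y) = sup_x (y x - x log x) = e^{y - 1}
y = 0.7
xs = np.linspace(1e-6, 10.0, 100_000)
numeric = np.max(y * xs - xs * np.log(xs))
print(numeric, np.exp(y - 1))            # agree up to grid resolution
```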
The Lagrange dual problem
For each pair (λ,ν) with λ⪰0, the Lagrange dual function gives us a lower bound on the optimal value p∗ of the optimization problem. Thus we have a lower bound that depends on some parameters λ,ν.
The natural question is: What is the best lower bound that can be obtained from the Lagrange dual function?
This leads to the optimization problem

$$
\begin{aligned}
\text{maximize} \quad & g(\lambda, \nu) \\
\text{subject to} \quad & \lambda \succeq 0
\end{aligned}
$$

This problem is called the Lagrange dual problem. The term dual feasible describes a pair (λ,ν) with λ⪰0 and g(λ,ν)>−∞.
Example
standard form LP and its dual
standard form

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax = b, \quad x \succeq 0
\end{aligned}
$$

dual problem

$$
\begin{aligned}
\text{maximize} \quad & -b^T \nu \\
\text{subject to} \quad & A^T \nu + c \succeq 0
\end{aligned}
$$
Weak and strong duality
weak duality : d∗≤p∗ (where d∗ is the optimal value of the dual problem)
it always holds, for convex and non-convex problems.
it can be used to find non-trivial lower bounds for difficult problems (e.g. the two-way partitioning bound above)
strong duality : d∗=p∗
it does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater’s constraint qualification
One simple constraint qualification is Slater's condition
Consider a convex problem of the form

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}
$$

where f0,…,fm are convex functions.

Slater's condition holds if there exists an x∈relint D such that

$$
f_i(x) < 0, \quad i = 1, \dots, m, \qquad Ax = b
$$

Such a point is sometimes called strictly feasible.
Slater’s theorem states that strong duality holds if Slater’s condition holds and the problem is convex.
Example
Inequality form LP

Primal problem

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax \preceq b
\end{aligned}
$$

dual function

$$
g(\lambda) = \inf_x \left( (c + A^T \lambda)^T x - b^T \lambda \right) =
\begin{cases}
-b^T \lambda & A^T \lambda + c = 0 \\
-\infty & \text{otherwise}
\end{cases}
$$

dual problem

$$
\begin{aligned}
\text{maximize} \quad & -b^T \lambda \\
\text{subject to} \quad & A^T \lambda + c = 0, \quad \lambda \succeq 0
\end{aligned}
$$
From Slater’s condition, p∗=d∗ if Axˉ≺b for some xˉ.
In fact, p∗=d∗ holds except when both the primal and the dual are infeasible.
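A minimal scipy sketch with made-up data that solves both LPs and checks p∗=d∗; the data is constructed so that both problems are feasible (c is chosen dual-feasible and b primal-strictly-feasible).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 1.0     # a strictly feasible point exists
c = -A.T @ rng.random(m)                 # ensures the dual is feasible too

# Primal: minimize c^T x  s.t.  Ax <= b  (x free)
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * n)

# Dual: maximize -b^T lam  s.t.  A^T lam + c = 0, lam >= 0
# (linprog minimizes, so minimize b^T lam and negate the value)
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * m)

print(primal.fun, -dual.fun)             # equal up to solver tolerance: p* = d*
```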
Quadratic program
Primal problem (assume P∈S++n)

$$
\begin{aligned}
\text{minimize} \quad & x^T P x \\
\text{subject to} \quad & Ax \preceq b
\end{aligned}
$$

dual function: minimizing L(x,λ)=xTPx+λT(Ax−b) over x gives x=−(1/2)P−1ATλ, so

$$
g(\lambda) = -\tfrac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda
$$

dual problem

$$
\begin{aligned}
\text{maximize} \quad & -\tfrac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda \\
\text{subject to} \quad & \lambda \succeq 0
\end{aligned}
$$
From Slater’s condition, p∗=d∗ if Axˉ≺b for some xˉ.
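A numpy weak-duality check: for any λ⪰0, g(λ) is a lower bound on f0 at any feasible point. The P, A, b below are made up, with a feasible x0 built in.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)              # P in S_++^n
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
b = A @ x0 + 1.0                         # x0 is strictly feasible

def g(lam):
    # g(lam) = -(1/4) lam^T A P^{-1} A^T lam - b^T lam
    v = A.T @ lam
    return -0.25 * v @ np.linalg.solve(P, v) - b @ lam

for _ in range(100):
    lam = rng.random(m)                  # lam >= 0
    assert g(lam) <= x0 @ P @ x0 + 1e-9  # g(lam) <= p* <= f0(x0)
```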
Geometric interpretation
For simplicity, consider a problem with one constraint f1(x)≤0. Then

$$
g(\lambda) = \inf_{(u, t) \in G} (t + \lambda u), \qquad \text{where } G = \{ (f_1(x), f_0(x)) \mid x \in D \}
$$

while p∗=inf{t | (u,t)∈G, u≤0}. The line λu+t=g(λ) is a (non-vertical) supporting hyperplane to G, and this hyperplane intersects the t-axis at t=g(λ).
Complementary slackness
Suppose that the primal and dual optimal values are attained and equal. Let x∗ be primal optimal and (λ∗,ν∗) be dual optimal. Then

$$
\begin{aligned}
f_0(x^*) &= g(\lambda^*, \nu^*) \\
&= \inf_x \left( f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{i=1}^{p} \nu_i^* h_i(x) \right) \\
&\le f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{i=1}^{p} \nu_i^* h_i(x^*) \\
&\le f_0(x^*)
\end{aligned}
$$

Therefore, the two inequalities hold with equality. So x∗ minimizes L(x,λ∗,ν∗), and

$$
\sum_{i=1}^{m} \lambda_i^* f_i(x^*) = 0
$$

Since each term λi∗fi(x∗) is nonpositive, more specifically,

$$
\lambda_i^* > 0 \implies f_i(x^*) = 0, \qquad f_i(x^*) < 0 \implies \lambda_i^* = 0
$$

This condition is known as complementary slackness.
Karush-Kuhn-Tucker (KKT) conditions
We now assume that the functions fi,hi are differentiable and therefore have open domains.
Let x∗ and (λ∗,ν∗) be any primal and dual optimal points with zero duality gap. Since x∗ minimizes L(x,λ∗,ν∗) over x, its gradient must vanish at x∗:

$$
\nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^{p} \nu_i^* \nabla h_i(x^*) = 0
$$

Thus the KKT conditions are as follows:
primal constraints : fi(x∗)≤0, i=1,…,m and hi(x∗)=0, i=1,…,p
dual constraints : λ∗⪰0
complementary slackness : λi∗fi(x∗)=0, i=1,…,m
stationarity : ∇f0(x∗)+Σi λi∗∇fi(x∗)+Σi νi∗∇hi(x∗)=0
If strong duality holds and x∗,λ∗,ν∗ are optimal, then they must satisfy the KKT conditions.
KKT conditions for convex problems
When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if fi are convex and hi are affine, and x~,λ~,ν~ are any points that satisfy the KKT conditions, then x~ and (λ~,ν~) are primal and dual optimal, with zero duality gap.
Since λ~i≥0, L(x,λ~,ν~) is convex in x. Moreover, the stationarity condition says that its gradient vanishes at x~, so x~ minimizes L(x,λ~,ν~) over x.
Therefore,

$$
\begin{aligned}
g(\tilde{\lambda}, \tilde{\nu}) &= L(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) \\
&= f_0(\tilde{x}) + \sum_{i=1}^{m} \tilde{\lambda}_i f_i(\tilde{x}) + \sum_{i=1}^{p} \tilde{\nu}_i h_i(\tilde{x}) \\
&= f_0(\tilde{x})
\end{aligned}
$$

where the last line uses hi(x~)=0 and the complementary slackness condition λ~ifi(x~)=0.
This shows that x~ and (λ~,ν~) have zero duality gap, and therefore are primal and dual optimal.
KKT conditions for convex problems with Slater’s condition
If a convex optimization problem with differentiable objective and constraint functions satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality. Since Slater’s condition implies that the optimal duality gap is zero and the dual optimum is attained, if x is optimal we can always find (λ,ν) that together with x satisfy the KKT conditions.
Example : Water-filling
Assume αi>0 (αi is a given value)

$$
\begin{aligned}
\text{minimize} \quad & -\sum_{i=1}^{n} \log(\alpha_i + x_i) \\
\text{subject to} \quad & x \succeq 0, \quad \mathbf{1}^T x = 1
\end{aligned}
$$

if x is optimal, there exist λ∈Rn, ν∈R such that
primal feasibility : x⪰0, 1Tx=1
dual feasibility : λ⪰0
complementary slackness : λixi=0, i=1,…,n
stationarity condition : −1/(αi+xi)−λi+ν=0, i=1,…,n
Therefore, λi=ν−1/(αi+xi).
By using these conditions,
if ν<1/αi : then λi=0 and xi=1/ν−αi>0
if ν≥1/αi : then xi=0 and λi=ν−1/αi≥0
Therefore, we can express x as

$$
x_i = \max\{0, \; 1/\nu - \alpha_i\}, \quad i = 1, \dots, n
$$

Substituting into 1Tx=1 gives Σi max{0, 1/ν−αi}=1; the left-hand side is decreasing in ν, so this equation has a unique solution. Therefore, we can determine ν (see the sketch below).
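Because Σi max{0, 1/ν−αi} is decreasing in ν, bisection determines ν. A minimal numpy sketch; the function name, bracket, and tolerance are our own choices.

```python
import numpy as np

def water_filling(alpha, tol=1e-12):
    """Solve: minimize -sum(log(alpha_i + x_i)) s.t. x >= 0, sum(x) = 1."""
    total = lambda nu: np.maximum(0.0, 1.0 / nu - alpha).sum()
    lo, hi = 1e-12, 1.0 / alpha.min()    # total(lo) >> 1 and total(hi) = 0
    while hi - lo > tol:
        nu = (lo + hi) / 2
        if total(nu) > 1.0:
            lo = nu                      # water level 1/nu too high: increase nu
        else:
            hi = nu
    return np.maximum(0.0, 1.0 / hi - alpha)

alpha = np.array([0.5, 1.0, 2.0])
x = water_filling(alpha)
print(x, x.sum())                        # x = [0.75, 0.25, 0.0], sum = 1
```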
Perturbation and sensitivity analysis
When strong duality holds, the optimal dual variables give very useful information about the sensitivity of the optimal value with respect to perturbations of the constraints.
Unperturbed optimization problem and its dual
Primal problem

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$

Dual problem

$$
\begin{aligned}
\text{maximize} \quad & g(\lambda, \nu) \\
\text{subject to} \quad & \lambda \succeq 0
\end{aligned}
$$

Perturbed problem and its dual

Primal problem

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le u_i, \quad i = 1, \dots, m \\
& h_i(x) = v_i, \quad i = 1, \dots, p
\end{aligned}
$$

Dual problem

$$
\begin{aligned}
\text{maximize} \quad & g(\lambda, \nu) - u^T \lambda - v^T \nu \\
\text{subject to} \quad & \lambda \succeq 0
\end{aligned}
$$

where x is the primal variable and u, v are parameters
We define p∗(u,v) as the optimal value of the perturbed problem
When the original problem is convex, the function p∗ is a convex function of u and v
Global sensitivity result
Assume that strong duality holds and that the dual optimum is attained. (This is the case if the original problem is convex, and Slater’s condition is satisfied)
Let (λ∗,ν∗) be optimal for the dual of the unperturbed problem. Then for all u and v we have

$$
p^*(u, v) \ge p^*(0, 0) - u^T \lambda^* - v^T \nu^*
$$

To see this, note that for any x feasible for the perturbed problem,

$$
p^*(0,0) = g(\lambda^*, \nu^*) \le f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{i=1}^{p} \nu_i^* h_i(x) \le f_0(x) + u^T \lambda^* + v^T \nu^*
$$

Minimizing over all such x gives the result.
Sensitivity interpretation
If λi∗ is large and we tighten the ith constraint (i.e. choose ui<0), then the optimal value p∗(u,v) is guaranteed to increase greatly
If νi∗ is large and positive and we take vi<0, or if νi∗ is large and negative and we take vi>0, then the optimal value p∗(u,v) is guaranteed to increase greatly
If λi∗ is small, and we loosen the ith constraint (ui>0), then the optimal value p∗(u,v) will not decrease too much
If νi∗ is small and positive, and vi>0, or if νi∗ is small and negative and vi<0, then the optimal value p∗(u,v) will not decrease too much.
Local sensitivity
Suppose now that p∗(u,v) is differentiable at u=0,v=0. Then, provided strong duality holds, the optimal dual variables λ∗,ν∗ are related to the gradient of p∗ at u=0,v=0:

$$
\lambda_i^* = -\frac{\partial p^*(0,0)}{\partial u_i}, \qquad \nu_i^* = -\frac{\partial p^*(0,0)}{\partial v_i}
$$
Thus, when p∗(u,v) is differentiable at u=0,v=0, and strong duality holds, the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations.
Proof
Since strong duality holds and the dual optimum (λ∗,ν∗) is attained,

$$
p^*(u, v) \ge p^*(0, 0) - u^T \lambda^* - v^T \nu^* \quad \text{for all } u, v
$$

Take u=tei and v=0. Then

$$
p^*(t e_i, 0) \ge p^*(0, 0) - t \lambda_i^*
$$

If t>0, dividing by t and letting t→0+ gives

$$
\frac{\partial p^*(0,0)}{\partial u_i} \ge -\lambda_i^*
$$

Similarly, if t<0, dividing by t reverses the inequality, and letting t→0− gives

$$
\frac{\partial p^*(0,0)}{\partial u_i} \le -\lambda_i^*
$$

Therefore,

$$
\frac{\partial p^*(0,0)}{\partial u_i} = -\lambda_i^*
$$

The same method can be used to establish

$$
\frac{\partial p^*(0,0)}{\partial v_i} = -\nu_i^*
$$
The local sensitivity result gives us a quantitative measure of how active a constraint is at the optimum x∗. If fi(x∗)<0, the constraint is inactive, and it can be tightened or loosened a small amount without affecting the optimal value: by complementary slackness, the associated optimal Lagrange multiplier must be zero.
But now suppose that fi(x∗)=0. The ith optimal Lagrange multiplier tells us how active the constraint is.
If λi∗ is small : the constraint can be loosened or tightened a bit without much effect on the optimal value
if λi∗ is large : if the constraint is loosened or tightened a bit, the effect on the optimal value will be great
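The least-norm problem from earlier gives a clean finite-difference check of the local sensitivity result, since for the perturbation Ax=b+v the optimal value has the closed form p∗(v)=(b+v)T(AAT)−1(b+v) and the dual optimum is ν∗=−2(AAT)−1b (made-up data, A assumed full row rank).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

def p_star(v):
    # Optimal value of: minimize x^T x  s.t.  Ax = b + v
    y = b + v
    return y @ np.linalg.solve(A @ A.T, y)

nu_star = -2 * np.linalg.solve(A @ A.T, b)   # dual optimum of unperturbed problem

eps = 1e-6
grad = np.array([(p_star(eps * e) - p_star(-eps * e)) / (2 * eps) for e in np.eye(3)])
print(grad)          # central-difference gradient of p* at v = 0
print(-nu_star)      # matches: dp*/dv_i = -nu_i*
```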
Duality and problem reformulations
Equivalent formulations of a problem can lead to very different duals. Reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting.
Common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions
→ replace f0(x) by ϕ(f0(x)) with ϕ convex, increasing
Introducing new variables and equality constraints
Consider an unconstrained problem of the form

$$
\text{minimize} \quad f_0(Ax + b)
$$

The dual function is constant: g=infx f0(Ax+b)=p∗. We have strong duality, but the dual is quite useless.
Now let’s reformulate the above problem as

$$
\begin{aligned}
\text{minimize} \quad & f_0(y) \\
\text{subject to} \quad & Ax + b - y = 0
\end{aligned}
$$

The dual function of the transformed problem is

$$
g(\nu) = \inf_{x, y} \left( f_0(y) + \nu^T (Ax + b - y) \right) =
\begin{cases}
b^T \nu - f_0^*(\nu) & A^T \nu = 0 \\
-\infty & \text{otherwise}
\end{cases}
$$
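For a concrete instance, take f0(y)=yTy, so f0∗(ν)=νTν/4. The transformed dual is then maximize bTν−νTν/4 subject to ATν=0, whose maximizer is twice the projection of b onto N(AT). The numpy sketch below checks that it matches the least-squares primal optimum (made-up data, A assumed full column rank).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))          # full column rank (almost surely)
b = rng.standard_normal(6)

# Primal: p* = min_x ||Ax + b||^2 (least squares)
x_ls = np.linalg.lstsq(A, -b, rcond=None)[0]
p_star = np.sum((A @ x_ls + b) ** 2)

# Dual: maximize b^T nu - ||nu||^2 / 4  s.t.  A^T nu = 0
Pi = np.eye(6) - A @ np.linalg.solve(A.T @ A, A.T)   # projector onto N(A^T)
nu_star = 2 * Pi @ b
d_star = b @ nu_star - nu_star @ nu_star / 4

print(p_star, d_star)                    # equal: the reformulated dual is informative
```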
Implicit constraints
Include some of the constraints in the objective function by modifying the objective to be infinite when they are violated.
We consider the linear program

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax = b, \quad l \preceq x \preceq u
\end{aligned}
$$

where A∈Rp×n and l≺u.
The dual problem is

$$
\begin{aligned}
\text{maximize} \quad & -b^T \nu - u^T \lambda_1 + l^T \lambda_2 \\
\text{subject to} \quad & A^T \nu + \lambda_1 - \lambda_2 + c = 0 \\
& \lambda_1 \succeq 0, \quad \lambda_2 \succeq 0
\end{aligned}
$$

Reformulate the primal problem as

$$
\text{minimize} \quad f_0(x) = \begin{cases} c^T x & l \preceq x \preceq u \\ \infty & \text{otherwise} \end{cases} \qquad \text{subject to} \quad Ax = b
$$

Then the dual function for the reformulated problem is

$$
g(\nu) = \inf_{l \preceq x \preceq u} \left( c^T x + \nu^T (Ax - b) \right) = -b^T \nu + l^T (A^T \nu + c)^+ - u^T (A^T \nu + c)^-
$$

where yi+=max{yi,0}, yi−=max{−yi,0}.
Therefore the dual problem is the unconstrained problem

$$
\text{maximize} \quad -b^T \nu + l^T (A^T \nu + c)^+ - u^T (A^T \nu + c)^-
$$
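Because the reformulated dual is unconstrained, every ν immediately certifies a lower bound. A numpy/scipy sketch with made-up data; l=−1, u=1, and b are chosen so the primal is feasible.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
p, n = 3, 5
A = rng.standard_normal((p, n))
b = A @ rng.uniform(-0.5, 0.5, n)        # some point in the box satisfies Ax = b
c = rng.standard_normal(n)
l, u = -np.ones(n), np.ones(n)

res = linprog(c, A_eq=A, b_eq=b, bounds=list(zip(l, u)))   # p*

def g(nu):
    # g(nu) = -b^T nu + l^T (A^T nu + c)^+ - u^T (A^T nu + c)^-
    s = A.T @ nu + c
    return -b @ nu + l @ np.maximum(s, 0.0) - u @ np.maximum(-s, 0.0)

for _ in range(100):
    nu = rng.standard_normal(p)
    assert g(nu) <= res.fun + 1e-9       # every nu gives a lower bound
```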
Generalized inequalities
We can extend the Lagrange duality to a problem with generalized inequality constraints.
$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \preceq_{K_i} 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$

where Ki⊆Rki are proper cones and fi:Rn→Rki.
The Lagrange dual
With each generalized inequality fi(x)⪯Ki0, we associate a Lagrange multiplier vector λi∈Rki and define the associated Lagrangian as

$$
L(x, \lambda_1, \dots, \lambda_m, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i^T f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)
$$

The dual function is defined exactly as in a problem with scalar inequalities:

$$
g(\lambda_1, \dots, \lambda_m, \nu) = \inf_{x \in D} L(x, \lambda_1, \dots, \lambda_m, \nu)
$$
Since the Lagrangian is affine in the dual variables (λ,ν), and the dual function is a point-wise infimum of the Lagrangian, the dual function is concave.
As in a problem with scalar inequalities, the dual function gives lower bounds on p∗. The non-negativity requirement λi≥0 from the scalar case is replaced by a requirement with respect to the dual cones:

$$
\lambda_i \succeq_{K_i^*} 0, \quad i = 1, \dots, m
$$

where Ki∗={y | yTx≥0 for all x∈Ki} denotes the dual cone of Ki.
Weak duality follows immediately from the definition of the dual cone: if λi⪰Ki∗0 and fi(x~)⪯Ki0, then λiTfi(x~)≤0. Therefore, for any primal feasible point x~ and any λi⪰Ki∗0,

$$
f_0(\tilde{x}) \ge f_0(\tilde{x}) + \sum_{i=1}^{m} \lambda_i^T f_i(\tilde{x}) + \sum_{i=1}^{p} \nu_i h_i(\tilde{x}) = L(\tilde{x}, \lambda_1, \dots, \lambda_m, \nu) \ge g(\lambda_1, \dots, \lambda_m, \nu)
$$

Therefore, p∗≥g(λ1,…,λm,ν).
The Lagrange dual optimization problem is

$$
\begin{aligned}
\text{maximize} \quad & g(\lambda_1, \dots, \lambda_m, \nu) \\
\text{subject to} \quad & \lambda_i \succeq_{K_i^*} 0, \quad i = 1, \dots, m
\end{aligned}
$$
Slater’s condition and strong duality
Strong duality holds when the primal problem is convex and satisfies an appropriate constraint qualification, such as a generalized version of Slater’s condition.
Example : Semidefinite program
We consider a semi-definite program in inequality form

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & x_1 F_1 + \cdots + x_n F_n + G \preceq 0
\end{aligned}
$$

where Fi,G∈Sk. (Here f1 is affine, and K1 is S+k, the positive semi-definite cone.)
Since S+k is self-dual, the Lagrange multiplier is a matrix Z∈Sk with dual feasibility condition Z⪰0. The Lagrangian is

$$
L(x, Z) = c^T x + \operatorname{tr}\left( Z (x_1 F_1 + \cdots + x_n F_n + G) \right) = \sum_{i=1}^{n} x_i \left( \operatorname{tr}(F_i Z) + c_i \right) + \operatorname{tr}(G Z)
$$

Therefore, the dual function is

$$
g(Z) = \inf_x L(x, Z) =
\begin{cases}
\operatorname{tr}(G Z) & \operatorname{tr}(F_i Z) + c_i = 0, \; i = 1, \dots, n \\
-\infty & \text{otherwise}
\end{cases}
$$

The dual problem can be expressed as

$$
\begin{aligned}
\text{maximize} \quad & \operatorname{tr}(G Z) \\
\text{subject to} \quad & Z \succeq 0, \quad \operatorname{tr}(F_i Z) + c_i = 0, \; i = 1, \dots, n
\end{aligned}
$$

Strong duality obtains if there is a strictly feasible point, i.e. an x with

$$
x_1 F_1 + \cdots + x_n F_n + G \prec 0
$$

since this satisfies Slater’s condition.
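A numpy-only weak-duality check for this SDP: construct a dual-feasible Z⪰0 by choosing c so that tr(FiZ)+ci=0 holds by construction, then verify that any primal-feasible x obeys cTx≥tr(GZ). All data below is made up.

```python
import numpy as np

rng = np.random.default_rng(7)
k, n = 4, 3
F = [(M + M.T) / 2 for M in rng.standard_normal((n, k, k))]   # symmetric F_i
G = -np.eye(k)                           # makes x = 0 strictly feasible (Slater)

# Dual-feasible Z: any PSD matrix works if we set c_i = -tr(F_i Z)
B = rng.standard_normal((k, k))
Z = B @ B.T
c = np.array([-np.trace(Fi @ Z) for Fi in F])

# Weak duality: every primal-feasible x satisfies c^T x >= tr(G Z)
for _ in range(100):
    x = 0.1 * rng.standard_normal(n)
    lhs = sum(xi * Fi for xi, Fi in zip(x, F)) + G
    if np.linalg.eigvalsh(lhs).max() <= 0:          # x is primal feasible
        assert c @ x >= np.trace(G @ Z) - 1e-9
```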