Lecture #12 (SDP duality): discussion

1. Examples

The example showing that the optimum value of an SDP might not be achieved is this: {\min x_1} subject to

\displaystyle  \begin{pmatrix}   x_1 & 1 \\   1 & x_2 \\   \end{pmatrix} \succeq 0

For this to be positive semidefinite, we need {x_1, x_2 \geq 0} and {x_1x_2 \geq 1}. But then the optimum value of the SDP is {0} (it is the infimum of the values {x_1} can take), but it is not achieved by any feasible solution.
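(A quick numerical aside, not from lecture: here is a small numpy sketch that walks along the feasible curve {x_1 = 1/x_2} and confirms the objective {x_1} gets arbitrarily close to {0} while staying feasible.)

import numpy as np

# Along x1 = 1/x2 the matrix [[x1, 1], [1, x2]] has determinant 0 and
# non-negative trace, so it stays psd while the objective x1 -> 0.
for x2 in [1.0, 10.0, 1e3, 1e6]:
    x1 = 1.0 / x2
    M = np.array([[x1, 1.0], [1.0, x2]])
    print(x1, np.linalg.eigvalsh(M).min())  # min eigenvalue is 0 up to roundoff

(At {x_1 = 0} itself the determinant is {-1}, so the matrix has a negative eigenvalue and the infimum {0} is never attained.)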

Here is an example where the primal is feasible and finite, but the dual is infeasible.

\displaystyle  \min    \begin{pmatrix}   0 & 1 \\   1 & 0 \\   \end{pmatrix}\bullet X ~~~s.t.~~~   \begin{pmatrix}   1 & 0 \\   0 & 0 \\   \end{pmatrix}\bullet X = 0, X \succeq 0.

The optimum is {0}: the constraint forces {X_{11} = 0}, psd-ness then forces the off-diagonal entries to be zero, and so the objective vanishes. It is achieved by {X = \mathrm{diag}(0, x_2)} for any {x_2 \geq 0}. But the dual seeks to maximize {0} subject to

\displaystyle  y \begin{pmatrix}   1 & 0 \\   0 & 0 \\   \end{pmatrix} \preceq \begin{pmatrix}   0 & 1 \\   1 & 0 \\   \end{pmatrix}   ~~~~~~\iff~~~~~~ \begin{pmatrix}   -y & 1 \\   1 & 0 \\   \end{pmatrix} \succeq 0.

But this is infeasible, since psd matrices cannot have non-zero entries in the same row/column as a zero diagonal entry.
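(To see this numerically, a hedged numpy sketch: the matrix above has a strictly negative eigenvalue for every {y}.)

import numpy as np

# The eigenvalues of [[-y, 1], [1, 0]] are (-y +/- sqrt(y^2 + 4)) / 2,
# and the smaller one is negative for every y, so the dual is infeasible.
for y in [-100.0, -1.0, 0.0, 1.0, 100.0]:
    M = np.array([[-y, 1.0], [1.0, 0.0]])
    print(y, np.linalg.eigvalsh(M).min())  # always < 0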

2. Proofs of Some Useful Facts

We used several facts about matrices in lecture today, without proof. Here are some of the proofs. Recall that {A \bullet B = Tr(A^\top B) = \sum_{ij} A_{ij} B_{ij}}. For symmetric (and hence for psd) matrices, {Tr(A^\top B) = Tr(AB)}.

Fact 1 For any two {n \times n} matrices {A, B}, {Tr(AB) = Tr(BA)}.

Proof: {Tr(AB) = \sum_{i} (AB)_{ii} = \sum_i \sum_j A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = Tr(BA)}.
\Box
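(A one-line sanity check in numpy, for random, not necessarily symmetric, matrices:)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
# Fact 1: the trace is invariant under swapping the order of the product.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True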

Lemma 1 For a symmetric {n\times n} matrix {A}, {A} is psd if and only if {A \bullet B \geq 0} for all psd {B}.

Proof:
One direction is easy: if {A} is not psd, then there exists {x \in {\mathbb R}^n} for which {A \bullet (xx^\top) = x^\top A x < 0}. But {xx^\top} is psd, which shows that {A \bullet B < 0} for some psd {B}.

In the other direction, let {A, B} be psd. We claim that {C}, defined by {C_{ij} = A_{ij}B_{ij}}, is also psd. (This matrix {C} is called the Schur-Hadamard product of {A} and {B}.) Then

\displaystyle  \sum_{ij} A_{ij} B_{ij} = \sum_{ij} C_{ij} = \textbf{1}^\top C \textbf{1} \geq 0

by the definition of psd-ness of {C}. To see the claim: since {A} is psd, there exist random variables {\{a_i\}_{i = 1}^n} (e.g., jointly Gaussian with covariance matrix {A}) such that {A_{ij} = E[a_ia_j]}. Similarly, let {B_{ij} = E[b_ib_j]} for r.v.s {\{b_i\}}. Moreover, we can take the {a}'s to be independent of the {b}'s. So if we define the random variables {c_i = a_ib_i}, then

\displaystyle  C_{ij} = E[a_ia_j]E[b_ib_j] = E[a_ia_jb_ib_j] = E[c_i c_j],

and we are done. (Note we used independence of the {a}'s and {b}'s to make the product of expectations the expectation of the product.) \Box
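(If you want to poke at this numerically, here is a minimal numpy sketch; the helper random_psd is mine, not from lecture. It builds random psd matrices and checks that their Schur-Hadamard product is psd, which is the heart of the proof above.)

import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    # G @ G.T is psd for any real G, since x^T (G G^T) x = ||G^T x||^2 >= 0.
    G = rng.standard_normal((n, n))
    return G @ G.T

A, B = random_psd(4), random_psd(4)
C = A * B  # entrywise (Schur-Hadamard) product
print(np.linalg.eigvalsh(C).min() >= -1e-9)  # C is psd, as claimed
print(C.sum() >= 0)                          # hence A . B = 1^T C 1 >= 0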

The clean random-variable-based proof of Lemma 1 above is from this blog post. One can also show the following claim. (I don't know of a slick r.v.-based proof; anyone see one?) BTW, the linear-algebraic proof also gives an alternate proof of the above Lemma 1.

Lemma 2 For {A \succ 0} (i.e., {A} positive definite), {A \bullet B > 0} for all psd {B \neq 0}.

Proof: Let's write {A} as {PDP^\top}, where {P} is orthonormal and {D} is the diagonal matrix containing {A}'s eigenvalues (which are all positive, because {A \succ 0}).

Let {\widehat{B} = P^\top B P}, and hence {B = P \widehat{B} P^\top}. Note that {\widehat{B}} is psd: indeed, {x^\top\widehat{B} x = (Px)^\top B (Px) \geq 0}. So all of {\widehat{B}}'s diagonal entries are non-negative. Moreover, since {B \neq 0}, not all of {\widehat{B}}'s diagonal entries can be zero (else, by {\widehat{B}}'s psd-ness, it would be zero). Finally,

\displaystyle  Tr(AB) = Tr((PDP^\top)(P\widehat{B}P^\top)) = Tr(PD\widehat{B}P^\top) = Tr(D\widehat{B}P^\top P) = Tr(D\widehat{B}) = \sum_i D_{ii}\widehat{B}_{ii}.

Since {D_{ii} > 0} and {\widehat{B}_{ii} \geq 0} for all {i}, and {\widehat{B}_{ii} > 0} for some {i}, this sum is strictly positive. \Box
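(Again a quick hedged numpy check of Lemma 2, with a rank-one psd {B = xx^\top}, for which {A \bullet B = x^\top A x}:)

import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
A = G @ G.T + np.eye(4)   # positive definite: psd plus the identity
x = rng.standard_normal(4)
B = np.outer(x, x)        # psd and (almost surely) nonzero
print((A * B).sum() > 0)  # A . B = x^T A x > 0, per Lemma 2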

Lemma 3 For psd matrices {A, B}, {A \bullet B = 0} if and only if {AB = 0}.

Proof: Clearly if {AB = 0} then {A \bullet B = Tr(A^\top B) = Tr(AB) = 0}.

For the other direction, we use the ideas (and notation) from Lemma 1. Again take the Schur-Hadamard product {C} defined by {C_{ij} = A_{ij}B_{ij}}. Then {C} is also psd, and hence {C_{ij} = E[c_ic_j]} for random variables {\{c_i\}_{i = 1}^n}. Then

\displaystyle  A \bullet B = \sum_{ij} C_{ij} = \sum_{ij} E[c_ic_j] = E[\sum_{ij} c_ic_j] = E[(\sum_i c_i)^2].

If this quantity is zero, then the random variable {\sum_i c_i = \sum_i a_ib_i} must be zero with probability {1}. Now

\displaystyle  (AB)_{ij} = \sum_k E[a_ia_k]E[b_kb_j] = \sum_k E[a_ib_j(a_kb_k)] = E[a_ib_j (\sum_k c_k)] = 0,

so {AB} is identically zero. \Box
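(A tiny numpy illustration of Lemma 3, using rank-one psd matrices built from orthogonal vectors:)

import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])               # orthogonal to x
A, B = np.outer(x, x), np.outer(y, y)  # both psd
print((A * B).sum())                   # A . B = 0 ...
print(np.allclose(A @ B, 0))           # ... and indeed AB = 0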

3. Eigenvalues Seen Differently, or Old Friends Again

We saw that finding the maximum eigenvalue of a symmetric matrix {A} can be formulated as:

\displaystyle  \begin{array}{rl}    \min t \\   \text{s.t.~~~} tI - A \succeq 0   \end{array}

and its dual

\displaystyle  \begin{array}{rl}    \max A \bullet X \\   \text{s.t.~~~} X \bullet I = 1 \\   X \succeq 0   \end{array}

The dual can be reinterpreted as follows: recall that {X \succeq 0} means we can find reals {p_i \geq 0} and unit vectors {x_i \in {\mathbb R}^n} such that {X = \sum_i p_i (x_ix_i^\top)}. Since the {x_i}'s are unit vectors, {Tr(x_ix_i^\top) = 1}, and so {Tr(X) = \sum_i p_i}. But our constraint says {X \bullet I = Tr(X) = 1}, so {\sum_i p_i = 1}.

Rewriting in this language, {\lambda_{\max}} is the maximum of

\displaystyle  \sum_i p_i \; (A \bullet x_ix_i^\top)

such that the {x_i}'s are unit vectors, and {\sum_i p_i = 1}. But for any such solution, just choose the vector {x_{i^*}} among these that maximizes {A \bullet (x_{i^*}x_{i^*}^\top)}; that is at least as good as the average, right? Hence,

\displaystyle  \lambda_{\max} = \max_{x \in {\mathbb R}^n : \|x\|_2 = 1} A \bullet (xx^\top) = \max_{x \in {\mathbb R}^n} \frac{x^\top A x}{x^\top x}

which is the standard variational definition of the maximum eigenvalue of {A}.
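(To close the loop numerically: a hedged sketch, assuming cvxpy with an SDP-capable solver is installed, that solves the dual SDP above and compares it with the eigenvalue computed directly.)

import numpy as np
import cvxpy as cp  # assumption: cvxpy (with its default SDP solver) is available

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
A = (G + G.T) / 2  # a random symmetric matrix

# The dual SDP: maximize A . X subject to Tr(X) = 1, X psd.
X = cp.Variable((5, 5), PSD=True)
prob = cp.Problem(cp.Maximize(cp.trace(A @ X)), [cp.trace(X) == 1])
prob.solve()
print(prob.value, np.linalg.eigvalsh(A).max())  # should agree up to solver tolerance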
