## Lecture #12 (SDP duality): discussion

1. Examples

The example showing that the optimum value of an SDP might not be achieved is this: ${\min x_1}$ subject to

$\displaystyle \begin{pmatrix} x_1 & 1 \\ 1 & x_2 \\ \end{pmatrix} \succeq 0$

For this to be positive semidefinite, we need ${x_1, x_2 \geq 0}$ and ${x_1x_2 \geq 1}$. The optimum value of the SDP is therefore ${0}$ (it is the infimum of the values ${x_1}$ can take), but it is not achieved by any feasible solution.
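As a quick numerical sanity check (a sketch using numpy, not part of the argument), one can verify that feasible points with arbitrarily small ${x_1}$ exist, while ${x_1 = 0}$ itself is infeasible:

```python
import numpy as np

# Along the curve x2 = 1/x1, the matrix [[x1, 1], [1, x2]] stays psd,
# so the objective x1 can be pushed arbitrarily close to 0.
for x1 in [1.0, 0.1, 0.001]:
    M = np.array([[x1, 1.0], [1.0, 1.0 / x1]])
    assert np.all(np.linalg.eigvalsh(M) >= -1e-9)  # feasible

# But x1 = 0 itself is infeasible: [[0, 1], [1, x2]] always has a
# negative eigenvalue, no matter how large x2 is.
M0 = np.array([[0.0, 1.0], [1.0, 100.0]])
assert np.linalg.eigvalsh(M0)[0] < 0
```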

Here is an example where the primal is feasible and finite, but the dual is infeasible.

$\displaystyle \min \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ \end{pmatrix}\bullet X ~~~s.t.~~~ \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ \end{pmatrix}\bullet X = 0, X \succeq 0.$

The optimum is ${0}$, achieved by ${X = \begin{pmatrix} 0 & 0 \\ 0 & x_2 \end{pmatrix}}$ for any ${x_2 \geq 0}$: the constraint forces ${X_{11} = 0}$, psd-ness then forces ${X_{12} = X_{21} = 0}$, and so the objective ${2X_{12}}$ equals ${0}$. But the dual seeks to maximize ${0}$ subject to

$\displaystyle y \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ \end{pmatrix} \preceq \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ \end{pmatrix} ~~~~~~\iff~~~~~~ \begin{pmatrix} -y & 1 \\ 1 & 0 \\ \end{pmatrix} \succeq 0.$

But this is infeasible, since psd matrices cannot have non-zero entries in the same row/column as a zero diagonal entry.
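To see this infeasibility concretely, here is a small numpy check (an illustrative sketch, not part of the argument) that ${\begin{pmatrix} -y & 1 \\ 1 & 0 \end{pmatrix}}$ has a negative eigenvalue for every ${y}$ we try:

```python
import numpy as np

# A zero diagonal entry with a nonzero off-diagonal entry in the same
# row rules out psd-ness: the matrix has a negative eigenvalue for all y.
for y in [-10.0, 0.0, 10.0]:
    M = np.array([[-y, 1.0], [1.0, 0.0]])
    assert np.linalg.eigvalsh(M)[0] < 0
```

(Indeed, the eigenvalues are ${(-y \pm \sqrt{y^2+4})/2}$, and the smaller one is always negative.)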

2. Proofs of Some Useful Facts

We used several facts about matrices in lecture today, without proof. Here are some of the proofs. Recall that ${A \bullet B = Tr(A^\top B) = \sum_{ij} A_{ij} B_{ij}}$. For symmetric (and hence for psd) matrices, ${Tr(A^\top B) = Tr(AB)}$.

Fact 1 For any two ${n \times n}$ matrices ${A, B}$, ${Tr(AB) = Tr(BA)}$.

Proof: ${Tr(AB) = \sum_{i} (AB)_{ii} = \sum_i \sum_j A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = Tr(BA)}$.
$\Box$
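A one-line numerical illustration of Fact 1 (a numpy sketch; note that ${AB \neq BA}$ in general, yet the traces agree):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# The products differ, but their traces coincide.
assert not np.allclose(A @ B, B @ A)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```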

Lemma 1 For a symmetric ${n\times n}$ matrix ${A}$, ${A}$ is psd if and only if ${A \bullet B \geq 0}$ for all psd ${B}$.

Proof:
One direction is easy: if ${A}$ is not psd, then there exists ${x \in {\mathbb R}^n}$ for which ${A \bullet (xx^\top) = x^\top A x < 0}$. But ${xx^\top}$ is psd, which shows that ${A \bullet B < 0}$ for some psd ${B}$.

In the other direction, let ${A, B}$ be psd. We claim that ${C}$, defined by ${C_{ij} = A_{ij}B_{ij}}$, is also psd. (This matrix ${C}$ is called the Schur-Hadamard product of ${A}$ and ${B}$.) Then

$\displaystyle \sum_{ij} A_{ij} B_{ij} = \sum_{ij} C_{ij} = \textbf{1}^\top C \textbf{1} \geq 0$

by the definition of psd-ness of ${C}$. To see the claim: since ${A}$ is psd, there exist random variables ${\{a_i\}_{i = 1}^n}$ such that ${A_{ij} = E[a_ia_j]}$. Similarly, let ${B_{ij} = E[b_ib_j]}$ for r.v.s ${\{b_i\}}$. Moreover, we can take the ${a}$’s to be independent of the ${b}$’s. So if we define the random variables ${c_i = a_ib_i}$, then

$\displaystyle C_{ij} = E[a_ia_j]E[b_ib_j] = E[a_ia_jb_ib_j] = E[c_i c_j],$

and we are done. (Note we used independence of the ${a}$’s and ${b}$’s to make the product of expectations the expectation of the product.) $\Box$
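The claim in the proof is the Schur product theorem: the entrywise product of two psd matrices is psd. A quick numpy check on random psd matrices (a sketch, with ${A = GG^\top}$ and ${B = HH^\top}$ generated from Gaussian factors):

```python
import numpy as np

rng = np.random.default_rng(1)
# Random psd matrices A = G G^T and B = H H^T.
G = rng.standard_normal((5, 5))
A = G @ G.T
H = rng.standard_normal((5, 5))
B = H @ H.T

C = A * B  # entrywise (Schur-Hadamard) product
assert np.linalg.eigvalsh(C)[0] >= -1e-9  # C is psd
assert np.sum(A * B) >= 0                 # hence A . B = 1^T C 1 >= 0
```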

This clean random variable-based proof is from this blog post. One can also show the following claim. (I don’t know of a slick r.v. based proof—anyone see one?) BTW, the linear-algebraic proof also gives an alternate proof of the above Lemma 1.

Lemma 2 For ${A \succ 0}$ (i.e., it is positive definite), ${A \bullet B > 0}$ for all psd ${B}$, ${B \neq 0}$.

Proof: Let’s write ${A}$ as ${PDP^\top}$ where ${P}$ is orthogonal and ${D}$ is the diagonal matrix containing ${A}$’s eigenvalues (which are all positive, because ${A \succ 0}$).

Let ${\widehat{B} = P^\top B P}$, and hence ${B = P \widehat{B} P^\top}$. Note that ${\widehat{B}}$ is psd: indeed, ${x^\top\widehat{B} x = (Px)^\top B (Px) \geq 0}$. So all of ${\widehat{B}}$’s diagonal entries are non-negative. Moreover, since ${B \neq 0}$, not all of ${\widehat{B}}$’s diagonal entries can be zero (else, by ${\widehat{B}}$’s psd-ness, it would be zero). Finally,

$\displaystyle Tr(AB) = Tr((PDP^\top)(P\widehat{B}P^\top)) = Tr(PD\widehat{B}P^\top) = Tr(D\widehat{B}P^\top P) = Tr(D\widehat{B}) = \sum_i D_{ii}\widehat{B}_{ii}.$

Since ${D_{ii} > 0}$ and ${\widehat{B}_{ii} \geq 0}$ for all ${i}$, and ${\widehat{B}_{ii} > 0}$ for some ${i}$, this sum is strictly positive. $\Box$
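The key trace identity in this proof, ${Tr(AB) = \sum_i D_{ii}\widehat{B}_{ii}}$, can be checked numerically (a numpy sketch on a random positive definite ${A}$ and a random nonzero psd ${B}$):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
A = G @ G.T + 0.1 * np.eye(4)   # positive definite
H = rng.standard_normal((4, 2))
B = H @ H.T                     # psd and nonzero

# Diagonalize A = P D P^T and form Bhat = P^T B P.
evals, P = np.linalg.eigh(A)
Bhat = P.T @ B @ P

# Tr(AB) = sum_i D_ii * Bhat_ii, and this is strictly positive.
assert np.isclose(np.trace(A @ B), np.sum(evals * np.diag(Bhat)))
assert np.trace(A @ B) > 0
```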

Lemma 3 For psd matrices ${A, B}$, ${A \bullet B = 0}$ if and only if ${AB = 0}$.

Proof: Clearly if ${AB = 0}$ then ${A \bullet B = Tr(A^\top B) = Tr(AB) = 0}$.

For the other direction, we use the ideas (and notation) from Lemma 1. Again take the Schur-Hadamard product ${C}$ defined by ${C_{ij} = A_{ij}B_{ij}}$. Then ${C}$ is also psd, and hence ${C_{ij} = E[c_ic_j]}$ for random variables ${\{c_i\}_{i = 1}^n}$. Then

$\displaystyle A \bullet B = \sum_{ij} C_{ij} = \sum_{ij} E[c_ic_j] = E[\sum_{ij} c_ic_j] = E[(\sum_i c_i)^2].$

If this quantity is zero, then the random variable ${\sum_i c_i = \sum_i a_ib_i}$ must be zero with probability ${1}$. Now

$\displaystyle (AB)_{ij} = \sum_k A_{ik}B_{kj} = \sum_k E[a_ia_k]E[b_kb_j] = \sum_k E[a_ib_j(a_kb_k)] = E[a_ib_j (\sum_k c_k)] = 0,$

so ${AB}$ is identically zero. $\Box$
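A small concrete instance of Lemma 3 (a numpy sketch with two hypothetical rank-one psd matrices whose ranges are orthogonal):

```python
import numpy as np

# A and B are psd, rank one, with orthogonal ranges span{(1,1)} and span{(1,-1)}.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, -1.0], [-1.0, 1.0]])

assert np.isclose(np.sum(A * B), 0.0)  # A . B = 0 ...
assert np.allclose(A @ B, 0.0)         # ... and indeed AB = 0
```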

3. Eigenvalues Seen Differently, or Old Friends Again

We saw that finding the maximum eigenvalue of a symmetric matrix ${A}$ can be formulated as:

$\displaystyle \begin{array}{rl} \min t \\ \text{s.t.~~~} tI - A \succeq 0 \end{array}$

and its dual

$\displaystyle \begin{array}{rl} \max A \bullet X \\ \text{s.t.~~~} X \bullet I = 1 \\ X \succeq 0 \end{array}$

The dual can be reinterpreted as follows: recall that ${X \succeq 0}$ means we can find reals ${p_i \geq 0}$ and unit vectors ${x_i \in {\mathbb R}^n}$ such that ${X = \sum_i p_i (x_ix_i^\top)}$. Since the ${x_i}$’s are unit vectors, ${Tr(x_ix_i^\top) = 1}$, and hence ${Tr(X) = \sum_i p_i}$. But by our constraint, ${X \bullet I = Tr(X) = 1}$, so ${\sum_i p_i = 1}$.

Rewriting in this language, ${\lambda_{\max}}$ is the maximum of

$\displaystyle \sum_i p_i \; (A \bullet x_ix_i^\top)$

such that the ${x_i}$’s are unit vectors, and ${\sum_i p_i = 1}$. But for any such solution, just choose the vector ${x_{i^*}}$ among these that maximizes ${A \bullet (x_{i^*}x_{i^*}^\top)}$; that is at least as good as the average, right? Hence,

$\displaystyle \lambda_{\max} = \max_{x \in {\mathbb R}^n : \|x\|_2 = 1} A \bullet (xx^\top) = \max_{x \in {\mathbb R}^n} \frac{x^\top A x}{x^\top x}$

which is the standard variational definition of the maximum eigenvalue of ${A}$.
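One can check this variational characterization numerically (a numpy sketch on a random symmetric matrix): every Rayleigh quotient is at most ${\lambda_{\max}}$, and the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # random symmetric matrix
lam_max = np.linalg.eigvalsh(A)[-1]    # largest eigenvalue

# Rayleigh quotients x^T A x / x^T x never exceed lambda_max ...
for _ in range(100):
    x = rng.standard_normal(5)
    assert (x @ A @ x) / (x @ x) <= lam_max + 1e-9

# ... and the top eigenvector attains it.
v = np.linalg.eigh(A)[1][:, -1]
assert np.isclose(v @ A @ v, lam_max)
```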