November 9, 2011

## Lecture #17: Multiplicative Weights, discussion

Here’s some discussion about the issue of what the adverasary sees before it presents the cost vector. To begin, let’s remove the step where the algorithm actually makes a “prediction”. Instead, the algorithm only maintains a current weight vector ${\overline{w}^{(t)}}$. And for a cost vector ${\overline{m}^{(t)}}$ presented by the adversary, define the algorithm’s cost to be

$\displaystyle \overline{p}^{(t)} \cdot \overline{m}^{(t)},$

where

$\displaystyle {p}_i^{(t)} = \frac{{w}_i^{(t)}}{\sum_{j} {w}_j^{(t)}} = \frac{{w}_i^{(t)}}{\Phi^{(t)}}.$

There is no longer any randomization in the algorithm; it is completely deterministic.

Now, what is the adversary allowed to base the cost vector ${\overline{m}^{(t)}}$ on? We allow the adversary to base ${\overline{m}^{(t)}}$ on the entire history; it can depend on ${\overline{w}^{(1)}, \overline{w}^{(2)}, \ldots, \overline{w}^{(t)}}$, and in particular, the current weights maintained by the algorithm. But we claim that knowing this information does not give the adversary any additional power, since each of these vectors are deterministic functions of the previous cost functions ${\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(t-1)}}$ presented by the adversary.

To examine the argument a bit more, consider the following adversaries:

1. (Adversary Type 1) The adversary writes down a sequence of ${T}$ cost vectors ${\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(T)}}$ up-front, and we run the (deterministic) MW algorithm on it. For each such sequence of cost vectors, and for each expert ${i}$, we have the guarantee that

$\displaystyle \sum_t \overline{p}^{(t)} \cdot \overline{m}^{(t)} \leq \sum_t {m}_i^{(t)} + \text{regret}(T,\varepsilon,N). \ \ \ \ \ (1)$

where I’ve bundled all the additive terms into this “regret” term.

2. (Adversary Type 2) For each ${t}$, the adversary is “adaptive” but still deterministic: the new cost vector ${\overline{w}^{(t)}}$ it gives is a deterministic function of the entire history so far. As we said, this entire history is just a (deterministic) function of ${\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(t-1)}}$. So for any fixed adversary (which is completely determined by what this map from the history to the next cost vector is), the sequence of cost vectors it gives is some fixed sequence. So if we have a guarantee for each sequence (as above), we get that for any adversary and any expert ${i}$, inequality~(2) holds for the MW algorithm.

So a Type 2 adversary is no more powerful than that of Type 1.

3. (Adversary Type 3) For each ${t}$, the adversary is “adaptive” and also randomized: the new cost vector ${\overline{w}^{(t)}}$ it gives at time ${t}$ could be a random variable, and also depend on the entire history so far (which itself is a random variable). Note that the algorithm is still deterministic: it flips no coins. Given such an adversary, we could look at the probability space of all length-${T}$ sequences of cost vectors ${\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(T)}}$ with the probability of any such length-${T}$ sequence being the probability it is generated by running MW against this adversary. But inequality~(2) guarantees that for each length-${T}$ sequence of cost vectors, we are not much worse than the best expert ${i}$. Hence we get

$\displaystyle \sum_t \mathbb{E} [\overline{p}^{(t)} \cdot \overline{m}^{(t)}] \leq \sum_t \mathbb{E}[ {m}_i^{(t)}] + \text{regret}(T,\varepsilon,N). \ \ \ \ \ (2)$

where the expectations is taken over the randomness of the adversary.

The contrapositive of this statement says: if we have an adversary where the expected regret is high, there must be a fixed length-${T}$ sequence of cost vectors where the regret is high. So, even an adversary of Type 3 is no more powerful than one of Type 1.

So you can indeed think of the adversary as choosing the cost vector ${\overline{m}^{(t)}}$ depending on the entire history. Or as just writing down a ${T}$-length sequence in advance. It’s the same.

1.1. Predictions

Finally, what about the fact that the MW algorithm (as specified in lecture) was also making random predictions? The fact that the future decisions of the algorithm did not depend on these random predictions, and that the adversary does not see the prediction before it creates the cost vector, allows us to push the same argument through.

The easiest way to argue this may be to imagine that it’s not the algorithm that makes the predictions. The algorithm instead gives the vector ${\overline{p}^{(t)}}$ to the adversary, who generates a random sample from the distribution himself. And to an external observer, the distributions of the predictions remain exactly the same, and so does their expected cost. But internally, we’ve just reduced to the case of the Type 3 randomized adversary, which we just argued about.

2. John’s Example

John asked a good (and illustrative) question: what about the adversary looks at the current ${\overline{p}^{(t)}}$, chooses the index ${j^*}$ which has maximum ${{p}_j^{(t)}}$ (say the lowest numbered index, if this is not unique), and defines ${\overline{m}^{(t)}}$ with ${m_{j^*}^{(t)} = 1}$ and ${m_j^{(t)} = 0}$ for ${j \neq j^*}$. Hence, for this setting, we pay at least ${1/N}$ at every step!

So the ${T}$-step cost would be ${T/N}$, which may be much more than ${\varepsilon T + \frac{\ln N}{\varepsilon}}$ if ${\varepsilon \ll 1/N}$. What’s happening here?

The saving grace is that even the best expert cannot have low cost: MW will ensure that all the experts will end up paying a non-trivial amount. Indeed, how does the vector ${\overline{w}^{(t)}}$ evolve in this case? Starts off at the vector

$\displaystyle (1, 1,1, \cdots 1).$

Then it moves to

$\displaystyle (e^{-\varepsilon}, 1,1, \cdots 1).$

And then to

$\displaystyle (e^{-\varepsilon}, e^{-\varepsilon},1, \cdots 1).$

After ${N}$ steps, we are at

$\displaystyle (e^{-\varepsilon}, e^{-\varepsilon},e^{-\varepsilon}, \cdots e^{-\varepsilon}).$

And so to

$\displaystyle (e^{-2\varepsilon}, e^{-\varepsilon},e^{-\varepsilon}, \cdots e^{-\varepsilon}).$

But you see the pattern: any fixed expert ${i}$ incurs a cost of ${1}$ every ${N}$ steps. Which means after ${T}$ steps, each expert has incurred cost in ${\{ \lfloor T/N \rfloor, \lceil T/N \rceil \}}$.

And what is our cost? Even if the weight vector ${\overline{w}^{(t)}}$ were as lopsided as

$\displaystyle (e^{-\varepsilon}, e^{-\varepsilon}, e^{-\varepsilon}, \cdots, e^{-\varepsilon}, 1).$

the expected cost using the cost vector ${(0,0,\cdots, 0, 1)}$ is approximately (for ${\varepsilon \leq 1}$)

$\displaystyle \frac{1}{1 + (N-1)e^{-\varepsilon}} = \frac{e^\varepsilon}{e^{\varepsilon} + (N-1)} \leq \frac{ 1 + \varepsilon + \varepsilon^2}{N}$

So our ${T}$-step total cost is at most

$\displaystyle \frac{ 1 + \varepsilon + \varepsilon^2}{N} \times T \leq \frac{T}{N} + \frac{2\varepsilon}{N} \cdot T.$

Assuming that ${N \geq 2}$ (else it’s trivial), this gives

$\displaystyle \text{MW's cost} \leq \sum_t {m}_i^{(t)} + \varepsilon T.$

So all is OK.

2.1. Another Example

But what if there were some hidden ${j'}$ that the adversary never gives a ${1}$ in this position ${j'}$? Instead he chooses the position ${j \neq j'}$ with maximum ${{p}_j^{(t)}}$. In this case the best expert (namely ${j'}$) has total cost ${0}$. So it better be the case that our cost is at most ${\varepsilon T + (\log N)/\varepsilon}$. It’s a simple exercise to see it is indeed the case.

October 20, 2011

## Lecture #12 (SDP duality): discussion

1. Examples

The example showing that the optimum value of an SDP might not be achieved is this: ${\min x_1}$ subject to

$\displaystyle \begin{pmatrix} x_1 & 1 \\ 1 & x_2 \\ \end{pmatrix} \succeq 0$

For this to be positive-semidefinite, we need ${x_1x_2 \geq 1}$. But then the optimum value of the SDP is ${0}$ (it is the infimum of the values ${x_1}$ can take), but it not achieved by any feasible solution.

Here is an example where the primal is feasible and finite, but the dual is infeasible.

$\displaystyle \min \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ \end{pmatrix}\bullet X ~~~s.t.~~~ \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ \end{pmatrix}\bullet X = 0, X \succeq 0.$

The optimum is ${0}$, achieved by ${(0, x_2)}$ for ${x_2 \geq 0}$. But the dual seeks to maximize ${0}$ subject to

$\displaystyle y \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ \end{pmatrix} \preceq \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ \end{pmatrix} ~~~~~~\iff~~~~~~ \begin{pmatrix} -y & 1 \\ 1 & 0 \\ \end{pmatrix} \succeq 0.$

But this is infeasible, since psd matrices cannot have non-zero entries in the same row/column as a zero diagonal entry.

2. Proofs of Some Useful Facts

We used several facts about matrices in lecture today, without proof. Here are some of the proofs. Recall that ${A \bullet B = Tr(A^\top B) = \sum_{ij} A_{ij} B_{ij}}$. For symmetric (and hence for psd) matrices, ${Tr(A^\top B) = Tr(AB)}$.

Fact 1 For any two ${n \times n}$ matrices ${A, B}$, Tr(AB) = Tr(BA).

Proof: ${Tr(AB) = \sum_{i} (AB)_{ii} = \sum_i \sum_j A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = Tr(BA)}$.
$\Box$

Lemma 1 For a symmetric ${n\times n}$ matrix ${A}$, ${A}$ is psd if and only if ${A \bullet B \geq 0}$ for all psd ${B}$.

Proof:
One direction is easy: if ${A}$ is not psd, then there exists ${x \in {\mathbb R}^n}$ for which ${A \bullet (xx^\top) = x^\top A x < 0}$. But ${xx^\top}$ is psd, which shows that ${A \bullet B < 0}$ for some psd ${B}$.

In the other direction, let ${A, B}$ be psd. We claim that ${C}$, defined by ${C_{ij} = A_{ij}B_{ij}}$, is also psd. (This matrix ${C}$ is called the Schur-Hadamard product of ${A}$ and ${B}$.) Then

$\displaystyle \sum_{ij} A_{ij} B_{ij} = \sum_{ij} C_{ij} = \textbf{1}^\top C \textbf{1} \geq 0$

by the definition of psd-ness of ${C}$. To see the claim: since ${A}$, there exist random variables ${\{a_i\}_{i = 1}^n}$ such that ${A_{ij} = E[a_ia_j]}$. Similarly, let ${B_{ij} = E[b_ib_j]}$ for r.v.s ${\{b_i\}}$. Moreover, we can take the ${a}$‘s to be independent of the ${b}$‘s. So if we define the random variables ${c_i = a_ib_i}$, then

$\displaystyle C_{ij} = E[a_ia_j]E[b_ib_j] = E[a_ia_jb_ib_j] = E[c_i c_j],$

and we are done. (Note we used independence of the ${a}$‘s and ${b}$‘s to make the product of expectations the expectation of the product.) $\Box$

This clean random variable-based proof is from this blog post. One can also show the following claim. (I don’t know of a slick r.v. based proof—anyone see one?) BTW, the linear-algebraic proof also gives an alternate proof of the above Lemma 1.

Lemma 2 For ${A \succ 0}$ (i.e., it is positive definite), ${A \bullet B > 0}$ for all psd ${B}$, ${B \neq 0}$.

Proof: Let’s write ${A}$ as ${PDP^\top}$ where ${P}$ is orthonormal, and ${D}$ is the diagonal matrix containing ${A}$‘s eigenvalues (which are all positive, because ${A \succ 0}$.

Let ${\widehat{B} = P^\top B P}$, and hence ${B = P \widehat{B} P^\top}$. Note that ${\widehat{B}}$ is psd: indeed, ${x^\top\widehat{B} x = (Px)^\top B (Px) \geq 0}$. So all of ${\widehat{B}}$‘s diagonal entries are non-negative. Moreover, since ${B \neq 0}$, not all of ${\widehat{B}}$‘s diagonal entries can be zero (else, by ${\widehat{B}}$‘s psd-ness, it would be zero). Finally,

$\displaystyle Tr(AB) = Tr((PDP^\top)(P\widehat{B}P^\top)) = Tr(PD\widehat{B}P^\top) = Tr(D\widehat{B}P^\top P) = Tr(D\widehat{B}) = \sum_i D_{ii}\widehat{B}_{ii}.$

Since ${D_{ii} > 0}$ and ${\widehat{B}_{ii} \geq 0}$ for all ${i}$, and ${\widehat{B}_{ii} > 0}$ for some ${i}$, this sum is strictly positive. $\Box$

Lemma 3 For psd matrices ${A, B}$, ${A \bullet B = 0}$ if and only if ${AB = 0}$.

Proof: Clearly if ${AB = 0}$ then ${A \bullet B = tr(A^\top B) = tr(AB) = 0}$.

For the other direction, we use the ideas (and notation) from Lemma 1. Again take the Schur-Hadamard product ${C}$ defined by ${C_{ij} = A_{ij}B_{ij}}$. Then ${C}$ is also psd, and hence ${C_{ij} = E[c_ic_j]}$ for random variables ${\{c_i\}_{i = 1}^n}$. Then

$\displaystyle A \bullet B = \sum_{ij} C_{ij} = \sum_{ij} E[c_ic_j] = E[\sum_{ij} c_ic_j] = E[(\sum_i c_i)^2].$

If this quantity is zero, then the random variable ${\sum_i c_i = \sum_i a_ib_i}$ must be zero with probability ${1}$. Now

$\displaystyle (AB)_{ij} = \sum_k E[a_ia_k]E[b_kb_j] = \sum_k E[a_ib_j(a_kb_k)] = E[a_ib_j (\sum_k c_k)] = 0,$

so ${AB}$ is identically zero. $\Box$

3. Eigenvalues Seen Differently, or Old Friends Again

We saw that finding the maximum eigenvalue of a symmetric matrix ${A}$ can be formulated as:

$\displaystyle \begin{array}{rl} \min t \\ \text{s.t.~~~} tI - A \succeq 0 \end{array}$

and its dual

$\displaystyle \begin{array}{rl} \max A \bullet X \\ \text{s.t.~~~} X \bullet I = 1 \\ X \succeq 0 \end{array}$

The dual can be reinterpreted as follows: recall that ${X \succeq 0}$ means we can find reals ${p_i \geq 0}$ and unit vectors ${x_i \in {\mathbb R}^n}$ such that ${X = \sum_i p_i (x_ix_i^\top)}$. By the fact that ${x_i}$‘s are unit vectors, ${Tr(x_ix_i^\top) = 1}$, and the trace of this matrix is then ${\sum_i p_i}$. But by our constraints, ${X \bullet I = Tr(X) = 1}$, so ${\sum_i p_i = 1}$.

Rewriting in this language, ${\lambda_{\max}}$ is the maximum of

$\displaystyle \sum_i p_i \; (A \bullet x_ix_i^\top)$

such that the ${x_i}$‘s are unit vectors, and ${\sum_i p_i = 1}$. But for any such solution, just choose the vector ${x_{i^*}}$ among these that maximizes ${A \bullet (x_{i^*}x_{i^*}^\top)}$; that is at least as good as the average, right? Hence,

$\displaystyle \lambda_{\max} = \max_{x \in {\mathbb R}^n : \|x\|_2 = 1} A \bullet (xx^\top) = \max_{x \in {\mathbb R}^n} \frac{x^\top A x}{x^\top x}$

which is the standard variational definition of the maximum eigenvalue of ${A}$.

October 18, 2011

## Lecture #11 discussion

In the lecture, I forgot to mention why we chose to talk about the classes of perfect graphs we did. We showed that bipartite graphs and line graphs of bipartite graphs (and their complements) are all perfect.Let’s call a graph basic if it belongs to one of these four classes: bip, co-bip, line(bip), or co-line(bip).

Recall a graph was called a Berge graph if it did not contain (as an induced subgraph) any odd hole or odd antihole. Clearly, all perfect graphs are Berge graphs, else they could contain such a offending induced subgraph, which would contradict their perfectness. The other direction is the tricky one: showing that all Berge graphs are perfect. Which is what Maria Chudnovsky, Neil Robertson, Paul Seymour, and Robin Thomas eventually did in 2002.

So how did they proceed? Here’s a mile-high view of their approach. They showed that every Berge graph $G$ either is basic (i.e., belongs to those four classes bip, co-bip, line(bip), or co-line(bip)), or else it has some structural “faults” ($G$ has a 2-join, or its complement has a 2-join, or $G$ has an M-join, or a skew partition). For us, it does not matter what exactly these structural faults are: what matters is (and this is what CRST showed) that minimal imperfect graphs cannot have any of these faults. So, if Berge graphs could be imperfect, a minimal counterexample would be basic, which is a contradiction. More on this can be found, e.g., off the links on Vasek Chvatal’s Perfect Graph Theorem Page.

Re Srivatsan’s question: given the strong perfect graph theorem, the perfect graph theorem follows as a corollary. Indeed, suppose $G$ was perfect but $\overline{G}$ was not. Then $\overline{G}$ must not be a Berge graph, and contains either an odd hole or odd antihole as an induced subgraph. But then $G$ contains an odd antihole or odd hole as an induced subgraph, so could not have been perfect.

October 5, 2011

## Lecture #7 discussion (part II)

1. LPs for MST

I forgot to return to the LP for maximum spanning tree we’d created in class, and show why it wasn’t an integral LP. To remind you, here was the LP:

$\displaystyle \begin{array}{rl} \max \sum_e & w_ex_e \\ \sum_e x_e &= |V| - 1 \\ x(\partial S) &\geq 1 \qquad \qquad \forall S \neq \emptyset, V \\ x &\geq 0 \end{array}$

Here is an example of a graph: the thin edges have weight ${0}$, and the thick ones have weight ${1}$.

The maximum weight spanning tree has weight ${2}$. However, the LP solution given here (with ${x_e = 1/2}$ on the thin edges, and ${x_e = 5/6}$ on the thick ones) has ${w^\top x = 3\cdot 5/6 = 2.5}$. It also has ${\sum_e x_e = 3\cdot 1/2 + 3\cdot 5/6 = |V| - 1}$, and you can check it satisfies the cut condition. (In fact, the main gadget that allows us to show this LP has an “integrality gap” is to assign ${1/2}$ to the edges of the thin triangle — much like for the naive non-bipartite matching LP.

One thing to note: take an undirected graph and replace each undirected edge by two directed edges of the same weight, pointing in opposite directions. Now the max-weight arborescence in this digraph has the same weight as the maximum spanning tree in the original graph. So an LP that looks pretty similar to the one above (namely ${\max \{ w^\top x \mid x(\partial v) = 1, x(\partial S) \geq 1, x \geq 0 \}}$) on that specially constructed directed graph gives an arborescence that corresponds to an integral maximum spanning tree on the original undirected graph.

1.1. Counterexamples for Greedy

Also, here are two counterexamples: the left one for Ankit’s suggestion of running a Prim-like greedy algorithm (starting from the root) for min-cost arborescence. The right one we already saw in lecture: for a naive Kruskal-like greedy algorithm.

October 4, 2011

## Lecture #7 discussion

1. Perfect Matchings

The proof of integrality of the non-bipartite perfect matching polytope was a bit fast. Let me give some more details.

Recall that the LP is defined by the variables ${x_e}$ for ${e \in E}$, and the constraints:

$\displaystyle \begin{array}{rl} x(\partial v) = 1 \qquad \qquad & \forall v \in V \\ x(\partial S) \geq 1 \qquad \qquad & \forall S \subset V, |S| \text{ odd} \\ x \geq 0 \qquad \qquad & \end{array}$

Note that in the second set of constraints, we can just consider ${S}$ of size at least ${3}$ and at most ${|V|-3}$: if ${|S| = 1}$ or ${|V|-1}$, then the first set of constraints already implies ${x(\partial S) = 1}$. So just focus on

$\displaystyle \begin{array}{rl} x(\partial v) = 1 \qquad \qquad & \forall v \in V \\ x(\partial S) \geq 1 \qquad \qquad & \forall S \subset V, |S| \text{ odd }, 3 \leq |S| \leq |V|-3\\ x \geq 0 \qquad \qquad & \end{array}$

Let us call this perfect matching polytope ${PM}$. We’ll call the first set of equalities the vertex constraints, and the second set the odd-set inequalities.

Clearly, every perfect matching is a feasible solution to this LP. What the next theorem says is perfect matchings are precisely the vertices of this LP. (We already saw this for bipartite graphs, in Lecture~3.)

Theorem 1 (Perfect Matching Theorem)

Every vertex of this LP is integral.

Suppose not, and suppose there exists graphs for which there is a fractional vertex. Consider a minimal counterexample ${G = (V,E)}$ (minimizing the sum of ${|V| + |E|}$, say), and some vertex solution ${x}$ that is not integral. Clearly, ${|V|}$ must be even, else it will not satisfy the odd set constraint for ${S = V}$. First, the claim is that ${G}$ cannot have a vertex of degree~${1}$, or be disconnected (else we’ll get a smaller counterexample) or be just an even cycle (where we know this LP is indeed integral). Being connected implies that ${|E| \geq |V|-1}$, and neither being a cycle nor having a degree-${1}$ vertex implies that ${|E| \neq |V|}$. So ${|E| > |V|}$.

Recall there are ${|E|}$ variables. So any vertex/BFS is defined by ${|E|}$ tight constraints. If any of these tight constraints are the non-negativity constraints ${x_e \geq 0}$, then we could drop that edge ${e}$ and get a smaller counterexample. And since at most ${|V|}$ tight constraints come from the vertex constraints. So at least one odd-set constraint should be tight. Say this tight odd-set constraint is for the odd set ${W \subseteq V}$ with ${|W| \geq 3}$: i.e.,

$\displaystyle x(\partial W) = 1$

Now consider the two graphs ${G/W}$ and ${G/\overline{W}}$ obtained by contracting ${W}$ and ${\overline{W}}$ to a single new vertex respectively, and removing the edges lying within the contracted set. Since both ${W}$ and ${\overline{W}}$ have at least ${3}$ vertices, both are smaller graphs.

Now ${x}$ naturally extends to feasible solutions ${y}$ and ${z}$ for these new graphs. E.g., to get ${y}$, set ${y_e = x_e}$ for all edges ${e \in E \setminus \binom{W}{2}}$. Note that if the set ${W}$ got contracted to new vertex ${\widehat{w}}$ in ${G/W}$, then the fact that ${x(\partial W) = 1}$ implies that ${y(\partial \widehat{w}) = 1}$, and hence ${y}$ is a feasible solution to the perfect matching polytope for graph ${G/W}$. Similarly, ${z}$ is a feasible solution to the perfect matching polytope for graph ${G/\overline{W}}$.

By minimality of ${G}$, it follows that the perfect matching LP is integral for both ${G/W}$ and ${G/\overline{W}}$: i.e., the vertices of the perfect matching polytope for these smaller graphs all correspond to perfect matchings. And that means that

$\displaystyle y = \sum_i \lambda_i \cdot \chi_{M_i},$

where ${\chi_{M_i}}$ is the natural vector representation of the perfect matching ${M_i}$ in ${G/W}$, for values ${\lambda_i \geq 0, \sum_i \lambda_i = 1}$. Also, ${\lambda_i}$‘s can be taken to be rational, since ${y}$ is rational, as are ${\chi_{M_i}}$. Similarly, we have a rational convex combination

$\displaystyle z = \sum_i \mu_i \cdot \chi_{N_i},$

where ${N_i}$ are perfect matchings in ${G/\overline{W}}$. Since ${\lambda_i, \mu_i}$ are rationals, we could have repeated the matchings and instead written

$\displaystyle \begin{array}{rl} y &= \frac1k \sum_i \chi_{M_i} \\ z &= \frac1k \sum_i \chi_{N_i} \end{array}$

Finally, we claim that we can combine these to get

$\displaystyle x = \frac1k \sum_i \chi_{O_i}$

where ${O_i}$‘s are perfect matchings in ${G}$. How? Well, focus on edge ${e = \{l,r\} \in \partial W}$, with ${l \in W}$. Note that ${y_e = z_e = x_e}$. If we look at ${k}$ matchings ${M_i}$ in the sum for ${y}$: exactly ${x_e}$ fraction of these matchings ${M_i}$ — that is, ${kx_e}$ matchings — contain ${e}$. Similarly, exactly ${kx_e}$ of the matchings ${N_i}$ in the sum for ${x}$ contain ${e}$. Now we can pair such matchings (which share a common edge in ${\partial W}$) up in the obvious way: apart from the edge ${e}$, such an ${M_i}$ contains edges only within ${\overline{W}}$ and matches up all the vertices in ${\overline{W}}$ except vertex ${r}$, and ${N_i}$ contains edges only within ${W}$ and matches up all the vertices in ${W \setminus \{l\}}$. And ${e}$ matches up ${\{l,r\}}$. Hence putting together these perfect matchings ${M_i}$ and ${N_i}$ in ${G/W}$ and ${G/\overline{W}}$ gets us a perfect matching ${O_i}$ for ${G}$.

So ${x}$ can be written as a convex combination of perfect matchings of ${G}$. Hence, for ${x}$ to be an extreme point (vertex) itself, it must be itself a perfect matching, and integral. This gives us the contradiction.

1.1. Max-Weight Matchings

We didn’t get to this, but suppose you want to write an LP whose vertices are precisely (integral) matchings in ${G}$, not just the perfect matchings. Here is the polytope Edmonds defined.

$\displaystyle \begin{array}{rl} x(\partial v) \leq 1 \qquad \qquad & \forall v \in V \\ \textstyle \sum_{e \in \binom{S}{2}} x_e \leq \frac{|S| - 1}{2} \qquad \qquad & \forall S \subset V, |S| \text{ odd } \\ x \geq 0 \qquad \qquad & \end{array}$

Clearly, all matchings in ${G}$ are feasible for this LP. Moreover, one can use the Perfect Matching Theorem above to show that every vertex of this polytope is also integral.

October 2, 2011

## Lecture #6 discussion

A couple observations about Thursday’s lecture:

Most of you have probably seen Hall’s theorem before, which says that in a bipartite graph ${G = (U,V,E)}$, there is a matching ${M}$ that matches all the vertices on the left (i.e. has cardinality ${|U|}$) if and only if every set ${S \subseteq U}$ on the left has at least ${|S|}$ neighbors on the right. I wanted to point out that König’s theorem (which shows that the size of the maximum matching in ${G}$ is precisely ${vc(G)}$) is equivalent to Hall’s theorem. The proof is a fun exercise, so I’ll leave it out.

2. Polytope Integrality.

Something I mentioned in passing, but is worth saying slowly and more explicitly is this: several of the polyhedra defined by the LPs we considered in lecture were integral. (I.e., all the extreme points of these polyhedra belong to ${\mathbb{Z}^n}$, not just to ${\mathbb{R}^n}$.) Let’s go over these:

• When arguing about the min-(s,t)-cut LP, we said: consider any feasible solution ${x}$ to the min-cut LP, and any cost vector ${c}$. Then there exists an integer s-t cut ${(S_\alpha, \overline{S_\alpha})}$ with cost at most ${c^\top x}$. (The proof was via the randomized rounding.) Note that this s-t cut corresponds to an integer vector ${y \in R^{|E|}}$ where ${y_e = 1 \iff e \in S_\alpha \times \overline{S}_\alpha}$, which is also feasible for the cut LP.

This proof also gives us that the corresponding polyhedron ${K}$ is integral. To see this, consider any vertex ${x}$ of ${K}$. By the definition of vertex, there is some cost function such that ${x}$ is the unique minimizer for ${\min \{ c^\top x \mid x \in K\}}$. But since ${c^\top y \leq c^\top x}$, and ${y \in K}$, it follows that ${x = y}$ and hence ${x}$ is integral.

• For the bipartite matching LP, again consider any vertex ${x}$ of the polytope. We showed that if ${x}$ was fractional, it could be written as the average of two other solutions, and hence was not an extreme point. But since vertices and extreme points are the same, this contradicts the fact that ${x}$ was a vertex. Hence, this also shows that if we assign weights to the edges, an optimal BFS to this LP gives us the maximum weight matchings.
• An identical argument held for the bipartite vertex cover polyhedron, and hence for any assignment of weights to the vertices, an optimal BFS of this LP gives us a min-weight vertex cover.

We’ll revisit this issue of integral polytopes again in a few lectures.

September 28, 2011

## Lecture #5 clarification

Kevin asked in class: in the “geometric” proof of duality we talked about, did we need the fact that existed a BFS, i.e., that there were ${n}$ tight constraints? (Recall we were considering the LP ${\max\{ c^\top x \mid Ax \leq b\}}$; since we were not talking about equational form LPs, the optimum could be achieved at a non-BFS.) He was right: we don’t. Here’s a proof.

Pick an optimal feasible point for the LP, call this point ${x^*}$. Let ${a_i^\top x \leq b_i}$ for ${i \in I}$ be all the constraints tight at ${x^*}$—there may be ${n}$ linearly independent constraints in there, or there may be fewer, we don’t care. Now the claim is that the objective function vector ${c}$ is contained in the cone ${K = \{ x \mid x = \sum_{i \in I} \lambda_i a_i, \lambda_i \geq 0 \}}$ generated by these vectors ${\{a_i\}_{i \in I}}$. If we prove this claim, we’re done–the rest of the argument follows that in lecture.

(Indeed, this was precisely where we were being hand-wavy earlier, even when we assumed ${x^*}$ was a BFS. So this also answers Jamie’s request.)

Ok, back to business at hand. Suppose ${c}$ does not lie in the cone ${K}$ generated by the vectors ${\{a_i\}_{i \in I}}$. Then there must be a separating hyperplane between ${c}$ and ${K}$: i.e., there exists a vector ${d \in {\mathbb R}^n}$ such that ${a_i^\top d \leq 0}$ for all ${i \in I}$, but ${c^\top d > 0}$. So now consider the point ${z = x^* + \epsilon d}$ for some tiny ${\epsilon > 0}$. Note the following:

• We claim that for small enough ${\epsilon > 0}$, the point ${z}$ satisfies the constraints ${Ax \leq b}$. Consider ${a_j^\top x \leq b}$ for ${j \not \in I}$: since this constraint was not tight, we won’t violate it if we choose ${\epsilon}$ small enough. And for ${a_j^\top x \leq b}$ with ${j \in I}$? See, ${a_j^\top z = a_j^\top x + \epsilon\, a_j^\top d = b + \epsilon\, a_j^\top d \leq b}$, since ${\epsilon > 0}$ and ${a_j^\top d \leq 0}$.
• What about the objective function value? ${c^\top z = c^\top x^* + \epsilon\, c^\top d > c^\top x^*}$.

But this contradicts the fact that ${x^*}$ was optimal.

September 27, 2011

## Lecture #1 clarification

(Sorry this is so late… — Anupam)

For Lecture #1, Ankit had asked a question about how Theorem 1.3 helped solve an equational form LP in time $\binom{n}{m}$. We didn’t quite answer is completely in class, here’s the complete explanation.

His question was this: Theorem 1.3 says that if the LP is feasible and bounded, then the optimum is achieved at a BFS (and we could try all of them). But what if the LP was unbounded or infeasible? How could we detect those cases in the same amount of time? Here’s one way.

To start, Fact 2.1 says  any equational form LP that is feasible has a BFS. So if all the basic solutions are infeasible (i.e., there is no BFS), we can safely answer “Infeasible”.

So now it suffices to differentiate between the bounded/unbounded subcases. So consider the BFS that achieves the lowest objective function value among all the BFSs (assuming the LP is a minimization LP). We know the optimal value is either this value (call it $\delta$), or it is $-\infty$. Consider the LP obtained by adding in the new constraint $c^\top x = \delta - 1$. This is another equational form LP with $m+1$ constraints, and we can use the previous argument to decide feasibility. If this new LP is feasible, the original LP had optimum value $-\infty$, else the optimum of the original LP was $\delta$.

Anyone see a different/even simpler solution?