December 4, 2011

Homework 6 bug fixes

Check out the comments in this post for fixes. In short, they involve question 1 (the A should be D) and question 5b (the expression should be 3/2 OPT + 1/2 c(s,t)).

December 3, 2011

Course Evaluations

The course evaluations are now open off this page.

I realize they take a little time, but they are super-useful: given the course is a brand-new one, we could really use your feedback to see what we did right and what we didn’t, and to change things around as needed.

November 26, 2011

Homework 6 posted

For those of you who weren’t in class on Tuesday: Homework #6 is on the webpage. The number of problems you must solve is based on the number of times you scribe for the course.

November 14, 2011

Lecture 16: Multiplicative Weights

A draft of the Multiplicative Weights lecture notes has been posted.

November 9, 2011

Homework 5 Due Date

is Tuesday Nov 15th. (Not Thursday Nov 15th, as listed earlier.)

November 9, 2011

Lecture #17: Multiplicative Weights, discussion

1. Adversaries in Multiplicative Weights

Here’s some discussion about the issue of what the adversary sees before it presents the cost vector. To begin, let’s remove the step where the algorithm actually makes a “prediction”. Instead, the algorithm only maintains a current weight vector {\overline{w}^{(t)}}. And for a cost vector {\overline{m}^{(t)}} presented by the adversary, define the algorithm’s cost to be

\displaystyle  \overline{p}^{(t)} \cdot \overline{m}^{(t)},

where

\displaystyle  {p}_i^{(t)} = \frac{{w}_i^{(t)}}{\sum_{j} {w}_j^{(t)}} = \frac{{w}_i^{(t)}}{\Phi^{(t)}}.

There is no longer any randomization in the algorithm; it is completely deterministic.
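For concreteness, here is a minimal Python sketch of this deterministic version. It assumes the multiplicative update {w_i^{(t+1)} = w_i^{(t)} e^{-\varepsilon m_i^{(t)}}}, which matches the weights {e^{-\varepsilon}, e^{-2\varepsilon}, \ldots} that show up in John’s example below; the helper name `mw_fractional_cost` is hypothetical. No random numbers are drawn anywhere: the output is a fixed function of the cost sequence.

```python
import math


def mw_fractional_cost(costs, eps):
    """Run deterministic MW on a fixed list of cost vectors.

    Hypothetical helper, for illustration. Assumes the update
    w_i <- w_i * exp(-eps * m_i). Returns the algorithm's total cost
    sum_t p^(t) . m^(t) and the list of per-expert total costs.
    """
    n = len(costs[0])
    w = [1.0] * n                      # w^(1) = (1, ..., 1)
    alg_cost = 0.0
    expert_cost = [0.0] * n
    for m in costs:
        phi = sum(w)                   # Phi^(t) = sum_j w_j^(t)
        p = [wi / phi for wi in w]     # p_i^(t) = w_i^(t) / Phi^(t)
        alg_cost += sum(pi * mi for pi, mi in zip(p, m))  # p^(t) . m^(t)
        for i in range(n):
            expert_cost[i] += m[i]
            w[i] *= math.exp(-eps * m[i])  # assumed multiplicative update
    return alg_cost, expert_cost


# A single step with cost vector (1, 0), starting from uniform weights.
print(mw_fractional_cost([[1.0, 0.0]], 0.1))  # (0.5, [1.0, 0.0])
```

Running the same sequence twice always gives the same answer, which is exactly the point: the weights are determined by the past cost vectors.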

Now, what is the adversary allowed to base the cost vector {\overline{m}^{(t)}} on? We allow the adversary to base {\overline{m}^{(t)}} on the entire history; it can depend on {\overline{w}^{(1)}, \overline{w}^{(2)}, \ldots, \overline{w}^{(t)}}, and in particular on the current weights maintained by the algorithm. But we claim that knowing this information does not give the adversary any additional power, since each of these vectors is a deterministic function of the previous cost vectors {\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(t-1)}} presented by the adversary.

To examine the argument a bit more, consider the following adversaries:

  1. (Adversary Type 1) The adversary writes down a sequence of {T} cost vectors {\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(T)}} up-front, and we run the (deterministic) MW algorithm on it. For each such sequence of cost vectors, and for each expert {i}, we have the guarantee that

    \displaystyle    \sum_t \overline{p}^{(t)} \cdot \overline{m}^{(t)} \leq \sum_t {m}_i^{(t)} + \text{regret}(T,\varepsilon,N).    \ \ \ \ \ (1)

    where I’ve bundled all the additive terms into this “regret” term.

  2. (Adversary Type 2) For each {t}, the adversary is “adaptive” but still deterministic: the new cost vector {\overline{m}^{(t)}} it gives is a deterministic function of the entire history so far. As we said, this entire history is just a (deterministic) function of {\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(t-1)}}. So any fixed adversary (which is completely determined by its map from the history to the next cost vector) produces some fixed sequence of cost vectors. Since we have a guarantee for each such sequence (as above), for any adversary and any expert {i}, inequality (1) holds for the MW algorithm.

    So a Type 2 adversary is no more powerful than one of Type 1.

  3. (Adversary Type 3) For each {t}, the adversary is “adaptive” and also randomized: the new cost vector {\overline{m}^{(t)}} it gives at time {t} could be a random variable, and could also depend on the entire history so far (which itself is a random variable). Note that the algorithm is still deterministic: it flips no coins. Given such an adversary, we can look at the probability space of all length-{T} sequences of cost vectors {\overline{m}^{(1)}, \overline{m}^{(2)}, \ldots, \overline{m}^{(T)}}, where the probability of each such sequence is the probability that it is generated by running MW against this adversary. But inequality (1) guarantees that for each length-{T} sequence of cost vectors, we are not much worse than any fixed expert {i}. Taking expectations over this distribution, we get

    \displaystyle    \sum_t \mathbb{E} [\overline{p}^{(t)} \cdot \overline{m}^{(t)}] \leq \sum_t \mathbb{E}[   {m}_i^{(t)}] + \text{regret}(T,\varepsilon,N).    \ \ \ \ \ (2)

    where the expectation is taken over the randomness of the adversary, and the inequality holds for every fixed expert {i}.

    The contrapositive of this statement says: if we have an adversary where the expected regret is high, there must be a fixed length-{T} sequence of cost vectors where the regret is high. So, even an adversary of Type 3 is no more powerful than one of Type 1.
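Here is a small experiment illustrating the Type 3 argument. It is a sketch under two assumptions: the update {w_i \leftarrow w_i e^{-\varepsilon m_i}}, and {\text{regret}(T,\varepsilon,N) = \varepsilon T + \frac{\ln N}{\varepsilon}} for costs in {[0,1]}, the form used in John’s example below. The adversary in `run_against_adaptive_adversary` (a hypothetical name) is made up for illustration; it is both adaptive and randomized. Each seed fixes one realized length-{T} sequence, inequality (1) applies to that sequence, so every run (and hence the average) stays under the bound.

```python
import math
import random


def run_against_adaptive_adversary(n, T, eps, seed):
    """Deterministic MW versus a hypothetical randomized, adaptive adversary.

    At each step the adversary flips a coin: with probability 1/2 it charges
    cost 1 to the currently heaviest expert (adaptive), otherwise it draws a
    fresh uniform cost vector in [0,1]^n (randomized). Returns the realized
    regret against the best expert in hindsight.
    """
    rng = random.Random(seed)          # all randomness belongs to the adversary
    w = [1.0] * n
    alg_cost, expert_cost = 0.0, [0.0] * n
    for _ in range(T):
        phi = sum(w)
        p = [wi / phi for wi in w]
        if rng.random() < 0.5:
            j = w.index(max(w))        # looks at the current weights
            m = [1.0 if i == j else 0.0 for i in range(n)]
        else:
            m = [rng.random() for _ in range(n)]
        alg_cost += sum(pi * mi for pi, mi in zip(p, m))
        for i in range(n):
            expert_cost[i] += m[i]
            w[i] *= math.exp(-eps * m[i])  # assumed multiplicative update
    return alg_cost - min(expert_cost)


n, T, eps = 5, 400, 0.1
bound = eps * T + math.log(n) / eps    # assumed form of regret(T, eps, N)
regrets = [run_against_adaptive_adversary(n, T, eps, s) for s in range(30)]
print(all(r <= bound for r in regrets))  # True: (1) holds on every realized sequence
```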

So you can indeed think of the adversary as choosing the cost vector {\overline{m}^{(t)}} depending on the entire history, or as just writing down a {T}-length sequence in advance. It’s the same.

1.1. Predictions

Finally, what about the fact that the MW algorithm (as specified in lecture) was also making random predictions? Two facts let us push the same argument through: the future decisions of the algorithm do not depend on these random predictions, and the adversary does not see the prediction before it creates the cost vector.

The easiest way to argue this may be to imagine that it’s not the algorithm that makes the predictions. Instead, the algorithm gives the vector {\overline{p}^{(t)}} to the adversary, who, after choosing {\overline{m}^{(t)}}, samples the prediction from this distribution himself. To an external observer, the distribution of the predictions remains exactly the same, and so does their expected cost. But internally, we’ve just reduced to the case of the Type 3 randomized adversary, which we just argued about.
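To see the “external observer” claim numerically, the sketch below fixes a distribution {\overline{p}} and a cost vector {\overline{m}} first and only then samples predictions; the average incurred cost approaches {\overline{p} \cdot \overline{m}}. The vectors, sample count, and the helper name `sampled_prediction_cost` are illustrative choices, not from the lecture.

```python
import random


def sampled_prediction_cost(p, m, samples, seed):
    """Average cost of predictions i ~ p, drawn only after m is fixed.

    Hypothetical helper: the sampler's coins play the role of the
    adversary's coins in the reduction above.
    """
    rng = random.Random(seed)
    picks = rng.choices(range(len(p)), weights=p, k=samples)
    return sum(m[i] for i in picks) / samples


p = [0.5, 0.3, 0.2]
m = [1.0, 0.0, 0.5]
exact = sum(pi * mi for pi, mi in zip(p, m))  # p . m = 0.6
estimate = sampled_prediction_cost(p, m, samples=200_000, seed=1)
print(exact, estimate)  # the estimate should be close to exact
```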

2. John’s Example

John asked a good (and illustrative) question: what if the adversary looks at the current {\overline{p}^{(t)}}, chooses the index {j^*} which has maximum {{p}_j^{(t)}} (say the lowest-numbered index, if this is not unique), and defines {\overline{m}^{(t)}} with {m_{j^*}^{(t)} = 1} and {m_j^{(t)} = 0} for {j \neq j^*}? Since the largest of {N} probabilities summing to {1} is at least {1/N}, in this setting we pay at least {1/N} at every step!

So the {T}-step cost would be at least {T/N}, which may be much more than {\varepsilon T + \frac{\ln N}{\varepsilon}} if {\varepsilon \ll 1/N} (and {T} is large). What’s happening here?
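A quick simulation of John’s adversary, as a sketch assuming the update {w_i \leftarrow w_i e^{-\varepsilon m_i}}. Since {\overline{p}^{(t)}} is proportional to {\overline{w}^{(t)}}, the code takes the argmax over the weights directly; `john_adversary_costs` is a hypothetical name.

```python
import math


def john_adversary_costs(n, T, eps):
    """Per-step MW costs when the adversary charges the heaviest expert.

    Hypothetical helper. Each step, cost 1 goes to the expert with the largest
    p_j (lowest index on ties) and 0 to all others, so MW pays max_j p_j.
    """
    w = [1.0] * n
    step_costs = []
    for _ in range(T):
        j_star = w.index(max(w))                # argmax of p = argmax of w; first on ties
        step_costs.append(w[j_star] / sum(w))   # p^(t) . m^(t) = p_{j*}^(t)
        w[j_star] *= math.exp(-eps)             # assumed update; other m_j are 0
    return step_costs


n, T, eps = 10, 20_000, 0.01
costs = john_adversary_costs(n, T, eps)
print(min(costs) >= 1 / n - 1e-12)              # True: at least 1/N every step
print(sum(costs), eps * T + math.log(n) / eps)  # at least T/N = 2000 vs. roughly 430
```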

The saving grace is that even the best expert cannot have low cost: MW ensures that all the experts end up paying a non-trivial amount. Indeed, how does the vector {\overline{w}^{(t)}} evolve in this case? It starts off at the vector

\displaystyle  (1, 1,1, \cdots 1).

Then it moves to

\displaystyle  (e^{-\varepsilon}, 1,1, \cdots 1).

And then to

\displaystyle  (e^{-\varepsilon}, e^{-\varepsilon},1, \cdots 1).

After {N} steps, we are at

\displaystyle  (e^{-\varepsilon}, e^{-\varepsilon},e^{-\varepsilon}, \cdots   e^{-\varepsilon}).

And so to

\displaystyle  (e^{-2\varepsilon}, e^{-\varepsilon},e^{-\varepsilon}, \cdots e^{-\varepsilon}).

But you see the pattern: any fixed expert {i} incurs a cost of {1} every {N} steps, which means that after {T} steps, each expert has incurred a cost in {\{ \lfloor T/N \rfloor, \lceil T/N \rceil \}}.
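The round-robin pattern is easy to confirm by counting how often each expert gets charged. This is a sketch assuming the same update {w_i \leftarrow w_i e^{-\varepsilon m_i}}; the helper name is hypothetical.

```python
import math


def john_expert_costs(n, T, eps):
    """Total cost charged to each expert after T steps of John's adversary.

    Hypothetical helper: each step charges 1 to the heaviest expert
    (lowest index on ties) and applies w_i <- w_i * exp(-eps) to it.
    """
    w = [1.0] * n
    charged = [0] * n
    for _ in range(T):
        j_star = w.index(max(w))    # heaviest expert, lowest index on ties
        charged[j_star] += 1
        w[j_star] *= math.exp(-eps)
    return charged


print(john_expert_costs(4, 6, 0.1))  # [2, 2, 1, 1]
```

With {N = 4} and {T = 6}, one full round charges every expert once, and the next two steps go to experts 1 and 2, matching the weight vectors displayed above.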

And what is our cost? Even if the weight vector {\overline{w}^{(t)}} were as lopsided as

\displaystyle  (e^{-\varepsilon}, e^{-\varepsilon}, e^{-\varepsilon}, \cdots, e^{-\varepsilon}, 1).

the expected cost using the cost vector {(0,0,\cdots, 0, 1)} is (using {0 \leq \varepsilon \leq 1} for the final inequality)

\displaystyle  \frac{1}{1 + (N-1)e^{-\varepsilon}} = \frac{e^\varepsilon}{e^{\varepsilon} + (N-1)} \leq \frac{ 1 + \varepsilon + \varepsilon^2}{N}

So our {T}-step total cost is at most

\displaystyle  \frac{ 1 + \varepsilon + \varepsilon^2}{N} \times T \leq \frac{T}{N} + \frac{2\varepsilon}{N} \cdot T.
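The two inequalities above rest on three elementary facts, written out here for {0 \leq \varepsilon \leq 1}:

```latex
\begin{aligned}
\frac{e^{\varepsilon}}{e^{\varepsilon} + (N-1)}
  &\leq \frac{e^{\varepsilon}}{N}
  && \text{since } e^{\varepsilon} \geq 1, \\
e^{\varepsilon} = \sum_{k \geq 0} \frac{\varepsilon^k}{k!}
  &\leq 1 + \varepsilon + \varepsilon^2 \sum_{k \geq 2} \frac{1}{k!}
   = 1 + \varepsilon + (e - 2)\,\varepsilon^2
   \leq 1 + \varepsilon + \varepsilon^2
  && \text{since } \varepsilon^k \leq \varepsilon^2 \text{ for } k \geq 2, \\
1 + \varepsilon + \varepsilon^2
  &\leq 1 + 2\varepsilon
  && \text{since } \varepsilon^2 \leq \varepsilon .
\end{aligned}
```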

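Assuming that {N \geq 2} (else it’s trivial), we have {\frac{2\varepsilon}{N} \cdot T \leq \varepsilon T}. Moreover, every expert’s cost is at least {\lfloor T/N \rfloor \geq T/N - 1}, so this gives

\displaystyle  \text{MW's cost} \leq \sum_t {m}_i^{(t)} + \varepsilon T + 1.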

So all is OK.
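A sketch comparing MW’s total cost with the best expert’s under John’s adversary, again assuming the update {w_i \leftarrow w_i e^{-\varepsilon m_i}} (the helper name is hypothetical). Since the best expert’s cost is only guaranteed to be at least {\lfloor T/N \rfloor \geq T/N - 1}, the check includes an additive {1}; for comparison it also checks the regret form {\varepsilon T + \frac{\ln N}{\varepsilon}}.

```python
import math


def john_summary(n, T, eps):
    """Hypothetical helper: (MW's total cost, best expert's total cost)."""
    w = [1.0] * n
    charged = [0] * n
    mw_cost = 0.0
    for _ in range(T):
        j_star = w.index(max(w))           # heaviest expert, lowest index on ties
        mw_cost += w[j_star] / sum(w)      # MW pays p_{j*}
        charged[j_star] += 1
        w[j_star] *= math.exp(-eps)        # assumed multiplicative update
    return mw_cost, min(charged)


mw, best = john_summary(10, 2000, 0.01)
print(mw, best, mw - best)  # best == 200; the gap stays below eps*T + 1 = 21
```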

2.1. Another Example

But what if there were some hidden index {j'} where the adversary never places a {1}? Instead, he chooses the position {j \neq j'} with maximum {{p}_j^{(t)}}. In this case the best expert (namely {j'}) has total cost {0}, so it had better be the case that our cost is at most {\varepsilon T + (\ln N)/\varepsilon}. It’s a simple exercise to check that this is indeed the case.
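A numerical sanity check for this variant (not a proof), assuming the same exponential update {w_i \leftarrow w_i e^{-\varepsilon m_i}} as in the weight vectors of this post. The hypothetical adversary in `hidden_expert_cost` never charges expert `hidden`, and we compare MW’s cost with {\varepsilon T + \frac{\ln N}{\varepsilon}}.

```python
import math


def hidden_expert_cost(n, T, eps, hidden=0):
    """MW's total cost when a hypothetical adversary never charges `hidden`.

    Each step, cost 1 goes to the expert j != hidden with the largest p_j
    (lowest index on ties), so expert `hidden` finishes with total cost 0.
    """
    w = [1.0] * n
    mw_cost = 0.0
    others = [j for j in range(n) if j != hidden]
    for _ in range(T):
        j = max(others, key=lambda k: w[k])  # max() returns the first maximal item
        mw_cost += w[j] / sum(w)             # MW pays p_j
        w[j] *= math.exp(-eps)               # assumed multiplicative update
    return mw_cost


n, T, eps = 5, 5000, 0.1
print(hidden_expert_cost(n, T, eps), eps * T + math.log(n) / eps)
```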

November 9, 2011

Lecture 14: The Canonical SDP for CSPs

Lecture notes on the Canonical SDP for CSPs have been posted.

November 6, 2011

Lecture 13: The Canonical LP for Constraint Satisfaction Problems

Here is a draft of the lecture notes on The Canonical LP for Constraint Satisfaction Problems.

November 2, 2011

Homework 5 posted

Homework 5 has been released. It’s due Nov 15th.

November 1, 2011

Homework 3 graded

Homework 3 has been graded. The average was 16. Some comments follow (the detailed solutions are on the webpage, and the solution to problem 1 will be posted tomorrow):

1. Most of you got this one right. The binary search solution was not strongly poly-time because the number of steps depends on the size of the numbers involved.

2. For this one, a few solutions tried proving it from first principles. All these statements can be proved in that way, but the easier way is to prove one version of the Farkas lemma (which you’ve already done in HW2) and then reduce these other versions to that one.

3. No surprises here, nothing to report.

4. Perhaps a majority of the solutions reduced it to standard matching and used Hall’s or Kőnig’s theorem to prove this statement. Perfectly kosher, even though I was hoping for the duality-based solution.

We will give back the graded HWs 2 and 3 at the end of lecture tomorrow.