1. Adversaries in Multiplicative Weights
Here’s some discussion about the issue of what the adverasary sees before it presents the cost vector. To begin, let’s remove the step where the algorithm actually makes a “prediction”. Instead, the algorithm only maintains a current weight vector . And for a cost vector presented by the adversary, define the algorithm’s cost to be
where
There is no longer any randomization in the algorithm; it is completely deterministic.
Now, what is the adversary allowed to base the cost vector on? We allow the adversary to base on the entire history; it can depend on , and in particular, the current weights maintained by the algorithm. But we claim that knowing this information does not give the adversary any additional power, since each of these vectors are deterministic functions of the previous cost functions presented by the adversary.
To examine the argument a bit more, consider the following adversaries:
- (Adversary Type 1) The adversary writes down a sequence of cost vectors up-front, and we run the (deterministic) MW algorithm on it. For each such sequence of cost vectors, and for each expert , we have the guarantee that
where I’ve bundled all the additive terms into this “regret” term. - (Adversary Type 2) For each , the adversary is “adaptive” but still deterministic: the new cost vector it gives is a deterministic function of the entire history so far. As we said, this entire history is just a (deterministic) function of . So for any fixed adversary (which is completely determined by what this map from the history to the next cost vector is), the sequence of cost vectors it gives is some fixed sequence. So if we have a guarantee for each sequence (as above), we get that for any adversary and any expert , inequality~(2) holds for the MW algorithm.
So a Type 2 adversary is no more powerful than that of Type 1.
- (Adversary Type 3) For each , the adversary is “adaptive” and also randomized: the new cost vector it gives at time could be a random variable, and also depend on the entire history so far (which itself is a random variable). Note that the algorithm is still deterministic: it flips no coins. Given such an adversary, we could look at the probability space of all length- sequences of cost vectors with the probability of any such length- sequence being the probability it is generated by running MW against this adversary. But inequality~(2) guarantees that for each length- sequence of cost vectors, we are not much worse than the best expert . Hence we get
where the expectations is taken over the randomness of the adversary.The contrapositive of this statement says: if we have an adversary where the expected regret is high, there must be a fixed length- sequence of cost vectors where the regret is high. So, even an adversary of Type 3 is no more powerful than one of Type 1.
So you can indeed think of the adversary as choosing the cost vector depending on the entire history. Or as just writing down a -length sequence in advance. It’s the same.
1.1. Predictions
Finally, what about the fact that the MW algorithm (as specified in lecture) was also making random predictions? The fact that the future decisions of the algorithm did not depend on these random predictions, and that the adversary does not see the prediction before it creates the cost vector, allows us to push the same argument through.
The easiest way to argue this may be to imagine that it’s not the algorithm that makes the predictions. The algorithm instead gives the vector to the adversary, who generates a random sample from the distribution himself. And to an external observer, the distributions of the predictions remain exactly the same, and so does their expected cost. But internally, we’ve just reduced to the case of the Type 3 randomized adversary, which we just argued about.
2. John’s Example
John asked a good (and illustrative) question: what about the adversary looks at the current , chooses the index which has maximum (say the lowest numbered index, if this is not unique), and defines with and for . Hence, for this setting, we pay at least at every step!
So the -step cost would be , which may be much more than if . What’s happening here?
The saving grace is that even the best expert cannot have low cost: MW will ensure that all the experts will end up paying a non-trivial amount. Indeed, how does the vector evolve in this case? Starts off at the vector
Then it moves to
And then to
After steps, we are at
And so to
But you see the pattern: any fixed expert incurs a cost of every steps. Which means after steps, each expert has incurred cost in .
And what is our cost? Even if the weight vector were as lopsided as
the expected cost using the cost vector is approximately (for )
So our -step total cost is at most
Assuming that (else it’s trivial), this gives
So all is OK.
2.1. Another Example
But what if there were some hidden that the adversary never gives a in this position ? Instead he chooses the position with maximum . In this case the best expert (namely ) has total cost . So it better be the case that our cost is at most . It’s a simple exercise to see it is indeed the case.