Crypto News

The Key to Defeating Win-Stay, Lose-Shift Opponent Variants

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

4.3 Win-Stay, Lose-Shift Opponent

Recall that the Win-Stay Lose-Shift opponent plays the same action immediately following a win, and switches to the next action in its action ordering immediately following a loss. The Tie-Shift variant of this opponent treats a tie like a loss and shifts, and the Tie-Stay variant treats a tie like a win and stays.

4.3.1 Variant: Tie-Shift

Proof. In the first phase, we record the correct action ordering: the opponent starts by playing the first action in their action ordering and always shifts to the next action in the ordering, so by observing n − 1 shifts we observe all n actions in the correct order. Because we play each action in succession against the opponent’s current action, they are guaranteed to shift after at most n − 1 rounds, since their action must tie with itself and lose to at least one other action.

At the start of phase 4, we correctly predict the opponent’s next action: we know we won the last round, so we know that the opponent will shift to the next action in its action order (which we have recorded correctly, as shown above). We showed above that we correctly recorded a best response to each action, so we win by playing the recorded best response to the predicted action. At the start of the next round, the same conditions hold (and will hold after each round), so we will win all subsequent rounds.

4.3.2 Variant: Tie-Stay

The main difference in beating the Tie-Stay variant compared to the Tie-Shift variant is that we can find a best response to each action by simply playing each action in succession until the opponent switches, since they only switch following a loss. For the algorithm, theorem and proof see A.1 in the Appendix.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button