Implement Self-reflection bot

The Hanabi paper's 3rd strategy does something strange: it tries to recognize the other agent's strategy by using its internals and rolling back the state. This is a pain to implement, so I'm leaving it as a TODO for now.