Minimax Algorithm for Playing Tic Tac Toe Part I
I think we all have at some point in our life been bored, and thought of just playing a few games of Tic-Tac-Toe with a friend, generally in a classroom. Most of us recognize that this game can be easily drawn even if two players have seen and learnt the rules for the first time. But how would a machine learn to play this game? The same way any person does, which is by trying to play the game in your head till you get to a point where there is a result. Every player would have seen that Tic-Tac-Toe is one of those few games where even an average visual memory can be no problem, when trying to play out the whole game in your head till a result can be visualized.
The algorithm used by a machine to play this game is the famous Minimax algorithm, which is one of the earliest and one of the oldest artificial intelligence algorithms for playing simple two player zero-sum games.
<h2><b>What is a zero-sum game?</b></h2>
It is basically a situation where the end result will either be every player getting the same reward or a person getting the complete reward and the others ending up with nothing. A draw in a game represents both the players getting equal amount of reward (0.5-0.5). A person winning gets the full reward and the other person gets nothing (1-0). This is a very quick representation of what a zero-sum game looks like. This property of the game will be exploited by the algorithm to play this game.
This algorithm starts of with the initial state of the game, which in the case of Tic-Tac-Toe, looks like this.
<center><img src="/static/ttt.png" alt="tic-tac-toe"></img></center>
This is a two player game, where one tries to maximize its game state and the other one tries to minimize it for the other player. The person making the first move will be the maximizer and the other player is the minimizer. Due to the first player being the maximizer, all of the game states will be evaluated with respect to the maximizer. The algorithm will try to create all possibilities of the game state which can occur if only the first move takes place. This will look something like this.
<center><img src="/static/firststate.png" alt=""></img></center>
If there is any state in the possibilities generated that give a result, then that state is given a value which could be either a 0, meaning that the minimizer has won in that particular possibility, 1, meaning that the maximizer has won in that particular possibility, or 0.5, meaning the game has been drawn. All the possibilities which don't have a value assigned, meaning there is no loss, win or a draw for the players, those game states are expanded further to generate all the possibilities which can be created if the next move takes place using the game state available, to see whether a result can be seen.
When this algorithm runs, it will be creating a tree with a root node which will have the empty game state shown above. Then all the children nodes are generated which will have all the possibilities of the root node game state when the next move takes place. That is also shown above. But as you can see, there is no value which can be given to the game states as there is no result on it. Hence, we have to keep expanding each node which doesn't have a value assigned to it. If we keep going deeper into this tree which is being created, at some point all the game states which are possible would be present and all of the nodes will have some or the other value assigned to it. When this is done, only then the Minimax algorithm comes into play. Check out this diagram to see how a machine would decide which move to make in a given game state and having the game tree which has been fully constructed.
<center><img src="/static/move.png" alt="move"></img></center>
If the current node which holds the current game state, has the particular child nodes, it will choose the highest valued node or the lowest valued node, depending on whether it is a maximizer or a minimizer node. In the case above, if the current node is a minimizer, it will make a move which will get the game state to a node where the value is zero. As there is a child node which has a value of zero, it will make a move so as to get to that node. The same goes for the node being a maximizer, where the move will be made in order to reach the game state which has a value of 1. Whichever value is chosen will be the value given to the current node as well. This how a value is assigned to nodes which have game states where there is no result. As a general rule, if a node which holds game states which have no result, then the value given to it is decided by the highest or lowest value depending on whether it is a maximizer or minimizer, from its child nodes.
As the game of Tic-Tac-Toe is solved, meaning that there can hard coded rules for what to do at a certain point in a game, the Minimax algorithm, when used by the maximizer and the minimizer, always will lead to a draw. The tree which is constructed can be used by the machine to play against humans, and will always either win or draw, if the human plays perfectly as well. The <a href="/Minimax-Algorithm-for-Playing-Tic-Tac-Toe-Part-II">next part</a> will go into further detail on how the game tree constructed will be put to use for playing the game.
- Shubham Anuraj, 01:14, 26 February, 2019