Connect-4 Through MCTS
Connect-4 is far less complicated and intricate than chess.
It has a maximum branching factor of 7, roughly five times smaller than chess's average. While revising <b>Monte Carlo Tree Search</b>, I thought I would apply it to a Connect-4 environment. Before that, I had been trying a <b>deep Q-learning</b> approach to solving the game, or at least learning a heuristic that comes close to a brute-force search (such a search exists for any game with a finite state space, but is computationally infeasible).
<h2>Deep Q-Learning Approach</h2>
This <a href="https://github.com/anuraj-18/connect-4">approach</a> was a bit problematic when I tried to implement it. From not being sure how to initialize unknown game states, to deciding what quasi-labels to assign to searched game states, it was all a bit of a mess.
Still, I soldiered on and made an agent that was at least learning, but it took far too much time. Convergence to some sort of "smart" move was essentially just brute forcing, not learning an "intelligent" heuristic.
I will revisit this once I know more about sparse reward distributions, but I wanted to share the issues I faced while trying this approach.
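To make the quasi-label problem concrete, here is a minimal sketch of how a Q-learning target is usually built. The `td_target` helper is hypothetical (it is not from the linked repo), but it illustrates why the signal is so sparse: only terminal states carry a real label (+1 win, -1 loss, 0 draw); every other state is labeled with the network's own bootstrapped guess about the next state.

```python
GAMMA = 0.95  # assumed discount factor, not taken from the repo

def td_target(reward, done, next_q_values, gamma=GAMMA):
    """The 'quasi-label' assigned to a searched state.

    Terminal states get the true game outcome as their label; every
    non-terminal state only gets a bootstrapped estimate built from the
    network's current prediction for the successor state.
    """
    if done:
        return reward  # +1 / -1 / 0 is the only real training signal
    return reward + gamma * max(next_q_values)
```

Since almost every Connect-4 position is non-terminal, nearly all labels are self-referential estimates, which is a big part of why convergence was so slow.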
<h2>MCTS Approach</h2>
This <a href="https://github.com/anuraj-18/connect4-v2">method</a> was deceptively good. I initially rejected it because I felt the rollout step was too random to produce any sort of smart move in the final analysis. But sometimes theory displays a good amount of practicality in real-world situations. I had felt that many theoretical models pile on assumptions to stay immaculate, then suffer the consequences when faced with the vagaries of the real world, so that we end up tweaking the model on an ex-post-facto basis. MCTS turned out to be an exception.
This method offers a decent mix of good play and quick training time (measured in iterations in the MCTS setting). I would definitely recommend this sort of method before delving into a deep learning technique. Random rollouts do a fine job in practical situations, and increasing the iteration count helps rectify the false positives that can arise from the rollout process.
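For readers who have not seen it, the whole loop is short enough to sketch. This is a minimal, self-contained version of MCTS with random rollouts for Connect-4, not the code from the linked repo: it assumes a 6x7 grid with 0 for empty and 1/-1 for the two players, and all names (`Node`, `rollout`, `mcts`, the exploration constant) are my own choices for illustration.

```python
import math
import random

ROWS, COLS = 6, 7

def legal_moves(board):
    return [c for c in range(COLS) if board[0][c] == 0]

def play(board, col, player):
    """Drop a piece into `col`; returns a new board."""
    b = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if b[r][col] == 0:
            b[r][col] = player
            break
    return b

def winner(board):
    """Return 1 or -1 if that player has four in a row, else 0."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                       and board[r + i * dr][c + i * dc] == p for i in range(4)):
                    return p
    return 0

def rollout(board, player):
    """Play uniformly random moves to a terminal state; return the winner (0 = draw)."""
    while True:
        w = winner(board)
        if w or not legal_moves(board):
            return w
        board = play(board, random.choice(legal_moves(board)), player)
        player = -player

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board, self.player = board, player   # `player` is to move here
        self.parent, self.move = parent, move
        self.children, self.untried = [], legal_moves(board)
        self.wins, self.visits = 0.0, 0

    def uct_child(self, c=1.4):
        # UCB1: exploit the best win rate, explore rarely-visited children
        return max(self.children, key=lambda n: n.wins / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))

def mcts(board, player, iters=2000):
    root = Node(board, player)
    for _ in range(iters):
        node = root
        # 1. Selection: walk down fully-expanded nodes via UCT
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one previously untried child
        if node.untried and not winner(node.board):
            col = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(play(node.board, col, node.player), -node.player, node, col)
            node.children.append(child)
            node = child
        # 3. Simulation: the random rollout from the new position
        result = rollout(node.board, node.player)
        # 4. Backpropagation: credit each node from the perspective of
        #    the player who moved INTO it (i.e. -node.player)
        while node:
            node.visits += 1
            if result == -node.player:
                node.wins += 1
            elif result == 0:
                node.wins += 0.5
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move
```

The final move is chosen by visit count rather than raw win rate, which is the common robust choice since visits only concentrate on a child after it has proven itself.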
I have also seen recommendations on how to fix the randomized play used to get a "value" from a terminal state by adding a mini-heuristic that makes the rollout at least somewhat reasonable. The mini-heuristic could be a one-move look-ahead that takes an immediate win or blocks an immediate loss.
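A sketch of that mini-heuristic might look like this. The board helpers are repeated so the snippet stands alone (same assumed conventions: a 6x7 grid, 0 for empty, 1/-1 for the players), and the names `lookahead_move` and `guided_rollout` are mine, not from any linked repo.

```python
import random

ROWS, COLS = 6, 7

def legal_moves(board):
    return [c for c in range(COLS) if board[0][c] == 0]

def play(board, col, player):
    """Drop a piece into `col`; returns a new board."""
    b = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if b[r][col] == 0:
            b[r][col] = player
            break
    return b

def winner(board):
    """Return 1 or -1 if that player has four in a row, else 0."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                       and board[r + i * dr][c + i * dc] == p for i in range(4)):
                    return p
    return 0

def lookahead_move(board, player):
    """One-move look-ahead: take an immediate win, else block an
    immediate loss, else fall back to a uniformly random move."""
    moves = legal_moves(board)
    for col in moves:                                  # win if we can
        if winner(play(board, col, player)) == player:
            return col
    for col in moves:                                  # otherwise block
        if winner(play(board, col, -player)) == -player:
            return col
    return random.choice(moves)

def guided_rollout(board, player):
    """Rollout using the mini-heuristic instead of pure random play."""
    while True:
        w = winner(board)
        if w or not legal_moves(board):
            return w
        board = play(board, lookahead_move(board, player), player)
        player = -player
```

Swapping `guided_rollout` in for a pure random rollout keeps the MCTS machinery unchanged while removing the most obviously absurd rollout lines, at the cost of a few extra win checks per move.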
Just wanted to give my quick analysis of both these methods for training an agent. For now, it looks like MCTS does a great job on its own of playing in a decent manner.
- Shubham Anuraj, 04:46 PM, 02 Apr, 2021