In this article, I will explain how I built my Pommerman agent.
What is Pommerman?
Pommerman is an AI environment based on the classic Bomberman game. Its name comes from the fact that each agent observes only part of the state, so each agent's decision problem is modeled as a Partially Observable Markov Decision Process (POMDP).
Each agent chooses among 6 actions at each time step: up, down, left, right, bomb, or nothing. The action space is small compared to games like Go. However, Pommerman is not turn-based: the actions of all agents are executed simultaneously.
There are multiple variations of the game, including a Free-For-All (FFA) and a Team competition. In the latter, in which I participated, we control two agents separately, with zero communication, against two other agents.
My CoacTest agent
How can we tackle this contest, and which algorithm should we apply? Getting a decent agent with reinforcement learning would have been too difficult, and I could not allocate much time to this contest. So I went in the classic tree-search direction.
Using MCTS is hard because the decision time at each timestep is only 100 ms. So I went for a flat Monte Carlo search with a fixed depth: the algorithm simulates our agent's actions and the resulting states randomly until the maximum depth is reached, and repeats until the time quota is exceeded. For the actions of the other agents, we can run a simple rule-based AI. If we assume the other agents are strong, their actions could instead be computed knowing our own agent's action, as in a minimax approach. However, this slows down the simulation phase, and given the minimal time budget it was better to focus only on our own agent.
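The flat Monte Carlo loop described above can be sketched as follows. Note that `copy()`, `step()`, and `evaluate()` are hypothetical stand-ins for the forward model and board evaluation the agent actually used; the real interface differs.

```python
import random
import time

ACTIONS = list(range(6))  # stop, up, down, left, right, bomb

def flat_monte_carlo(state, max_depth, budget_s=0.09):
    """Pick the first action whose random fixed-depth rollouts score best,
    stopping when the time budget (kept under the 100 ms limit) runs out.
    """
    scores = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    deadline = time.time() + budget_s
    while time.time() < deadline:
        first = random.choice(ACTIONS)
        sim = state.copy()          # hypothetical: clone the forward model
        sim.step(first)             # apply our first action
        for _ in range(max_depth - 1):
            sim.step(random.choice(ACTIONS))  # random continuation
        scores[first] += sim.evaluate()       # hypothetical board score
        counts[first] += 1
    # Return the action with the best average rollout value.
    return max(ACTIONS,
               key=lambda a: scores[a] / counts[a] if counts[a] else float("-inf"))
```

Averaging rollout values per first action, rather than building a tree, keeps the per-step overhead tiny, which matters under such a tight time budget.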
I also used a more optimized version of the environment than the one in the official repository: a Cython implementation by @Tambetm. I did not have time to check the C++ one, but its performance is probably similar. Speed is very important, as it determines how many simulations you can run and the search depth you can reach.
I modified the environment a bit to give more information to the agent, such as remembering the enemy agents' positions, bomb locations, and the directions of moving bombs. I needed to adapt the environment because it was only compatible with the FFA version and a fully observable state. Moreover, I reduced the number of timesteps before a bomb explodes, because the default value is too high relative to the search depth and would prevent the agent from placing bombs aggressively.
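Inferring the direction of a moving bomb can be done by matching bomb positions across consecutive observations. This is a minimal sketch, assuming bombs are identified by an id and positions are (row, col) tuples; the real observation format differs.

```python
def infer_bomb_directions(prev_bombs, curr_bombs):
    """Derive a (dr, dc) movement vector for each bomb by comparing its
    current position to the previous frame; (0, 0) means static or new.

    prev_bombs / curr_bombs: hypothetical dicts mapping bomb id -> (row, col).
    """
    directions = {}
    for bomb_id, (r, c) in curr_bombs.items():
        if bomb_id in prev_bombs:
            pr, pc = prev_bombs[bomb_id]
            directions[bomb_id] = (r - pr, c - pc)
        else:
            directions[bomb_id] = (0, 0)  # newly placed bomb, no history yet
    return directions
```

Knowing a bomb's direction lets the simulator advance kicked bombs correctly during rollouts instead of treating them as static hazards.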
The board evaluation was tuned with Bayesian optimization over multiple hyperparameters, playing against the provided rule-based simple agent. I parallelized this process and deployed the job to Google Cloud Platform using my remaining free credits. It was very successful: I managed to get a ~80% win rate against the rule-based agent, and the other ~20% were ties rather than losses.
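The tuning loop looks roughly like the following. This sketch substitutes plain random search for the Bayesian optimization actually used, and the parameter names, ranges, and the `win_rate` callback (which would play games against the simple agent) are all hypothetical.

```python
import random

def tune_weights(win_rate, n_trials=50, seed=0):
    """Search over evaluation weights, keeping the set with the best
    win rate against the rule-based simple agent.
    """
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    for _ in range(n_trials):
        # Hypothetical evaluation-function coefficients and ranges.
        params = {
            "kick_bonus": rng.uniform(0.0, 10.0),
            "powerup_bonus": rng.uniform(0.0, 5.0),
            "enemy_distance": rng.uniform(-2.0, 2.0),
        }
        score = win_rate(params)  # fraction of games won with these weights
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Each trial is independent, so the evaluations parallelize naturally across cloud workers; a Bayesian optimizer simply replaces the random sampling with a model-guided proposal.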
The bot behaves passively: it does not chase enemy agents to kill them. However, once it has the kick ability, it kicks bombs toward the enemy. Kick is a strong powerup that should always be picked up; it is useful for attacking, but it also lets the agent escape blocked situations. The hyperparameter search ended up with a very high coefficient for this bonus.
I wrote the agent back in September, and there have been multiple changes in the official Pommerman repository since then, including a tie-breaking environment with collapsing walls. However, I did not have to update my agent or write another AI to perform in this environment.
The final bracket is here: my CoacTeam ranked 5th. You can check out an interesting post mortem here from the other team that ranked 5th ex aequo with a learning agent. The report of the best learning agent, "Continual Match Based Training in Pommerman: Technical Report", can be found here.
This was my first participation in a conference contest, it was very fun and I plan to do it again!
Thanks to @Cinjon for checking my agent code during the submission.