A scientist from Peking University recently published a pre-print research paper detailing a video game-based system designed to teach AI agents to evade pursuit.
The name of the game is StarCraft II, or rather, a mini-game designed in the SCII training environment. And the point is to flip a common paradigm on its head in order to discover new methods of training AI.
Up front: Most research in the pursuit-evasion genre of AI and game theory involves teaching machines to explore spaces. Since most AI training involves a system that rewards the machine for accomplishing a goal, developers often use gamification as an impetus for training.
In other words: you can’t just shove a robot in a room and say “do stuff.” You have to give it goals and a reason to accomplish them. So researchers design AI to inherently seek rewards.
The traditional exploration training environment tasks an AI agent with manipulating digital models to explore a space until it completes its goals or finds its rewards.
It works sort of like Pac Man: the AI has to move around an environment until it gobbles up all the reward pellets.
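To make the Pac-Man analogy concrete, here's a minimal sketch of reward-driven exploration. None of this is from the paper; the grid size, pellet count, and reward values are made-up illustrations of how an agent "gobbles up" rewards by moving through an environment:

```python
import random

def run_episode(grid_size=5, n_pellets=3, max_steps=200, seed=0):
    """Toy reward-driven exploration: a random-walk agent wanders a
    small grid, earning +1 for each reward pellet it lands on.
    All parameters here are illustrative, not from Huang's paper."""
    rng = random.Random(seed)
    pellets = set()
    while len(pellets) < n_pellets:
        pellets.add((rng.randrange(grid_size), rng.randrange(grid_size)))
    x, y = 0, 0
    reward = 0
    for _ in range(max_steps):
        if (x, y) in pellets:
            pellets.remove((x, y))
            reward += 1  # reward for "gobbling" a pellet
        if not pellets:
            break        # goal complete: every pellet collected
        # random policy: a trained agent would choose moves to maximize reward
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 0), grid_size - 1)
        y = min(max(y + dy, 0), grid_size - 1)
    return reward
```

A real training loop would replace the random move choice with a learned policy that gets better at collecting pellets over many episodes, but the reward structure is the same idea.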
Background: Ever since DeepMind’s AI systems mastered Chess and Go, SCII has been the go-to training environment for adversarial AI. It’s a game that naturally pits players, AI, or combinations of player and AI against one another.
But, more importantly, DeepMind and other research organizations have already done the hard work of turning the game’s source code into an AI playground complete with several mini-games that allow devs to focus their work.
Researcher Xun Huang, the aforementioned Peking University scientist, set out to explore a "pursuit-evasion" paradigm for training AI models, but found the built-in SCII mini-game to have some inhibiting limitations.

In the baked-in version of the pursuit and evasion game, you can only assign control of the pursuers to an AI.
The basic setup involves three pursuer characters (represented by soldier-type units from the game) and 25 evader characters (represented by alien units from the game). There's also a mode that uses "fog of war" to obscure the map, making it more difficult for the pursuers to locate and eliminate the evaders, but the research indicates that's a 1v1 mode.
Hilariously, the base behavior for the 25 evading units is to remain stationary wherever they spawn and then attack the pursuers on sight. As the pursuers are far more powerful than the evaders, the result is the expected slaughter: every evader is eliminated almost immediately upon being found.
Huang’s paper details a paradigm for training AI in the SCII environment that focuses on training AI to evade the pursuers. In their version, the AI attempts to escape into the fog of war in order to avoid being caught and killed.
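One way to picture the flipped paradigm is through the reward signal: instead of rewarding an agent for finding things, you reward it for not being found. The sketch below is a hypothetical reward-shaping function, not the paper's actual design; the term names and weights are all assumptions chosen to illustrate the idea of rewarding survival and concealment:

```python
def evader_reward(caught, visible_to_pursuer, step_alive_bonus=0.1,
                  caught_penalty=10.0, visibility_penalty=0.5):
    """Hypothetical reward shaping for an evader agent.

    Illustrative assumptions (not from Huang's paper): the agent earns
    a small bonus for each step it survives, loses some reward while
    visible to a pursuer (encouraging it to hide in the fog of war),
    and takes a large one-time penalty if caught."""
    if caught:
        return -caught_penalty
    reward = step_alive_bonus
    if visible_to_pursuer:
        reward -= visibility_penalty
    return reward
```

Under a reward like this, a policy trained to maximize its return learns that slipping out of the pursuers' sight, into the fog of war, is the profitable move.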
Quick take: This is fascinating research using video games that could have massive real-world implications. The world’s most advanced military organizations use video games to train humans.
And AI devs use these training environments to train AI brains for life inside real-world robots. A developer might drop a model into a game where, for the first few hundred iterations, it does nothing but run into the first wall it sees; after a few thousand, or a few million, the model tends to catch on. It saves a lot of bricks to train them in a game first.
If we apply that here, Huang's work seems exciting, but it's hard not to imagine some scary ways AI could use its highly trained ability to flee from pursuers.
What have these future robots done? Why are people chasing them?
On the other hand, the information we glean from AI's new insights into the arts of pursuit and evasion could help humans get better at both as well.
Read the whole paper here on arXiv.