OpenAI, a San Francisco-based nonprofit AI research organization backed by tech luminaries Reid Hoffman and Peter Thiel, has investigated autonomous systems that can achieve superhuman performance in Pong and Montezuma’s Revenge — not to mention natural language systems capable of impressive coherency. But it’s also spent the better part of four years developing AI capable of human-level play in Valve’s Dota 2 battle arena game, and it today set the fruit of its labor loose on a team of professional players.
At a packed event in San Francisco, OpenAI Five (OpenAI’s autonomous system) competed against Europe’s OG, an esports collective that in 2017 became the first to win four Dota Major Championships, in a series of rounds commentated on by players William “Blitz” Lee, Austin “Capitalist” Walsh, Owen “ODPixel” Davies, Kevin “Purge” Godec, and Jorien “Sheever” van der Heijden. The stakes were somewhat higher than in OpenAI’s previous matches; in a best-of-three match at Valve’s The International 2018 esports competition (where prizes totaled $25 million), two teams of pro gamers overcame OpenAI Five.
This time around, the bots won the first two matches of three in a Captain’s Draft mode, which let each team ban characters to prevent the other from selecting them. In the second match, OpenAI Five emerged victorious after about 20 minutes — roughly half the first game’s length.
The rules were the same as those last summer, at The International: the bots didn’t have invulnerable couriers (NPCs that deliver items to heroes), which in earlier rounds they used to ferry a stream of healing potions to their player characters. OpenAI also played on the latest Dota 2 patch, and with summoning and illusion features disabled. Still, it benefited both from a “more fluid” training process and significantly more training; according to OpenAI cofounder and chairman Greg Brockman, it now has a collective 45,000 years of Dota 2 gameplay experience under its belt.
Historically, an absence of long-term planning has been OpenAI Five’s Achilles’ heel: it has often emphasized short-term payoffs over long-term rewards. Dota 2 games generally last 30 to 45 minutes, and OpenAI says its AI agents have a “reward half-life” (roughly, how far in the future a payoff can lie before the agent values it half as much) of 14 minutes. Another of the bot’s disadvantages? It doesn’t learn between games.
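For readers curious how a “reward half-life” translates into the math of reinforcement learning, here is a minimal Python sketch. It assumes the half-life refers to exponential discounting, where a reward n steps away is weighted by gamma ** n, and it assumes an illustrative rate of 7.5 agent decisions per second. Neither that rate nor the function name comes from OpenAI’s announcement.

```python
def discount_from_half_life(half_life_seconds: float, steps_per_second: float) -> float:
    """Return the per-step discount gamma for which gamma ** n == 0.5
    once n steps (one half-life of game time) have elapsed."""
    steps = half_life_seconds * steps_per_second
    if steps <= 0:
        raise ValueError("half-life and step rate must both be positive")
    return 0.5 ** (1.0 / steps)


# ASSUMPTION: 7.5 decisions per second is an illustrative rate, not a figure
# taken from the article. The 14-minute half-life is the figure quoted above.
gamma = discount_from_half_life(half_life_seconds=14 * 60, steps_per_second=7.5)

# Under this model, a payoff 14 minutes away counts half as much as one now,
# and a payoff 28 minutes away counts a quarter as much.
weight_after_28_minutes = gamma ** (2 * 14 * 60 * 7.5)
```

The closer gamma sits to 1, the further ahead the agent effectively plans, which is why a longer half-life is usually read as a sign of longer-term strategy.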
OpenAI Five preferred to defend its towers in today’s matches, although it occasionally brought over a hero to strike proactively. It made a few misplays, like directing one of its player characters, Death Prophet, to use its ultimate skill against an enemy hero, Riki, after which the latter went invisible and retreated. But it demonstrated a knack for “jungling,” that is, killing creatures away from the main action (despite the fact that it strayed from resource gathering, attacking towers, and taking objectives). Moreover, it directed heroes to walk away in situations where damage-over-time was likely to kill them, had them flicker in and out of invisibility to avoid being killed, and spent in-game currency to restore heroes’ health meters.
“OG played extremely weirdly the entire time, and we saw sometimes it worked, and sometimes it really, really didn’t,” RAEng research fellow Mike Cook wrote on Twitter. “I’m not sure what to make of the new bots … They’re clearly very different … But I also feel like OG’s draft and play was very different to what we’ve seen from human teams facing them before.”
At the conclusion of today’s match, OpenAI announced that it’ll release a platform for the public to play against OpenAI Five — a mode called Arena — starting April 18 and ending April 21.
How OpenAI tackled Dota 2
Valve’s Dota 2 (a follow-up to Defense of the Ancients, or DotA, a community-created mod for Blizzard’s Warcraft III: Reign of Chaos) is what’s known as a multiplayer online battle arena, or MOBA. Two groups of five players, each of which is given a base to occupy and defend, attempt to destroy a structure, the Ancient, at the opposing team’s base. Each player character (hero) has a distinct set of abilities and collects experience points and items that unlock new attacks and defensive moves.
It’s more complex than it sounds. The average match contains 80,000 individual frames, during which each character can perform dozens of actions out of 170,000 possible ones. Heroes on the board finish an average of 10,000 moves each frame, contributing to the game’s more than 20,000 total dimensions. And each of those heroes, of which there are over 100, can pick up or purchase hundreds of in-game items.
OpenAI Five isn’t able to handle the full game yet: it can play only 18 of the 115 heroes, and it can’t use abilities like summons and illusions. And in a somewhat controversial design decision, OpenAI’s engineers opted not to have it read pixels from the game to retrieve information, as human players do. It uses Dota 2’s bot API instead, obviating the need for it to search the map to check where its team might be, check whether a spell is ready, or estimate an enemy’s health or distance.
That said, it’s able to draft a team entirely on its own that takes into account the opposing side’s choices.
OpenAI has been chipping away at the Dota 2 dilemma for a while now. In August 2017 it demoed an early iteration of its MOBA-playing bot, one that beat one of the world’s top players, Danil “Dendi” Ishutin, in a 1-on-1 match. It kicked things up a notch in June 2018 with OpenAI Five, an improved system capable of playing five-on-five matches that managed to beat a team of OpenAI employees, a team of audience members, a Valve employee team, an amateur team, and a semi-pro team.
In early August 2018, it won two out of three matches against a team ranked in the 99.95th percentile. During the first match, OpenAI Five started and finished strongly, preventing its human opponents from destroying any of its defensive towers. The second match was a tad less one-sided (the humans took out one of OpenAI Five’s towers), but the AI emerged victorious nonetheless. Only in the third match did the human players eke out a victory.
OpenAI Five consists of five single-layer, 1,024-unit long short-term memory (LSTM) networks, a type of recurrent neural network that can “remember” values over an arbitrary length of time, each assigned to a single hero. The networks are trained using a deep reinforcement learning model that incentivizes their self-improvement with rewards. In OpenAI Five’s case, those rewards are kills, deaths, assists, last hits, net worth, and other stats that track progress in Dota 2.
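One common way to turn stats like these into a training signal is to reward the change in each stat between two moments of the game. The Python sketch below illustrates that idea only; the weights and the dictionary keys are hypothetical placeholders, not OpenAI’s actual values, and the real system may combine the stats differently.

```python
# HYPOTHETICAL weights chosen purely for illustration; OpenAI's real
# coefficients are not given in this article.
REWARD_WEIGHTS = {
    "kills": 1.0,
    "deaths": -1.0,      # dying is penalized
    "assists": 0.5,
    "last_hits": 0.1,
    "net_worth": 0.001,  # gold is plentiful, so each unit counts for little
}


def shaped_reward(previous: dict, current: dict, weights: dict = REWARD_WEIGHTS) -> float:
    """Weighted sum of how much each tracked stat changed between two snapshots."""
    return sum(w * (current[stat] - previous[stat]) for stat, w in weights.items())


before = {"kills": 0, "deaths": 0, "assists": 0, "last_hits": 0, "net_worth": 0}
after = {"kills": 1, "deaths": 0, "assists": 2, "last_hits": 0, "net_worth": 0}
reward = shaped_reward(before, after)  # one kill plus two assists
```

Rewarding changes rather than totals means a hero is credited at the moment something happens, not repeatedly for a lead it built earlier.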
OpenAI’s training framework, Rapid, consists of two parts: a set of rollout workers that each run a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent (an essential step in machine learning) across a fleet of graphics cards. As the rollout workers gain experience, they inform the optimizer nodes, and another set of workers compares the trained LSTM networks (agents) to reference agents.
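“Synchronous” here generally means that every optimizer node computes a gradient on its own batch of experience, the gradients are averaged, and all nodes then apply the same update. The Python sketch below shows that pattern on plain lists of numbers. It is a toy illustration under those assumptions, with hypothetical function names, and not a description of Rapid’s actual code.

```python
def average_gradients(per_node_grads: list[list[float]]) -> list[float]:
    """Element-wise mean of the gradient vectors reported by each node."""
    if not per_node_grads:
        raise ValueError("need at least one gradient vector")
    n = len(per_node_grads)
    return [sum(column) / n for column in zip(*per_node_grads)]


def sync_sgd_step(params: list[float], per_node_grads: list[list[float]], lr: float) -> list[float]:
    """One synchronous step: average all nodes' gradients, then update once."""
    avg = average_gradients(per_node_grads)
    return [p - lr * g for p, g in zip(params, avg)]


# Two hypothetical nodes report gradients for a two-parameter model.
new_params = sync_sgd_step(
    params=[1.0, 2.0],
    per_node_grads=[[1.0, 0.0], [3.0, 2.0]],
    lr=0.5,
)
```

Because every node waits for the average before moving on, all copies of the network stay identical after each step, at the cost of running at the pace of the slowest node.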
To self-improve, OpenAI Five plays 180 years’ worth of games every day (80 percent against its current self and 20 percent against past versions of itself) on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google’s Cloud Platform. Months ago, when OpenAI kicked off training, the AI-controlled Dota 2 heroes “walked aimlessly around the map.” But it wasn’t long before the AI mastered basics like lane defense and farming, and soon after nailed advanced strategies like rotating heroes around the map and stealing items from opponents.
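The 80/20 split between the current agent and its past selves can be pictured as a weighted coin flip at the start of each training game. Here is a minimal Python sketch of that idea. The function name, the string labels, and the uniform choice among past versions are assumptions made for illustration; the article does not say how OpenAI picks which past version to use.

```python
import random


def pick_opponent(current, past_versions, rng, past_fraction=0.2):
    """Choose the opponent for one self-play game.

    With probability `past_fraction` (20 percent by default) return a past
    version, chosen uniformly at random; otherwise return the current agent.
    """
    if past_versions and rng.random() < past_fraction:
        return rng.choice(past_versions)
    return current


rng = random.Random(0)  # seeded so the example is repeatable
opponents = [pick_opponent("current", ["v1", "v2", "v3"], rng) for _ in range(10_000)]
share_vs_past = sum(o != "current" for o in opponents) / len(opponents)
```

Mixing in older versions is a common guard against an agent that learns to beat only its latest self while forgetting how to handle earlier strategies.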
“People used to think that this kind of thing was impossible using today’s deep learning,” Brockman told VentureBeat in an interview last year. “But it turns out that these networks [are] able to play at the professional level in terms of some of the strategies they discover … and really do some long-term planning. The shocking thing to me is that it’s using algorithms that are already here, that we already have, that people said were flawed in very specific ways.”
Fully trained OpenAI Five agents are surprisingly sophisticated. Although they can’t communicate with one another (a “team spirit” hyperparameter determines how heavily each agent weighs the team’s reward against its own), they’re masters of projectile avoidance and experience point sharing, and even of advanced tactics like “creep blocking,” in which a hero physically blocks the path of a hostile creep (a basic unit in Dota 2) to slow its progress.
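A “team spirit” setting of this kind can be sketched as a blend between each hero’s own reward and the team average. The Python below assumes a linear blend with a value between 0 (fully selfish) and 1 (fully team-minded); the function name and the exact formula are illustrative assumptions rather than details confirmed in this article.

```python
def blend_rewards(individual: list[float], team_spirit: float) -> list[float]:
    """Mix each hero's own reward with the team's mean reward.

    team_spirit = 0.0 leaves rewards untouched; team_spirit = 1.0 gives
    every hero the team average.
    """
    if not individual:
        raise ValueError("need at least one hero reward")
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must lie between 0 and 1")
    team_mean = sum(individual) / len(individual)
    return [(1.0 - team_spirit) * r + team_spirit * team_mean for r in individual]


# One hero scores a kill worth 1.0; the other four earn nothing this step.
raw = [1.0, 0.0, 0.0, 0.0, 0.0]
half_shared = blend_rewards(raw, team_spirit=0.5)
fully_shared = blend_rewards(raw, team_spirit=1.0)
```

In this formulation the total reward handed out to the team stays the same whatever the setting; only its distribution among the five heroes changes.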
Dota 2 players are already studying OpenAI Five’s styles of play, some of which are surprisingly creative. (In one match, the bots adopted a mechanic that allowed their heroes to quickly recharge a certain weapon by staying out of range of enemies.) As for OpenAI, it’s applying some of the insights gleaned from it to other fields: last February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots to learn from failure, and later in the year published research on a self-learning robotics system that can manipulate objects with humanlike dexterity.
Brockman said that while today’s match was the final public demonstration, OpenAI will “continue to work” on OpenAI Five.
“The beauty of this technology is that it doesn’t even know it’s [playing] Dota … It’s about letting people connect the strange, exotic but still very tangible intelligences that are created … modern AI technology,” he said. “Games have really been the benchmark [in AI research] … These complex strategy games are the milestone that we … have all been working towards because they start to capture aspects of the real world.”