Hi, I'm Aaron. I research AI/ML/RL, and I'm going to do a thread about artificial agents that can play Pokemon, which has been an interest of mine for some time (running on six years now). Pokemon is a pretty unique game, and I hope this will interest you too. 1/?
Quote Tweet
Something I've wondered for a while - do you think it is possible to write a program that could play Pokemon better than humans?
First off: I'm going to define Pokemon as a game between two players playing each other for the first time where the only information each player gets at the start is team preview. We'll get to singles/doubles later because they pose different challenges. 2/?
Pokemon in this manner is fundamentally different from Chess or Go or even Dota 2: it is a game of _imperfect information_. I don't know which moves you have, although I might know which moves you _can_ have. Thus I have to take risks until I can figure out what's going on. 3/?
However, Pokemon is still a completely different ball game compared to poker. There are:
over 800 different Pokemon, each with a different moveset
over 800 different moves
over 230 different abilities
a decent number of items
All of which interact with the game differently. 5/?
The first challenge to a Pokemon AI: implementing all of this! Luckily, Pokemon Showdown is open source, and there are several decent resources for making bots. 6/?
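(Aside: to give a flavor of what "implementing all of this" means, here's a rough sketch of the core damage formula from the mainline games, before modifiers like STAB, type effectiveness, crits, and the random roll. The function name and exact flooring are my own simplification.)

```python
def base_damage(level, power, attack, defense):
    # Core mainline damage formula, pre-modifiers: the result is then
    # multiplied by STAB, type effectiveness, the 85-100% random roll, etc.
    return (((2 * level // 5 + 2) * power * attack // defense) // 50) + 2

# e.g. a level 50 attacker, 90 base power move, 120 Atk into 100 Def
dmg = base_damage(50, 90, 120, 100)
```

And that's just damage; a full simulator also needs abilities, items, field effects, and every move's special case.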
So, you might say: "Well, we can just gather enough data, and then do some machine learning to it."
1) There are so many different ways to build a Pokemon team and for a Pokemon battle to play out that you couldn't possibly gather enough data to shove it all in a NN 7/?
2) We would rather be able to learn something about the game states and the moves.
The area of computer science that is most relevant (outside of algorithmic game theory) is reinforcement learning. In reinforcement learning, you "reward" the agent based on its actions. 8/?
RL is useful when you A) don't know the probability of getting to the next state (the probability that your opponent will choose a given move) or B) don't know exactly which states lead to the reward (you winning). Otherwise, we would do something called planning. 10/?
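(A minimal sketch of what "learning without a transition model" looks like, using tabular Q-learning. The state/action names are toy placeholders, not real game states; the point is that the update never needs the opponent's move probabilities, only sampled turns.)

```python
# One tabular Q-learning step: learn from a sampled
# (state, action, reward, next_state) transition -- no model of the
# opponent's move probabilities is ever required.
def q_update(q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

q = {}
# Toy example: we took "thunderbolt" on "turn1", got reward 1.0
q_update(q, "turn1", "thunderbolt", 1.0, "turn2", ["thunderbolt", "switch"])
```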
So here's how we would want this to look:
Input:
State: knowledge of Pokemon/mechanics (types, movesets, stats, etc.), team preview, Pokemon on the board
Possible actions: Pokemon moves (singles: 1-4; doubles: 1-4, 1-4), switches (singles: 1-5; doubles: 1-2)
Output: "Best actions" 11/?
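(In code, the interface above could look something like this. Every field and class name here is my own invention for illustration, not any existing library's API.)

```python
from dataclasses import dataclass

@dataclass
class BattleState:
    my_team: list           # full knowledge of my own six Pokemon
    opponent_preview: list  # species only, from team preview
    my_active: list         # my Pokemon currently on the board
    opponent_active: list   # opponent's Pokemon on the board

@dataclass
class Action:
    kind: str  # "move" or "switch"
    slot: int  # move slot 1-4, or bench slot 1-5 (1-2 in doubles)
```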
I put "best actions" in quotes because, in theory, we don't want our agent to always play the single best response to a prediction of the opponent. You can read more about this in the poker/game theory literature: a deterministic strategy is exploitable no matter how smart you are. We want a distribution over good actions 12/?
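(One simple way to turn estimated action values into a distribution rather than a single best response is a softmax; the temperature knob here is my own illustrative addition: high means more mixing, low means greedier play. This is not the game-theoretically optimal mixing, just the simplest sketch.)

```python
import math

def softmax_policy(action_values, temperature=1.0):
    # Convert action values into a probability distribution: good
    # actions get played often, but never deterministically.
    exps = {a: math.exp(v / temperature) for a, v in action_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

# Toy values: two equally good options, one bad one
probs = softmax_policy({"protect": 1.0, "attack": 1.0, "switch": 0.0})
```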
Single and double battles are interesting for different reasons. Single battles have a much smaller action space, with an upper bound of 4 moves + 5 switches maximum = 9 actions per turn. 13/?
Double battles have an upper bound of:
4 moves for Pokemon 1 * ~3 targets for Pokemon 1
* (this is combinatorial)
4 moves for Pokemon 2 * ~3 targets for Pokemon 2
+ 2 switches
~= 146 actions per turn (ish?)
This is why I play doubles: this problem is more interesting to me. 14/?
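(As a sanity check, the arithmetic above, using the thread's own estimates:)

```python
# Back-of-the-envelope action counts per turn
singles_actions = 4 + 5              # 4 moves + up to 5 switches
doubles_moves = (4 * 3) * (4 * 3)    # (move, target) choices for each slot, combined
doubles_actions = doubles_moves + 2  # plus roughly 2 switch options
```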
However, single battles go on much longer (sometimes several hundred turns?) than double battles (~15 turns is a long game). So there's some element of planning for the future that makes singles interesting, too, in a different way. 15/?
Furthermore, lookahead is challenging. Each attack has a _different_ damage roll (one of 16 possible values), plus a crit chance, and sometimes even a secondary effect chance. If you want to enumerate _every_ possible next turn, this is going to be hard in singles and computationally prohibitive in doubles. 16/?
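(A rough upper bound on how fast this blows up, assuming both players attack and every attack branches over all 16 damage rolls, crit/no-crit, and a binary secondary effect; real moves branch less, but this shows the order of magnitude:)

```python
# Branching factors for exhaustive one-turn lookahead in singles
ROLLS, CRIT, SECONDARY = 16, 2, 2
outcomes_per_attack = ROLLS * CRIT * SECONDARY    # up to 64 outcomes per attack
# Both players pick one of ~9 actions, and each attack branches independently:
singles_turn_outcomes = (9 * 9) * outcomes_per_attack ** 2
```

That's hundreds of thousands of child states for a single turn of singles; doubles is far worse.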
We can maybe get around this with "depth charges": generating and evaluating a state several turns in the future by randomly picking actions for both players. But I'm not sure how good it will be; you'll probably need a ton of samples. This is the rollout step at the heart of Monte Carlo tree search! 17/?
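(A depth charge in miniature. The simulator interface here, `legal_actions`/`step`/`score`, is hypothetical; a real version would wrap the battle simulator:)

```python
import random

def depth_charge(state, sim, depth=5, n_samples=1000, rng=random):
    # Estimate a position's value by playing many random games a few
    # turns ahead and averaging a score over the resulting states.
    total = 0.0
    for _ in range(n_samples):
        s = state
        for _ in range(depth):
            my = rng.choice(sim.legal_actions(s, player=0))
            opp = rng.choice(sim.legal_actions(s, player=1))
            s = sim.step(s, my, opp)
        total += sim.score(s)  # e.g. +1 win, -1 loss, heuristic otherwise
    return total / n_samples
```

Full MCTS would additionally grow a search tree and bias action selection toward promising branches instead of picking uniformly at random.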
We will probably have to train the agent by having it play itself a zillion times. Read about AlphaGo's self-play for background. Q: But won't it lose to humans if it doesn't train vs. them? A: We might not want to explicitly predict how our opponent will act, so it's OK. 18/?
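(The shape of a self-play loop, in the AlphaGo mold in spirit only: the same policy plays both sides, and winning choices get reinforced. The "game" here is entirely invented, nothing like Pokemon mechanics; it just shows the loop structure.)

```python
import random

def self_play(n_games, seed=0):
    rng = random.Random(seed)
    values = {"attack": 0.0, "stall": 0.0}  # toy shared "policy"
    for _ in range(n_games):
        a = rng.choice(list(values))  # side 1 samples from the shared policy
        b = rng.choice(list(values))  # side 2 samples from the same policy
        # Invented rule: "attack" beats "stall"; mirrors are coin flips
        winner = rng.choice([a, b]) if a == b else "attack"
        values[winner] += 1.0         # reinforce the winning choice
    return values

values = self_play(100)
```

A real version would replace the toy game with the battle simulator and the value table with a learned network.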
Now, it's time to think about what assumptions we want to make about TEAMS in the game. This is where most of our uncertainty comes from! You could turn this into a much simpler problem: only have two teams face off, and give both players full information on the other team. 19/?
However, that's not very interesting (except maybe to IC/world finalists, who may be salivating). Do we want the agent to learn how to beat anyone given one team? Do we want the agent to find the optimal team from my teambuilder? Do we want it to _build its own team_? 20/?
Sadly (or not), team choice is the hardest part of this! This is an area of research called portfolio optimization: we want to construct the best action space for ourselves. In the grander scheme of things, if we knew how to do this well, we would know which stocks to buy. 21/?
Portfolio optimization is not a solved field in the slightest! I spoke to a DeepMind engineer at a conference and this was his main concern about Pokemon. I haven't even thought about how to begin to tackle this. 22/?
That's about as far as I've gotten. To recap: it should be possible to create something that's very good at battling (especially singles) given a simulator (I would prefer one in C++; I don't know how much self-play is feasible with Pokemon Showdown's JS server as a bottleneck) 23/?
But I don't know about AI building a team. You could probably make a great line of research from it! This topic is what got me interested in AI, even though I study natural language mostly. I'll try to answer questions and comments. Thanks! (come join me at Brown) 24/end
A year has passed since the first iteration of my Pokemon AI thread! I've grown to be a better scientist and I've spent some time researching this problem. So here goes an update:
First off, new and exciting things have happened in the game-playing AI community in a year! Specifically, OpenAI Five beat a team of humans at Dota 2, and more recently AlphaStar reached Grandmaster league in StarCraft 2: openai.com/five/
(there are two links in the above post) These are huge accomplishments! How different are these systems from something that would succeed at Pokémon? Well, if you read between the lines in these papers (and PLEASE PLEASE PLEASE correct me if I'm wrong)
They don't take into account, or spend very little time talking about, the dynamics of _strategy selection_. In Dota 2, this is the draft phase; in StarCraft 2, this is the build order. People who follow or play these games know that this is a HUGE part of the strategy.
And of course, as Pokémon players, we know that strategy selection-- aka teambuilding-- can be a huge and mysterious problem, maybe even bigger than the problem of battling. That's what makes our game unique in this space (deepmind hire me for an internship)
This is challenging to talk about person-to-person, let alone formalize. What we get is, roughly, a two-step* optimization problem: the team is optimized in some joint way with battling skill, but it's very challenging to put into equations how those should feed off of each other.
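(The loosest possible sketch of that two-step structure, as alternating optimization: hold the team fixed and improve the battle policy, then hold the policy fixed and improve the team. Every subroutine here is a hypothetical placeholder; the hard open question is what `improve_team` should actually do.)

```python
def joint_optimize(team, policy, improve_policy, improve_team, rounds=10):
    # Alternate between the two coupled problems: battling skill given
    # a team, and team choice given a level of battling skill.
    for _ in range(rounds):
        policy = improve_policy(team, policy)  # inner step: battling
        team = improve_team(team, policy)      # outer step: teambuilding
    return team, policy
```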