Arena Lineups and State Transitions

2020-05-10

In a video game I've played a lot recently there are online arenas where players line up, the two people at the front of the line fight, then the loser goes to the back of the line. Even with a relatively large skill gap between players there's some (non-zero) chance either player could win. How often does the best player get to play? How often does each line up happen?

How often do you play if you always have an X% chance of winning?

How often does a particular player play, if they have the same chance of winning against any opponent? Let's assume they have a 70% chance of winning each game in an arena with 8 players. The chance of them winning 1 game is 0.7. The chance of them winning 2 games is 0.72. The chance of them winning N games is 0.7N. The average number of games they will win in a row is 0.7 + 0.72 + 0.73 + ... + 0.7N for N going all the way to infinity. After winning some games, they'll lose 1 game and go to the back of the line, so the total number of games they play before going to the back of the line is \( 1 + 0.7 + 0.7^2 + ... = 0.7^0 + 0.7^1 + 0.7^2 + ... = \sum_{k=0}^\infty 0.7^k = \frac{1}{1 - 0.7} = 3.\overline{3} \) games. Every time that player loses, they have to sit out 8 - 2 = 6 games before they can play again. On average, our player will play 3.3 games, then wait in line for 6 games, so they are playing 3.3 / (3.3 + 6) ≈ 0.357 (35.7%) of the time. Now lets generalize, instead of assuming their win_ratio = 0.7 and the arena_capacity = 8. The number of games they play on average is \( \frac{1}{1 - \text{win_ratio}} \) before going to the back of the line. Once they're at the back of the line, they will have to site out of arena_capacity - 2 games until they are one of the two people at the front of the line who get to play again. They play this much on average: \[ \frac{\frac{1}{1 - \text{win_ratio}}}{\frac{1}{1 - \text{win_ratio}} + \text{arena_capacity} - 2} \]

win_ratio
capacity 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
2 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
3 0.500 0.526 0.556 0.588 0.625 0.667 0.714 0.769 0.833 0.909 1.000
4 0.333 0.357 0.385 0.417 0.455 0.500 0.556 0.625 0.714 0.833 1.000
5 0.250 0.270 0.294 0.323 0.357 0.400 0.455 0.526 0.625 0.769 1.000
6 0.200 0.217 0.238 0.263 0.294 0.333 0.385 0.455 0.556 0.714 1.000
7 0.167 0.182 0.200 0.222 0.250 0.286 0.333 0.400 0.500 0.667 1.000
8 0.143 0.156 0.172 0.192 0.217 0.250 0.294 0.357 0.455 0.625 1.000

What if every player has an X% chance against players below them?

What if every player has an X% chance against players below them? How often does everyone get to play? For the best player, this is the same situation as before: they always have an X% chance of winning. For everyone else, we have to care about the order of players: the entire lineup/state is important. Let's use a quick little simulation: start with an arbitrary lineup and just play the games out, keeping track of how many games each player gets to play. Since there's no player that always wins (or always loses), all possible lineups can be reached and eventually will get reached if the simulation runs long enough.

import random

win_ratio = 0.7
arena_capacity = 8

# Our lineup is a list of players.
# Each player is just an integer from 0 to arena_capacity - 1
# If a player is "less than" another player, it has win_ratio chance to win
# e.g. 0 always has win_ratio chance to win but arena_capacity - 1 never does.
lineup = list(range(arena_capacity))
random.shuffle(lineup)

games_per_player = [0 for _ in lineup]

# Simulate 1,000,000 games
for _ in range(1_000_000):
  p1 = lineup[0]
  p2 = lineup[1]
  games_per_player[p1] += 1
  games_per_player[p2] += 1
  prob_p1_wins = win_ratio if p1 < p2 else 1 - win_ratio
  if random.random() < prob_p1_wins:
    # lineup[0] won, so lineup[1] goes to the back of the line
    lineup = lineup[:1] + lineup[2:] + lineup[1:2]
  else:
    # lineup[1] won, so lineup[0] goes to the back of the line
    lineup = lineup[1:] + lineup[0:1]

total_games = sum(games_per_player) // 2
for player in sorted(lineup):
  print(f'Player {p} played {games_per_player[p] / total_games} of total games')
Player 0 played 0.357041 of total games
Player 1 played 0.304571 of total games
Player 2 played 0.26735 of total games
Player 3 played 0.242306 of total games
Player 4 played 0.224655 of total games
Player 5 played 0.211372 of total games
Player 6 played 0.200611 of total games
Player 7 played 0.192094 of total games

This simulation is imperfect but gives us a good approximation. Player 0's "0.357041" from the simulation matches true value we calculated earlier to 5 decimal places: 0.357143. It also happens to use on the order of arena_capacity memory and N time where N is the number of games simulated.

We can do better, in terms of accuracy! The simple simulation is fairly straightforward to implement and understand but it's not very accurate because it's essentially taking a random walk. In particular, this simulation isn't good at capturing rare events. We can improve this by keeping track of all of the possibilities. Imagine we started with a particular lineup with just 3 players in the lineup [0, 1, 2] (0 being the best player and 3 being the worst). If player 0 has a 70% chance of winning, we'd like to keep track of the fact that after 1 round there's a 70% chance of lineup [0, 2, 1] and a 30% chance of lineup [1, 2, 0].

Lineup
Game [0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0]
0 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.0000 0.7000 0.0000 0.3000 0.0000 0.0000
2 0.4900 0.0000 0.2100 0.0000 0.0900 0.2100
3 0.0630 0.4900 0.1470 0.2100 0.0630 0.0270
4 0.3871 0.1470 0.1659 0.0630 0.0711 0.1659
5 0.1527 0.3871 0.1602 0.1659 0.0687 0.0654
6 0.3190 0.2190 0.1619 0.0939 0.0694 0.1367
7 0.2019 0.3367 0.1614 0.1443 0.0692 0.0865
8 0.2841 0.2543 0.1616 0.1090 0.0692 0.1218
9 0.2265 0.3120 0.1615 0.1337 0.0692 0.0971
10 0.2668 0.2716 0.1615 0.1164 0.0692 0.1144

Here's an implementation of the more complex but accurate simulation for 8 players:

import collections
import itertools

win_ratio = 0.7
arena_capacity = 8

# The probability of being in a particular lineup
lineups = set(itertools.permutations(range(arena_capacity)))
# In this simulation, every lineup is equally possible
lineup_probabilities = {lu: 1 / len(lineups) for lu in lineups}

def print_ratio_of_games_players_play(lineup_probabilities):
  prob_playing = {lu[0]: 0 for lu in lineup_probabilities.keys()}
  for lineup, prob in lineup_probabilities.items():
    prob_playing[lineup[0]] += prob
    prob_playing[lineup[1]] += prob
  for player, prob in sorted(prob_playing.items()):
    print(player, prob)


print(f'In a an arena with {arena_capacity} players,')
print(f'where better players have a {win_ratio} chance of winning...')
for iteration in range(1000):
  print(f'Each player has this chance of playing game {iteration}')
  print_ratio_of_games_players_play(lineup_probabilities)
  print()

  # Calculate the lineup lineup_probabilities after the current game.
  next_lineup_probs = {lu: 0 for lu in lineups}
  for lineup, prob in lineup_probabilities.items():
    p1 = lineup[0]
    p2 = lineup[1]

    prob_p1_wins = win_ratio if p1 < p2 else 1 - win_ratio
    p1_wins_lineup = lineup[:1] + lineup[2:] + lineup[1:2]
    next_lineup_probs[p1_wins_lineup] += prob * prob_p1_wins

    prob_p2_wins = 1 - prob_p1_wins
    p2_wins_lineup = lineup[1:] + lineup[0:1]
    next_lineup_probs[p2_wins_lineup] += prob * prob_p2_wins
  lineup_probabilities = next_lineup_probs
In a an arena with 8 players,
where better players have a 0.7 chance of winning...
Each player has this chance of playing game 0:
0 0.24999999999999914
1 0.24999999999999914
2 0.24999999999999914
3 0.24999999999999914
4 0.24999999999999914
5 0.24999999999999914
6 0.24999999999999914
7 0.24999999999999914

Each player has this chance of playing game 1:
0 0.3000000000000203
1 0.2857142857142995
2 0.2714285714285791
3 0.2571428571428588
4 0.24285714285713933
5 0.22857142857141902
6 0.21428571428570053
7 0.199999999999982

...

Each player has this chance of playing game 10:
0 0.34335874584999987
1 0.30313098115000037
2 0.2755610298499994
3 0.2524433838357137
4 0.23162117453571385
5 0.21297234892142886
6 0.1969404855071409
7 0.18397185034999944

...

Each player has this chance of playing game 100:
0 0.3571425608770196
1 0.30357771182616267
2 0.2677469842160434
3 0.24285737218990014
4 0.22564765729531372
5 0.2121941735007292
6 0.20012444622327458
7 0.1907090938715585

...

Each player has this chance of playing game 999:
0 0.35714285714285826
1 0.3035714285714283
2 0.26785714285714246
3 0.24285714285714252
4 0.22467532467532436
5 0.21103896103896144
6 0.20054945054943057
7 0.19230769230771042

After simulating 100 games worth of all the possible outcomes, the predicted ratio of games the best player plays matches theoretical ratio to 6 decimal places (0.357142) and after 999 games, it matches up to 13 decimal places (0.35714285714285). However, it requires keeping track of every possible lineup after every game and since all lineups are eventually possible, there are S = factorial(arena_capacity) = 8! = 40,320 lineups/states to keep track of as opposed to the simple simulation from earlier that only has to keep track of the 'current' lineup/state. It takes on the order of S N time to simulate S games simultaneously N times, so it arguably takes "on the order of N time to simulate N total games".

Lineup Probabilities

Adding on a small bit of code to the end to print out the lineups sorted by probability shows that the most popular lineup is [0, 7, 6, 5, 4, 3, 2, 1] and the least popular is it's opposite, [7, 0, 1, 2, 3, 4, 5, 6].

for lineup, prob in sorted(lineup_probabilities.items(), key=lambda lup: -lup[1]):
  print(lineup, prob)
(0, 7, 6, 5, 4, 3, 2, 1) 0.00045499913594621743
(0, 6, 5, 4, 3, 2, 1, 7) 0.00039031801930242854
(1, 0, 7, 6, 5, 4, 3, 2) 0.00037512030572818024
(0, 5, 4, 3, 2, 1, 7, 6) 0.0003315077259324971
(2, 1, 0, 7, 6, 5, 4, 3) 0.0003124589555428235
(0, 6, 5, 4, 3, 2, 7, 1) 0.00029280058409183026
(0, 4, 3, 2, 1, 7, 6, 5) 0.0002826173387706582
(1, 7, 0, 6, 5, 4, 3, 2) 0.0002777614405434467
(0, 1, 7, 6, 5, 4, 3, 2) 0.0002748784599086048
(3, 2, 1, 0, 7, 6, 5, 4) 0.00026449866551600515
...
(3, 7, 6, 4, 2, 5, 0, 1) 1.8456544676674505e-05
(4, 6, 5, 1, 3, 0, 7, 2) 1.8455746318601447e-05
(4, 5, 3, 1, 7, 6, 0, 2) 1.8455543611052784e-05
(3, 2, 7, 4, 1, 5, 6, 0) 1.8455440424872874e-05
(6, 7, 1, 4, 5, 3, 2, 0) 1.845525326604192e-05
(5, 7, 2, 3, 1, 0, 6, 4) 1.8454938243335214e-05
(0, 5, 1, 2, 3, 4, 7, 6) 1.84548475656733e-05
(4, 7, 5, 2, 3, 6, 1, 0) 1.845346700732128e-05
(1, 6, 7, 2, 3, 0, 5, 4) 1.845072263703833e-05
(5, 2, 1, 3, 7, 6, 4, 0) 1.8450715935717216e-05
...
(7, 6, 0, 1, 2, 3, 4, 5) 7.165412381415646e-07
(7, 0, 1, 2, 4, 3, 5, 6) 7.15385314959542e-07
(5, 6, 7, 0, 1, 2, 3, 4) 7.116445679045319e-07
(7, 0, 1, 3, 2, 4, 5, 6) 7.100718202731864e-07
(7, 0, 1, 2, 3, 5, 4, 6) 7.045636776721046e-07
(7, 0, 2, 1, 3, 4, 5, 6) 6.914352520668832e-07
(7, 0, 1, 2, 3, 4, 6, 5) 6.722921860462773e-07
(7, 1, 0, 2, 3, 4, 5, 6) 6.638508462809436e-07
(6, 7, 0, 1, 2, 3, 4, 5) 5.272934311577203e-07
(7, 0, 1, 2, 3, 4, 5, 6) 3.731504007897845e-07

Rock-Paper-Scissors

This method has the side effect that it perfectly captures how often each lineup has come up (on average) after N games (given the probabilities of the starting lineup), so if you had a set of players that played like rock-paper-scissors, the simulation would never converge (unless the starting lineup probabilities are already converged).

import collections
import itertools

# The probability of being in a particular lineup
lineups = set(itertools.permutations(['rock', 'paper', 'scissors']))
# In this simulation, we start the first game with a particular lineup
lineup_probabilities = {lu: 0 for lu in lineups}
lineup_probabilities['rock', 'paper', 'scissors'] = 1

def print_ratio_of_games_players_play(lineup_probabilities):
  prob_playing = {lu[0]: 0 for lu in lineup_probabilities.keys()}
  for lineup, prob in lineup_probabilities.items():
    prob_playing[lineup[0]] += prob
    prob_playing[lineup[1]] += prob
  for player, prob in sorted(prob_playing.items()):
    print(player, prob)


for iteration in range(1000):
  print(f'Each player has this chance of playing game {iteration}')
  print_ratio_of_games_players_play(lineup_probabilities)
  print()

  # Calculate the lineup lineup_probabilities after the current game.
  next_lineup_probs = {lu: 0 for lu in lineups}
  for lineup, prob in lineup_probabilities.items():
    p1 = lineup[0]
    p2 = lineup[1]

    prob_p1_wins = 1 if (p1, p2) in [
        ('rock', 'scissors'), ('scissors', 'paper'), ('paper', 'rock')] else 0
    p1_wins_lineup = lineup[:1] + lineup[2:] + lineup[1:2]
    next_lineup_probs[p1_wins_lineup] += prob * prob_p1_wins

    prob_p2_wins = 1 - prob_p1_wins
    p2_wins_lineup = lineup[1:] + lineup[0:1]
    next_lineup_probs[p2_wins_lineup] += prob * prob_p2_wins
  lineup_probabilities = next_lineup_probs
Each player has this chance of playing game 0
paper 1
rock 1
scissors 0

Each player has this chance of playing game 1
paper 1
rock 0
scissors 1

Each player has this chance of playing game 2
paper 0
rock 1
scissors 1

Each player has this chance of playing game 3
paper 1
rock 1
scissors 0

What if there's a matchup table for win_ratio?

Nothing changes! The only change you need to make to the code is how prob_p1_wins is calculated.

Vectors and Matrices

State Transistion Matrix

The simulation works by having a vector where each index represents a state (lineup) and each value represents the probability of being in that state. Each iteration of the algorithm (each game played) does is a linear transformation from the initial state probability vector to the next one. It's the same as multiplying our vector of states by a matrix that represents the probabilities of each state going to another state. For an 8 person arena, there are 8! lineups, so the state transition matrix must have \( S^2 = 8!^2 = 1,625,702,400 \) total vlues. Here's a matrix representing a 3-person arena where better players (players with a lower number) have a 0.7 chance of winning:

ending
state
starting state
[0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0]
[0, 1, 2] [0, 1, 2]→[0, 1, 2] [0, 2, 1]→[0, 1, 2] [1, 0, 2]→[0, 1, 2] [1, 2, 0]→[0, 1, 2] [2, 0, 1]→[0, 1, 2] [2, 1, 0]→[0, 1, 2]
[0, 2, 1] [0, 1, 2]→[0, 2, 1] [0, 2, 1]→[0, 2, 1] [1, 0, 2]→[0, 2, 1] [1, 2, 0]→[0, 2, 1] [2, 0, 1]→[0, 2, 1] [2, 1, 0]→[0, 2, 1]
[1, 0, 2] [0, 1, 2]→[1, 0, 2] [0, 2, 1]→[1, 0, 2] [1, 0, 2]→[1, 0, 2] [1, 2, 0]→[1, 0, 2] [2, 0, 1]→[1, 0, 2] [2, 1, 0]→[1, 0, 2]
[1, 2, 0] [0, 1, 2]→[1, 2, 0] [0, 2, 1]→[1, 2, 0] [1, 0, 2]→[1, 2, 0] [1, 2, 0]→[1, 2, 0] [2, 0, 1]→[1, 2, 0] [2, 1, 0]→[1, 2, 0]
[2, 0, 1] [0, 1, 2]→[2, 0, 1] [0, 2, 1]→[2, 0, 1] [1, 0, 2]→[2, 0, 1] [1, 2, 0]→[2, 0, 1] [2, 0, 1]→[2, 0, 1] [2, 1, 0]→[2, 0, 1]
[2, 1, 0] [0, 1, 2]→[2, 1, 0] [0, 2, 1]→[2, 1, 0] [1, 0, 2]→[2, 1, 0] [1, 2, 0]→[2, 1, 0] [2, 0, 1]→[2, 1, 0] [2, 1, 0]→[2, 1, 0]
ending
state
starting state
[0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0]
[0, 1, 2] 0 0.7 0 0 0.7 0
[0, 2, 1] 0.7 0 0.7 0 0 0
[1, 0, 2] 0 0 0 0.7 0 0.7
[1, 2, 0] 0.3 0 0.3 0 0 0
[2, 0, 1] 0 0 0 0.3 0 0.3
[2, 1, 0] 0 0.3 0 0 0.3 0

Overall, the size of this matrix is much larger than the size of a single vector but it does have one big speed advantage. If you have the current vector representing the probability of being in each possible state, \( x \), and the state transition matrix, \( A \), then the probability of being in each state after one game/transition is \( A x \). The probability of being in each state after 2 transistions it's \( A (A x) = A^2 x \) and the probability of being in each state after \( N \) transitions is \( A^N x \). In general, if there are \( S \) states, then multiplying \( A \), an SxS matrix, would take on the order of \( S^3 \) operations, so figuring out the probability of being in each state \( N \) transitions later would take on the order of \( N S^3 \) steps to calculate \( A^N \) and needs on the order of \( S^2 \) memory for the state transition matrix. We can make it take on the order of \( \log_2(N) S^3 \) steps by using exponentiation by squaring. Here's some sample code to demonstrate, using arena_capacity = 3, so it's easy to see the entire transition matrix... and to not use too much memory on my computer.

ending
state
starting state Power
[0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0] 1
[0, 1, 2] 0 0.7 0 0 0.7 0
[0, 2, 1] 0.7 0 0.7 0 0 0
[1, 0, 2] 0 0 0 0.7 0 0.7
[1, 2, 0] 0.3 0 0.3 0 0 0
[2, 0, 1] 0 0 0 0.3 0 0.3
[2, 1, 0] 0 0.3 0 0 0.3 0
[0, 1, 2] 0.49 0 0.49 0.21 0 0.21 2
[0, 2, 1] 0 0.49 0 0.49 0.49 0.49
[1, 0, 2] 0.21 0.21 0.21 0 0.21 0
[1, 2, 0] 0 0.21 0 0.21 0.21 0.21
[2, 0, 1] 0.09 0.09 0.09 0 0.09 0
[2, 1, 0] 0.21 0 0.21 0.09 0 0.09
[0, 1, 2] 0.3871 0.147 0.3871 0.1659 0.147 0.1659 4
[0, 2, 1] 0.147 0.3871 0.147 0.3871 0.3871 0.3871
[1, 0, 2] 0.1659 0.1659 0.1659 0.147 0.1659 0.147
[1, 2, 0] 0.063 0.1659 0.063 0.1659 0.1659 0.1659
[2, 0, 1] 0.0711 0.0711 0.0711 0.063 0.0711 0.063
[2, 1, 0] 0.1659 0.063 0.1659 0.0711 0.063 0.0711
[0, 1, 2] 0.28410151 0.2264535 0.28410151 0.22660659 0.2264535 0.22660659 8
[0, 2, 1] 0.2543247 0.31197271 0.2543247 0.31197271 0.31197271 0.31197271
[1, 0, 2] 0.16157379 0.16157379 0.16157379 0.1614207 0.16157379 0.1614207
[1, 2, 0] 0.1089963 0.13370259 0.1089963 0.13370259 0.13370259 0.13370259
[2, 0, 1] 0.06924591 0.06924591 0.06924591 0.0691803 0.06924591 0.0691803
[2, 1, 0] 0.12175779 0.0970515 0.12175779 0.09711711 0.0970515 0.09711711
import itertools

import numpy
import numpy.linalg

arena_capacity = 3
win_ratio = 0.7
# Each player is represented as a number in [0, arena_capacity).
# Smaller number players have win_ratio chance of winning.
lineups = sorted(itertools.permutations(range(arena_capacity)))

def transition_probability(start_lineup, end_lineup, win_ratio):
  p1 = start_lineup[0]
  p2 = start_lineup[1]
  winner = end_lineup[0]
  loser = end_lineup[-1]
  prob_p1_wins = win_ratio if p1 < p2 else 1 - win_ratio
  if start_lineup[2:] == end_lineup[1:-1]:
    if winner == p1:
      assert loser == p2
      return prob_p1_wins
    elif winner == p2:
      assert loser == p1
      return 1 - prob_p1_wins
  return 0

# Note: This can be optimized by creating a matrix of all 0's, then filling
# in the non-zero probabilities since there are relatively fewer of them.
lineup_transition_matrix = numpy.array(
  [[transition_probability(start_lineup, end_lineup, win_ratio)
      for start_lineup in lineups]
        for end_lineup in lineups])

# In this simulation, each starting lineup is equally possible.
prob_in_lineup = numpy.array([1 / len(lineups) for _ in lineups])
transitions_so_far = 0
transitions_per_multiplication = 1
for i in range(20):
  # Sum up how often each player played based on how often each lineup happened
  prob_player_playing = [0 for _ in range(arena_capacity)]
  for lineup, prob in zip(lineups, prob_in_lineup):
    prob_player_playing[lineup[0]] += prob
    prob_player_playing[lineup[1]] += prob
  print(f'Transition matrix (for {transitions_per_multiplication} games):')
  print(lineup_transition_matrix)
  print(f'After playing {transitions_so_far} games each player has played:')
  for player, prob in enumerate(prob_player_playing):
    print(player, prob)
  print()

  # Play some more games
  prob_in_lineup = numpy.matmul(lineup_transition_matrix, prob_in_lineup)
  transitions_so_far += transitions_per_multiplication
  lineup_transition_matrix = numpy.linalg.matrix_power(lineup_transition_matrix, 2)
  transitions_per_multiplication *= 2
Transition matrix (for 1 games):
[[ 0.   0.7  0.   0.   0.7  0. ]
 [ 0.7  0.   0.7  0.   0.   0. ]
 [ 0.   0.   0.   0.7  0.   0.7]
 [ 0.3  0.   0.3  0.   0.   0. ]
 [ 0.   0.   0.   0.3  0.   0.3]
 [ 0.   0.3  0.   0.   0.3  0. ]]
After playing 0 games each player has played:
0 0.666666666667
1 0.666666666667
2 0.666666666667

Transition matrix (for 2 games):
[[ 0.49  0.    0.49  0.21  0.    0.21]
 [ 0.    0.49  0.    0.49  0.49  0.49]
 [ 0.21  0.21  0.21  0.    0.21  0.  ]
 [ 0.    0.21  0.    0.21  0.21  0.21]
 [ 0.09  0.09  0.09  0.    0.09  0.  ]
 [ 0.21  0.    0.21  0.09  0.    0.09]]
After playing 1 games each player has played:
0 0.8
1 0.666666666667
2 0.533333333333

Transition matrix (for 4 games):
[[ 0.3871  0.147   0.3871  0.1659  0.147   0.1659]
 [ 0.147   0.3871  0.147   0.3871  0.3871  0.3871]
 [ 0.1659  0.1659  0.1659  0.147   0.1659  0.147 ]
 [ 0.063   0.1659  0.063   0.1659  0.1659  0.1659]
 [ 0.0711  0.0711  0.0711  0.063   0.0711  0.063 ]
 [ 0.1659  0.063   0.1659  0.0711  0.063   0.0711]]
After playing 3 games each player has played:
0 0.772
1 0.666666666667
2 0.561333333333

...

Transition matrix (for 128 games):
[[ 0.25022624  0.25022624  0.25022624  0.25022624  0.25022624  0.25022624]
 [ 0.28823529  0.28823529  0.28823529  0.28823529  0.28823529  0.28823529]
 [ 0.16153846  0.16153846  0.16153846  0.16153846  0.16153846  0.16153846]
 [ 0.12352941  0.12352941  0.12352941  0.12352941  0.12352941  0.12352941]
 [ 0.06923077  0.06923077  0.06923077  0.06923077  0.06923077  0.06923077]
 [ 0.10723982  0.10723982  0.10723982  0.10723982  0.10723982  0.10723982]]
After playing 127 games each player has played:
0 0.769230769231
1 0.642533936652
2 0.588235294118

...

Transition matrix (for 262144 games):
[[ 0.25022624  0.25022624  0.25022624  0.25022624  0.25022624  0.25022624]
 [ 0.28823529  0.28823529  0.28823529  0.28823529  0.28823529  0.28823529]
 [ 0.16153846  0.16153846  0.16153846  0.16153846  0.16153846  0.16153846]
 [ 0.12352941  0.12352941  0.12352941  0.12352941  0.12352941  0.12352941]
 [ 0.06923077  0.06923077  0.06923077  0.06923077  0.06923077  0.06923077]
 [ 0.10723982  0.10723982  0.10723982  0.10723982  0.10723982  0.10723982]]
After playing 262143 games each player has played:
0 0.769230769228
1 0.642533936649
2 0.588235294116

Transition matrix (for 524288 games):
[[ 0.25022624  0.25022624  0.25022624  0.25022624  0.25022624  0.25022624]
 [ 0.28823529  0.28823529  0.28823529  0.28823529  0.28823529  0.28823529]
 [ 0.16153846  0.16153846  0.16153846  0.16153846  0.16153846  0.16153846]
 [ 0.12352941  0.12352941  0.12352941  0.12352941  0.12352941  0.12352941]
 [ 0.06923077  0.06923077  0.06923077  0.06923077  0.06923077  0.06923077]
 [ 0.10723982  0.10723982  0.10723982  0.10723982  0.10723982  0.10723982]]
After playing 524287 games each player has played:
0 0.769230769226
1 0.642533936647
2 0.588235294114

System of Linear Equations

If what we care about most is calculating the probability of certain lineups/states, we can do even better by thinking of the matrix as a system of linear equations. The state transition matrix can also be thought of as representing the probability of being in each state as a linear combination of the the probability of being in other states. For example, in the matrix below / from earlier, the probability of being in state [0, 1, 2] in the next game is 0.7 times the probability of being in state [0, 2, 1] the previous game plus 0.7 times the probability of being in state [2, 0, 1] the previous game.

ending
state
starting state
[0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0]
[0, 1, 2] 0 0.7 0 0 0.7 0
[0, 2, 1] 0.7 0 0.7 0 0 0
[1, 0, 2] 0 0 0 0.7 0 0.7
[1, 2, 0] 0.3 0 0.3 0 0 0
[2, 0, 1] 0 0 0 0.3 0 0.3
[2, 1, 0] 0 0.3 0 0 0.3 0

If we assume that the probability of being in a given state doesn't actually change over time, then we can treat this as a system of linear equations to be solved. In other words, the probability of being in being in state [0, 1, 2] in general is 0.7 times the probability of being in state [0, 2, 1] in general plus 0.7 times the probability of being in state [2, 0, 1] in general. \( P([0, 1, 2]) = 0.7 P([0, 2, 1]) + 0.7 P([2, 0, 1]) \). Every row in the state transition matrix represents the left hand side and every column represents the constants/coefficients of the other probabilities on the right hand side. To get the transition matrix into a matrix that represents the system of linear equations, we need to have a setup such that \( Ax = b \), where A is all of the coefficients for the probabilities and b is the value they take on. For example, if the first row of A is [-1, 0.7, 0, 0, 0.7, 0] and the first value of b is 0, that represents the idea that \( -P([0, 1, 2]) + 0.7 P([0, 2, 1]) + 0.7 P([2, 0, 1]) = 0 \) which is the same as \( P([0, 1, 2]) = 0.7 P([0, 2, 1]) + 0.7 P([2, 0, 1]) \). If we subtract the identity, \( I_6 \), from the state transition matrix and set \( b = (0, 0, 0, 0, 0, 0) \), then we've done it! The first row of A would represent \( -P([0, 1, 2]) + 0.7 P([0, 2, 1]) + 0.7 P([2, 0, 1]) = 0 \) like we want.

ending
state
starting state
[0, 1, 2] [0, 2, 1] [1, 0, 2] [1, 2, 0] [2, 0, 1] [2, 1, 0]
[0, 1, 2] -1 0.7 0 0 0.7 0
[0, 2, 1] 0.7 -1 0.7 0 0 0
[1, 0, 2] 0 0 -1 0.7 0 0.7
[1, 2, 0] 0.3 0 0.3 -1 0 0
[2, 0, 1] 0 0 0 0.3 -1 0.3
[2, 1, 0] 0 0.3 0 0 0.3 -1

Unfortunately, these aren't enough constraints to solve the problem because \( x = (0, 0, 0, ...) \) is a solution: if the probability of being in any given state is 0, then it would satisfy all those relationships. For example, if \( P([0, 2, 1]) = 0, P([2, 0, 1]) = 0 \) then \( 0 = P([0, 1, 2]) = 0.7 P([0, 2, 1]) + 0.7 P([2, 0, 1]) \) is valid. We have to add an additional constaint that the probability of being in one of our states must be 1. In other words, \( P([0, 1, 2]) + P([0, 2, 1]) + P([1, 0, 2]) + P([1, 2, 0]) + P([2, 0, 1]) + P([2, 1, 0]) = 1 \). This is equivalent to adding a row of all 1's to A and appending 1 to b. In practice, the library I use for soving the system of linear equations requires a square matrix, making adding a row impossible, so I add 1 to every cell of A and b. Adding 1 to an arbitrary row of A and corresponding value in b should also work in practice.

import itertools

import numpy
import numpy.linalg

arena_capacity = 3
win_ratio = 0.7
# Each player is represented as a number in [0, arena_capacity).
# Smaller number players have win_ratio chance of winning.
lineups = sorted(itertools.permutations(range(arena_capacity)))

def transition_probability(start_lineup, end_lineup, win_ratio):
  p1 = start_lineup[0]
  p2 = start_lineup[1]
  winner = end_lineup[0]
  loser = end_lineup[-1]
  prob_p1_wins = win_ratio if p1 < p2 else 1 - win_ratio
  if start_lineup[2:] == end_lineup[1:-1]:
    if winner == p1:
      assert loser == p2
      return prob_p1_wins
    elif winner == p2:
      assert loser == p1
      return 1 - prob_p1_wins
  return 0

lineup_transition_matrix = numpy.array(
  [[transition_probability(start_lineup, end_lineup, win_ratio)
      for start_lineup in lineups]
        for end_lineup in lineups])

# System of linear equations
sole = lineup_transition_matrix - numpy.identity(len(lineup_transition_matrix))
# Add constraint that all lineup probabilites sum to 1
sole += 1
values = numpy.ones(len(lineup_transition_matrix))
print(f'A = \n{sole}')
print(f'b = {values}')
lineup_probs = numpy.linalg.solve(sole, values)
print('x = ')
for lineup, prob in sorted(zip(lineups, lineup_probs), key=lambda lup: -lup[1]):
  print(' ', lineup, prob)
A =
[[ 0.   1.7  1.   1.   1.7  1. ]
 [ 1.7  0.   1.7  1.   1.   1. ]
 [ 1.   1.   0.   1.7  1.   1.7]
 [ 1.3  1.   1.3  0.   1.   1. ]
 [ 1.   1.   1.   1.3  0.   1.3]
 [ 1.   1.3  1.   1.   1.3  0. ]]
b = [ 1.  1.  1.  1.  1.  1.]
x =
  (0, 2, 1) 0.288235294118
  (0, 1, 2) 0.250226244344
  (1, 0, 2) 0.161538461538
  (1, 2, 0) 0.123529411765
  (2, 1, 0) 0.107239819005
  (2, 0, 1) 0.0692307692308

The result here matches the results we saw when the simulations converged and requires on the order of \( S^3 \) steps and on the order of \( S^2 \) memory to effectively simulate infinitely many games. I'm confident, but not 100% sure, this method works for state transition matrices where it's possible to reach any state from any other state but not for other state transition matrices.

Misc. Thoughts

This was fun.

For most of these simulations, I used an arena size of 3 or 8 and a win_ratio of 0.7 but most of this generalizes quite well. For example, the tendency for lineups like (0, 7, 6, 5, 4, 3, 2, 1) to be most common still holds for other win_ratio values.

I'd be curious to know what it is in practice and to see how the results change with a more realistic matchup table and to know what a real matchup table even looks like. In practice, players have a more loose ordering where even if a good player generally wins against a group of players, they might lose against another player that generally loses more to others.

In some arenas, the arena owner will enforce a rule that if you win W times in a row, you have to go to the back of the line. I didn't get around to seeing how that affects lineup probabilities or how often particular players get to play. My guess is that if W is coprime with arena_capacity, then it will do a good job of giving players a more equal opportunity to play.