I'm going to rerun the Season 4 games, 20 times each, and publish the results here.
I originally intended to start with season 5 (fresher in memory), but the games aren't available yet, so I'm going back one year.
The objective is twofold:
- See how random the prediction game actually is.
There's a natural tendency when your predictions come true to go "See! Told you!", and on the contrary to dismiss the result as a mere fluke when things don't go the way you expected them to (pleading guilty there, Your Honour).
Hopefully, with 20 iterations, we'll get a sense of how flukey the actual result was, and of how actually predictable each game was.
- Get a more accurate idea of each leader's performance.
Over 5 seasons, we'll have a sample of 60+ games. That might seem like a lot, but it's actually a very small sample, with each leader appearing only 5-10 times.
With this much larger sample, we'll be able to better gauge each leader's performance, in the specific context of each game.
See, if we wanted to get a fair assessment, we'd need to test a game with all possible combinations of starting spots and neighbours.
Which, if my math is correct, would mean 6! = 720 possibilities for a 6-player game, and 7! = 5040 possibilities for a 7-player game.
Multiply that by 10 to get more than a single data point for each possibility, and... yeah, not gonna happen.
And that would still only be with a given field of 6 or 7 AIs, not the whole set.
So if an AI is given a dud start, or really tough neighbours, it won't perform well. That will only be an indication of the balance of that map, and not really of that AI's general performance.
But conversely, by running the game 20 times, we'll take dumb luck out of the equation.
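(For the curious, a quick back-of-the-envelope sketch of that arithmetic in Python; the player counts and the ×10 factor are just the numbers mentioned above, nothing more.)

```python
from math import factorial

# Number of distinct seatings: with N leaders on N fixed starting spots,
# there are N! ways to assign leaders to spots (and thus to sets of neighbours).
for players in (6, 7):
    arrangements = factorial(players)      # 6! = 720, 7! = 5040
    runs = arrangements * 10               # ~10 data points per arrangement
    print(f"{players} players: {arrangements} arrangements, ~{runs} games total")
```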
About the first objective, a caveat right there: I won't exactly be comparing apples and oranges, but oranges and tangerines maybe?
That's because I'll be running all those tests following season 5 rules, not season 4: so no goody huts, no Apostolic Palace.
It will make AI cross-season comparisons more reliable, but will obviously have an impact on the season 4 predictability results it yields. For instance, no AP should mean a lower average number of war declarations (no forced peace/redeclare cycles, no holy crusades).
I'll track the results in the Excel file attached to this post.
I'll also make a dedicated post for each game, where I'll attach the game replays, the edited worldbuilder file (removing the second observer civ to get a consistent Domination threshold across seasons, removing goody huts, fixing missing starting units*, making the AI players non-playable**, removing the fixed random seed***), and the resulting starting savegame (where I moved scouts/workers to the positions they had in the actual game, added Archery tech to the barbs, and added the Great Spies to mess with the AI sliders in order to stay consistent with the official games).
I've also attached the very simple mod that removes the AP (you'll need it if you want to run the savegames).
Game 1
Game 2
Game 3
Game 4
For each game, I'll also provide the "best prediction", based on the 20 test results.
Now, by "best prediction", I merely intend the prediction that would yield the best average score across those 20 results. Which is not necessarily the "best" prediction you could make (discarding the obvious outliers would probably be a better strategy for instance), but well, that's what I mean here.
I'll also include:
- said average score
- the score it would have yielded for the actual game that Sullla ran ("actual")
- and the running totals for both.
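To pin down the "best average score" idea, here's a tiny Python sketch. The scoring function and the candidate predictions are placeholders (the actual contest scoring rules aren't restated here); it's just the brute-force "try every candidate, average over the 20 results, keep the best" logic:

```python
# Sketch only: score() stands in for the actual prediction-contest scoring rules,
# and `candidates` is whatever set of predictions you care to consider.
def best_prediction(candidates, test_results, score):
    """Return the candidate with the highest mean score over the test results,
    together with that mean score."""
    def avg(pred):
        return sum(score(pred, result) for result in test_results) / len(test_results)
    best = max(candidates, key=avg)
    return best, avg(best)
```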
(*) @Sullla: this is something you might want to double-check for future games. In several cases, an AI was missing a scout or an archer.
That's something I've noticed over the years: when you try and add units to a stack in the worldbuilder, sometimes the unit doesn't get created.
(**) The AIs seem to get an extra warrior for free at the start of the games (it happens in the streamed games too), but it doesn't seem consistent (some AIs do, some don't). This was my attempt at fixing that (based on the hypothesis that it was related to the human player's free warrior), but it had no effect. Well, at least it makes for one less click when starting the scenario.
(***) I originally intended to provide each game's initial start so it could be replayed... but I tested it and the replay diverged from the original after 120ish turns.
Since the replay is not reliable, I went instead for ease of use: with no persistence of the random seed, there's no need to fight barbarian units at the start of each game to get a new seed. Just launch the game and get going.