AI Survivor - Season Four and Five Reruns

I'm going to rerun the Season 4 games, 20 times each, and publish the results here.
I originally intended to start with season 5 (fresher in everyone's memory), but the games aren't available yet, so I'm going back one year.

The objective is twofold:
- See how random the prediction game actually is.
There's a natural tendency when your predictions come true to go "See! Told you!", and on the contrary to dismiss the result as a mere fluke when things don't go the way you expected them to (pleading guilty there, Your Honour).
Hopefully, with 20 iterations, we'll get a sense of how flukey the actual result was, and of how actually predictable each game was.
- Get a more accurate idea of each leader's performance.
Over 5 seasons, we'll have a sample of 60+ games. That might seem like a lot, but it's actually a very small sample, with each leader appearing 5-10 times only.
With this much larger sample, we'll be able to better gauge each leader's performance, in the specific context of each game.

See, if we wanted to get a fair assessment, we'd need to test a game with all possible combinations of starting spots and neighbours.
Which, if my math is correct, would mean 6! = 720 possibilities for a 6-player game, and 7! = 5040 possibilities for a 7-player game.
Multiply that by 10 to get more than a single data point for each possibility, and... yeah, not gonna happen.
And that would still only be with a given field of 6 or 7 AIs, not the whole set.
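
For what it's worth, the arithmetic above can be checked with a quick Python sketch. It assumes, as above, that every ordering of the leaders over the starting spots counts as one scenario; the `scenarios` helper and the `runs_per_scenario` name are just illustrative.

```python
from math import factorial

def scenarios(num_players: int) -> int:
    """Number of ways to assign num_players leaders to num_players starting spots."""
    return factorial(num_players)

runs_per_scenario = 10  # "more than a single data point" per scenario

for n in (6, 7):
    total = scenarios(n) * runs_per_scenario
    print(f"{n} players: {scenarios(n)} scenarios, {total} games to run")
```

That's 7,200 games for a 6-player map and 50,400 for a 7-player one, versus the 20 per map planned here.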

So if an AI is given a dud start, or really tough neighbours, it won't perform well. Which will only be an indication about the balance of that map, and not really about that AI's general performance.
But conversely, by running the game 20 times, we'll get dumb luck out of the equation.


About the first objective, a caveat right there : I won't exactly be comparing apples and oranges, but oranges and tangerines maybe?
That's because I'll be running all those tests following season 5 rules, not season 4: so no goody huts, no Apostolic Palace.
It will make AI cross-season comparisons more reliable, but will obviously have an impact on the season 4 predictability results it yields. For instance, no AP should mean a lower average number of war declarations (no forced peace/redeclare cycles, no holy crusades).

I'll track the results in the Excel file attached to this post.
I'll also make a dedicated post for each game, where I'll attach the game replays, the edited worldbuilder file (removing the second observer civ to get a consistent Domination threshold across seasons, removing goody huts, fixing missing starting units*, making the AI players non-playable**, removing fixed random seed***), and the resulting starting savegame (where I moved scouts/workers to the position they had in the actual game, added archery tech to the barbs, and added the Great Spies to mess with the AI sliders in order to stay consistent with the official games).
I've also attached the very simple mod that removes the AP (you'll need it if you want to run the savegames).

Game 1
Game 2
Game 3
Game 4

For each game, I'll also provide the "best prediction", based on the 20 test results.
Now, by "best prediction", I merely mean the prediction that would yield the best average score across those 20 results. Which is not necessarily the "best" prediction you could make (discarding the obvious outliers would probably be a better strategy for instance), but well, that's what I mean here.
I'll also include:
- said average score
- the score it would have yielded for the actual game that Sullla ran ("actual")
- and the running totals for both.
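
As a rough illustration of what "best average score" means, here is a minimal Python sketch. The point values in `score` are made up for the example (they are NOT the contest's actual scoring), it only covers three of the picks, and the leaders "A" to "D" and their results are toy data, not anything from the spreadsheet.

```python
from itertools import permutations
from statistics import mean

def score(prediction, result):
    """Hypothetical scoring: placeholder point values, not the real contest rules."""
    points = 0
    if prediction["winner"] == result["winner"]:
        points += 5
    if prediction["runner_up"] == result["runner_up"]:
        points += 2
    if prediction["first_to_die"] == result["first_to_die"]:
        points += 2
    return points

def best_prediction(leaders, results):
    """Try every (winner, runner-up, first-to-die) triple of distinct leaders
    and keep the one with the highest average score over all results."""
    candidates = (
        {"winner": w, "runner_up": r, "first_to_die": d}
        for w, r, d in permutations(leaders, 3)
    )
    return max(candidates, key=lambda p: mean(score(p, res) for res in results))

# Toy results standing in for the 20 reruns of one game.
results = [
    {"winner": "A", "runner_up": "B", "first_to_die": "C"},
    {"winner": "A", "runner_up": "B", "first_to_die": "C"},
    {"winner": "B", "runner_up": "A", "first_to_die": "C"},
]
best = best_prediction(["A", "B", "C", "D"], results)
print(best, mean(score(best, res) for res in results))
```

Note that the pick with the best average can still score poorly on any single result (the third toy game here), which is the same gap as between "average" and "actual".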


(*) @Sullla: this is something you might want to double-check for future games. In several cases, an AI was missing a scout or an archer.
That's something I've noticed over the years: when you try and add units to a stack in the worldbuilder, sometimes the unit doesn't get created.

(**) The AIs seem to get an extra warrior for free at the start of the games (happens in the streamed games too), but it doesn't seem consistent (some AIs do, some don't). This was my attempt at fixing that (based on the hypothesis that it was related to the human player's free warrior), but it had no effect. Well, it at least makes for one less click when starting the scenario.

(***) I originally intended to provide each game's initial start so it could be replayed... but I tested it and the replay diverged from the original after 120ish turns.
Since the replay is not reliable, I went instead for ease of use: with no persistence of the random seed, there's no need to fight barbarian units at the start of each game to get a new seed. Just launch the game, and get going.


Attached Files
.zip   Mod.zip (Size: 23.82 KB / Downloads: 0)
.xlsx   Survivor_s4_reruns.xlsx (Size: 255.43 KB / Downloads: 1)

Game 1 was a pretty open game, with four of the six AIs having a decent shot at winning.

   
   
(note : "A" column tracks the number of DoWs initiated by the AI, "D" the number of times the AI is declared upon, "K" the number of kills)

   

Cyrus was the top contender there : he has the best survival rate and the best win rate, with some very impressive and fast wins.
So you got unlucky there, Sullla : you definitely made the right call for favourite to win.
That said, with a 35% win rate only, Cyrus wasn't exactly an overwhelming favourite either.
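
To get a rough feel for how much 20 runs can actually pin down, here is a small Python sketch of a confidence interval on that win rate (35% of 20 runs being 7 wins). The Wilson score interval is just one common choice for small samples, picked for this illustration; it treats the 20 runs as independent.

```python
from math import sqrt

def wilson_interval(wins: int, runs: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win rate (z = 1.96)."""
    p = wins / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(7, 20)  # 35% win rate over 20 runs
print(f"{low:.0%} to {high:.0%}")
```

Under those assumptions the interval comes out at roughly 18% to 57%, so 20 runs separate a favourite from a dud, but not two close contenders.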

Catherine was more of a mixed bag : second best chance to win (right on Cyrus's heels), but also most likely to be first to die. This seems pretty easy to explain, though : her position favoured early clashes with Cyrus and Isabella, the two strongest AIs on this map. That was a recipe for disaster (hence her frequent early eliminations), but when she pulled through, that set her up for the win : if you beat the top dogs, that usually makes you the new top dog.

Isabella was also a strong performer here, though not as strong as her win in the competition game might suggest. She had lots of room to expand, and would often expand further through conquering Cathy (often getting help from Cyrus and/or Roosevelt in that conflict).
While geography explains in good part her successes (northern backline), it was also the root of her downfalls : she wasn't connected to the extensive river network in the south. Meaning that any religion founded in the central/southern part of the map (by Cyrus in particular) would spread like wildfire, while her own Christianity would remain in Spain (sometimes spreading to China, but that's it).
It could be a case where removing the Apostolic Palace changed things significantly in my tests : a low foreign spread would give her total control of the AP, allowing her to abuse it at will.
So the fact she still maintained a high performance level without the AP in play shows how strong her position was in that game.

Elizabeth was a "high contrast" performer there : she either won the game, or she got killed. In the only two instances where she survived without winning, she was on her way out in the first case (less than 10 turns to live), and in the second she only lived because Isabella chose to murder Catherine, whom she was friendly with, rather than Elizabeth, whom she was annoyed with.
Bad leader for the prediction contest then : not enough of a favourite to consider backing her, especially since the "reverse order" points are not in play, but with enough of a win rate to mess you up.
Cyrus was a major threat to her, while Qin provided a conquest option.
Isabella was an issue: the Spanish zealot tended to found and bury a lot of the late religions, thus keeping them out of English lands. The lack of religious diversity would critically slow a lot of her attempts at a cultural victory.

Roosevelt and Qin were the complete duds in this match. The only surprise being they wouldn't stand higher in the first to die category.
I think this is explained by the fact that early conflicts were most likely to break out between the major players first, Roosevelt and Qin being eliminated later, as an afterthought.
Now, don't read too much into that as to those leaders' general performance : their starting positions were just awful.
Roosevelt had extremely low quality land, no back line, and was next to Cathy : creative + imperialist !
Qin had much better land... but no backline either, and was tightly squeezed between three other leaders.


Best Prediction

Winner: Cyrus
Runner-up: Isabella
First-to-die: Catherine
Victory condition: Domination
Victory date: 345
Nb of wars: 10

Average expected score: 12.85
Actual: 7 (Competition best: 13)


Attached Files
.zip   Game 1.zip (Size: 1.86 MB / Downloads: 2)

This is awesome Wyatan! Thanks for putting in the time and work.

That's fascinating.

As in professional sports, the better competitors are clearly superior but have no guarantees, so the salt is supplied by a similar mechanism.

We can describe someone like Elizabeth, or I think all the other pacifist high-peaceweight builders like Mansa Musa or Gandhi, as low-OBP/high-strikeout/high-SLG (American baseball analogy, sorry non-Americans): they "swing for the fences" and are thus prone to achieve the more extreme results of either winning outright or dying horribly.

You can also see some deterministic genetics, so to speak: a truly awful start (Roosevelt has one of the worst starting positions we've seen in five seasons) really does doom you, though not to the point where you can't still influence the game.

Very interesting stuff, Wyatan. Thanks for running this and sharing the results!

Ya I need so much more of this, love it. 

I would say the Civ4 AI is bizarrely fascinating except that we all know why it is: It's one of the best 4X AIs even to this date. As Sullla loves to point out, the Noble AI can beat almost all casual players.

This is incredible work Wyatan! I'm going to take this information and add it to the writeup for Season Four Game One. I thought at the time that Isabella's victory was an unlikely result and it's reassuring to see that was the case. (Although not by as much as I thought - she did win 3/20 times which is far better than Qin or Roosevelt managed.)

For savegame files, I would selfishly love to see this kind of analysis for all of the Season Four games *AND* the Season Five games. But obviously that's a ton of work and I understand that there's more interest in this analysis for the more recent Season Five games. My preference is to post all the savegame files together at the end of the season but if you want to start going through some of the Season Five games, I can zip a few of them together and send your way. Do you have a particular game that you wanted to look at next?
Follow Sullla: Website | YouTube | Livestream | Twitter | Discord

(August 1st, 2020, 08:43)Sullla Wrote: I thought at the time that Isabella's victory was an unlikely result and it's reassuring to see that was the case. (Although not by as much as I thought - she did win 3/20 times which is far better than Qin or Roosevelt managed.)

We could even count the first game as a victory for her since she hit that bug where the last city of her current opponent is locked behind the borders of someone who won't open borders with her:
- Can't launch an amphibious assault because same landmass
- Can't make peace because of AI programming
- Can't declare on the AI blocking access because already at war
She had a crushing domination win underway up to that point.
But anyway, 3-4 wins (15-20%) is the baseline for a "standard" chance to win (20/6 ≈ 3.3 wins, i.e. 1/6 ≈ 16.7%).
Her higher than average runner-up finishes put her firmly in the "performing well" category, though.
So her win, while not the most likely, wasn't shockingly lucky either.
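
The baseline figure is simple enough to check in a few lines of Python. This sketch only assumes that each of the 6 AIs has an equal chance to win; the variable names are illustrative.

```python
runs = 20
players = 6

baseline_rate = 1 / players           # equal chance for each AI
baseline_wins = runs * baseline_rate  # expected wins over the 20 runs

isabella_wins = 3  # 3/20, as quoted above
print(f"baseline: {baseline_wins:.2f} wins ({baseline_rate:.1%})")
print(f"Isabella: {isabella_wins} wins ({isabella_wins / runs:.1%})")
```

So 3 wins out of 20 (15%) sits just under the equal-chance baseline of about 3.3 wins (16.7%).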

Roosevelt's second place finish was the real shocker apparently, though.

(August 1st, 2020, 08:43)Sullla Wrote: For savegame files, I would selfishly love to see this kind of analysis for all of the Season Four games *AND* the Season Five games.

Rest assured, that's the plan anyway.
Unless "real life" interferes, obviously.

(August 1st, 2020, 08:43)Sullla Wrote: But obviously that's a ton of work and I understand that there's more interest in this analysis for the more recent Season Five games. My preference is to post all the savegame files together at the end of the season but if you want to start going through some of the Season Five games, I can zip a few of them together and send your way. Do you have a particular game that you wanted to look at next?

Thanks.
But now that I've started, I think I'll keep things simple and just do the games in sequence.
So I'll finish season 4 before tackling season 5.
It would be hard to choose anyway, each game has its set of questions to answer:
Was Mansa really doomed ?
Did Gandhi actually get lucky ?
etc.

Very interesting analysis! One definite takeaway from this game is that removing the AP results in fewer wars, which I think everyone has kinda noticed. Still, it's noteworthy that no non-AP game reached the 14 wars of the actual (AP) game, and that three quarters of the sample games had at least 3 fewer wars than the actual game. Quick calculator math tells us that the average number of wars across this 20-game sample was 10.3, nearly 4 fewer than in the actual game. I'm definitely going to keep track of how the number of wars changes without the AP compared to with it.

I don't want to ask too much, but I will say if you ever do ANY pre Season 4/5 game there is one game I have really wanted to see Alternate Historied for a long time: Season Three, Game Seven. This was one of Charlemagne's dominating performances, losing only to a very fluky Wang Kon spaceship due to launching with one engine. It is the only opening round game Huayna Capac has ever been eliminated in, with him launching a bizarre cross-map war against Wang Kon. Tokugawa played a fine game (and I wonder what'd have happened if he ate Bismarck faster), while Mehmed ate Bismarck almost entirely and got out to being a big player despite a fluky Settler loss early and could have run away with the game were it not for Charlemagne's timely war declaration. Similarly, Wang Kon built up well, took a city from Huayna Capac and perhaps could have taken much more of his core were it not for Mehmed. And while Bismarck played a bad game he DID have an awesome capital that game.

Because of that I'm super interested to see if:

1. Is this a genuinely bad map for Huayna (I actually predicted Wang Kon this game, but I missed Charlemagne and was baited by Bismarck's great start), or did we see the fluky result and he really WAS the correct play as we'd think from his AI score?

2. Regardless of Huayna, how do the AIs shake out in terms of consistency? This was a map where, to me, it felt like anyone could come out on top. Time has suggested Charlemagne is a solid AI and he had a great position for his gameplan. Tokugawa feels lowkey underrated and has a good chance to eat Bismarck each game and so on. Is this a rare game where almost every AI had a realistic shot? Is Charlemagne actually the "right" pick? Does Bismarck actually expand ever and make use of his insane Corn + double riverside Gems start? And so on.

3. It also was a map where essentially EVERYONE had a good and lush start, which would make it interesting since as you said these are pretty map-specific, so putting one with a map where everyone has good locations might allow more conclusions.

I incorporated Wyatan's analysis into the writeup for Season Four Game One: http://www.sullla.com/Civ4/civ4survivor4-1.html

Looking forward to seeing the results from the next game!