The same group of (beta/QA) testers can produce drastically different results depending on what their PM prioritizes, how much time they're given, how stable the rule-set is, etc. Like everyone else, the people doing the testing are constrained by their inputs. That doesn't mean the makeup of the tester pool is irrelevant; it just means that from the outside it's impossible to distinguish between bad testing, bad process, and different objectives.
The fact that the benchmark suite exists and ships with the game could help explain how the thing is so stable (few CTDs; they do exist, but the only one I've seen is a CTD after choosing to exit to desktop). It looks like it does a little bit of everything, rolls a few turns, etc., so you can run it against a suite of machines, see if any of them crash out, and harvest the logs. It also implies that most of the work needed for an automated test framework has already been done, which is something I'd shamelessly steal if I ever found myself doing a TC.
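Roughly what I mean by "run it and harvest the logs", as a minimal Python sketch. I don't know the game's actual benchmark launch command or flags, so the commands below are hypothetical stand-ins (one Python one-liner that exits cleanly, one that exits with status 3 to fake a crash). Each command stands in for one machine or configuration; actually fanning out across a pool of machines would need SSH or a CI runner layered on top of this.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Sequence, Tuple


@dataclass
class RunResult:
    run_id: int
    returncode: Optional[int]  # None means the run hit the timeout (a hang)
    log_path: Path

    @property
    def crashed(self) -> bool:
        # Treat a hang or any non-zero exit status as a failed run.
        return self.returncode is None or self.returncode != 0


def run_smoke_tests(
    commands: Sequence[Sequence[str]], log_dir, timeout: float = 600.0
) -> List[RunResult]:
    """Run each command once, writing its stdout and stderr to its own log file."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for run_id, command in enumerate(commands):
        log_path = log_dir / f"run_{run_id:03d}.log"
        with log_path.open("w") as log:
            try:
                proc = subprocess.run(
                    list(command), stdout=log, stderr=subprocess.STDOUT, timeout=timeout
                )
                returncode: Optional[int] = proc.returncode
            except subprocess.TimeoutExpired:
                # subprocess.run kills the child before raising this.
                returncode = None
        results.append(RunResult(run_id, returncode, log_path))
    return results


def harvest_crash_logs(results: Sequence[RunResult]) -> List[Tuple[int, str]]:
    """Collect (run_id, log text) for every run that crashed or hung."""
    return [(r.run_id, r.log_path.read_text()) for r in results if r.crashed]


if __name__ == "__main__":
    # Hypothetical stand-ins for "launch the benchmark on machine N".
    fake_runs = [
        [sys.executable, "-c", "print('turn 1 ok')"],
        [sys.executable, "-c", "import sys; print('turn 1'); sys.exit(3)"],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        results = run_smoke_tests(fake_runs, tmp)
        for run_id, text in harvest_crash_logs(results):
            print(f"run {run_id} failed:\n{text}")
```

The point is that once something like the benchmark exists, the harness around it is the easy part: pass/fail is just "did it exit cleanly within the timeout", and the logs from the failures are what a developer actually looks at.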
If the "quality" priority was more "does not crash" and less "accurately implements the rule-set in Ed Beach's head, with all corner cases including the ones he may have forgotten to write down", a perfectly good team of beta-testers, developers, and designers could still produce the game that we saw on launch day.