When two programs play a series of chess games, people often draw
conclusions that are not statistically significant. For example, if
computer A beats computer B by 14 games to 5, can we say with
confidence that A is better than B? The answer is that at the 95%
confidence level we still cannot be sure which is the better program:
if draws are ignored, two equally strong programs would produce a
19-game result at least this lopsided, in one direction or the other,
a little over 6% of the time. To illustrate this point I have created
a small program that calculates the required margin of victory
for statistical significance. It is based upon the binomial
distribution. It requires Windows 95/98/NT/2000/XP.
To use the program, enter the loser's score (in this example the loser scored two).
The program then shows how many games the winner would have to win to be statistically
better than the loser. You can change the level of significance.
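
The calculation described above can be sketched in a few lines of Python. This is not the program's actual code, only a minimal illustration under stated assumptions: every game is decisive (no draws), games are independent, the null hypothesis is that both sides are equally likely to win each game, and the test is two-sided by default. The function names and the two_sided switch are hypothetical, chosen for this example.

```python
from math import comb


def tail_probability(wins, games):
    """P(X >= wins) for X ~ Binomial(games, 0.5), i.e. two equal players."""
    return sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games


def required_winner_score(loser_score, confidence=0.95, two_sided=True):
    """Smallest winner score that is significant at the given confidence.

    Assumption: draws are ignored, so total games = winner + loser.
    """
    alpha = 1.0 - confidence
    winner = loser_score + 1
    while True:
        p = tail_probability(winner, winner + loser_score)
        if two_sided:
            # Count results this lopsided in either direction.
            p = min(1.0, 2 * p)
        if p <= alpha:
            return winner
        winner += 1


if __name__ == "__main__":
    for loser in (2, 5):
        print(loser, required_winner_score(loser))
```

Under these assumptions a loser's score of two needs a winner's score of 10, and a loser's score of five needs 15, which is why 14 to 5 falls just short. With two_sided=False the thresholds drop to 9 and 13, so the answer depends on which test is chosen.
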
Another way of looking at it: if you played 11 games and the score was three v eight, you
cannot be 95% certain that the winner would also win a series of 1000 games.
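
Both match results mentioned in the text can be checked directly. The sketch below is an illustration rather than the program's own method; it assumes decisive games only, independent games, equal strength as the null hypothesis, and a two-sided test. The p_value helper is a hypothetical name.

```python
from math import comb


def p_value(winner_score, loser_score, two_sided=True):
    """Chance that equal players produce a result at least this lopsided.

    Assumption: draws are ignored, so games = winner_score + loser_score.
    """
    games = winner_score + loser_score
    # Upper tail of Binomial(games, 0.5): P(X >= winner_score).
    tail = sum(comb(games, k) for k in range(winner_score, games + 1)) / 2 ** games
    return min(1.0, 2 * tail) if two_sided else tail


for winner, loser in [(14, 5), (8, 3)]:
    p = p_value(winner, loser)
    print(f"{winner}-{loser}: p = {p:.3f}, significant at 95%: {p <= 0.05}")
# prints:
# 14-5: p = 0.064, significant at 95%: False
# 8-3: p = 0.227, significant at 95%: False
```

Neither result clears the 5% cutoff in this two-sided form. The 8 to 3 result fails even one-sided (about 0.113), while 14 to 5 would pass one-sided (about 0.032).
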