756 lines
34 KiB
Plaintext
756 lines
34 KiB
Plaintext
|
Newsgroups: rec.games.frp.archives
|
||
|
From: barnett@agsm.unsw.oz.au (Glen Barnett)
|
||
|
Subject: PAPER: Testing Dice for bias
|
||
|
Message-ID: <al#23np@rpi.edu>
|
||
|
Organization: The Australian Graduate School of Management
|
||
|
Date: Wed, 2 Dec 1992 04:41:08 GMT
|
||
|
Approved: goldm@rpi.edu
|
||
|
Lines: 749
|
||
|
|
||
|
The following information is intended for distribution over Internet,
|
||
|
and outside of that may be copied for personal use only.
|
||
|
(c) Glen L. Barnett, 1992. All rights reserved.
|
||
|
|
||
|
I recently responded to a thread on rec.games.frp.dnd about testing dice,
|
||
|
in order to fix up a few misconceptions and make some suggestions. On the
|
||
|
suggestion of Coyt Watters I'm putting this on r.g.f.archives, which I think
|
||
|
is a good idea. What follows has some major additions, however.
|
||
|
|
||
|
In this post, I will begin by discussing various posts on the subject
|
||
|
of testing dice, then show how to do the test under discussion in the
|
||
|
thread (the chi-squared goodness-of-fit test) properly, and then talk
|
||
|
about more appropriate tests.
|
||
|
|
||
|
---------------------------------------------------------------------
|
||
|
|
||
|
In article <BxHJLy.Jq0@watserv1.uwaterloo.ca>,
|
||
|
alongley@cape.UWaterloo.ca (Allan Longley) writes:
|
||
|
[..stuff deleted..]
|
||
|
>testing dice for bias. Well, here is a test to use. I haven't actually
|
||
|
>tested this yet, but it should -- in theory -- work. And no, this is not
|
||
|
>a copy out of the old Dragon magazine, but it is the same test -- its a
|
||
|
>pretty standard test. I will use simple terms -- so all you math/stat
|
||
|
>people out there, don't correct the fine points, I know.
|
||
|
[description of chi-squared goodness-of-fit test deleted]
|
||
|
|
||
|
Since Allan asks for no correction of fine points, I will attempt to
|
||
|
limit myself to major problems. This is not intended as a flame on
|
||
|
Allan, but this is fairly important stuff, and should be explained
|
||
|
correctly. If at any stage I get less than pleasant, please accept
|
||
|
my apology in advance.
|
||
|
|
||
|
While the calculations that Allan describes give the correct value
|
||
|
of a chi-square goodness-of-fit statistic (which he calls "Indicator"),
|
||
|
you should be *very* wary of interpreting the results in the way
|
||
|
he describes, as I will explain:
|
||
|
|
||
|
Let us assume you have 40 dice that (unknown to you) are all perfectly
|
||
|
fair, and you wish to test all of them, to see if any are "biased".
|
||
|
|
||
|
The way Allan has set his test up, you'd expect 2 of them to give
|
||
|
results below "Probably Fair", which he says indicates the die is
|
||
|
probably unbiased. That is, you have 40 fair dice, and you will
|
||
|
expect to regard only *two* of them as probably O.K.! Similarly,
|
||
|
you will expect to consider two of your "purely fair" dice as
|
||
|
probably unfair. Of the remaining 36, you will expect 18 scores
|
||
|
between "Probably Fair" and "Maybe" and 18 more between "Maybe"
|
||
|
and "Probably biased". For these 36, you have to do the test again,
|
||
|
under Allan's scheme. If you get both results below "Maybe" (you expect
|
||
|
9 of these) you say "Probably Fair". Similarly you expect 9 above
|
||
|
"Maybe" on both trials.
|
||
|
|
||
|
|
||
|
So we have (after repeating the test for 90% of the dice):
|
||
|
|
||
|
Number of Fair Dice: 40
|
||
|
Expected number "probably fair": 11
|
||
|
Expected number "probably biased": 11
|
||
|
Expected number which we don't know about: 18.
|
||
|
|
||
|
So over a quarter of perfectly fair dice will be called "probably biased".
|
||
|
If we continue testing those remaining 18 we are still undecided about,
|
||
|
the problem gets worse.
|
||
|
|
||
|
|
||
|
Other problems:
|
||
|
|
||
|
Allan says:
|
||
|
|
||
|
"The column titled "Maybe" are the Indicator values where there is
|
||
|
a 50% chance that the die is fair and a 50% chance that the die is
|
||
|
biased." (A)
|
||
|
|
||
|
This is just plain wrong.
|
||
|
|
||
|
The column he refers to is the value that a test on a *fair* die will
|
||
|
exceed 50% of the time. This is very different, and probably explains
|
||
|
why Allan misunderstands the whole interpretation of the results. (B)
|
||
|
|
||
|
If any of you can't see why what I said (B), and what Allan said (A) are
|
||
|
totally different, don't despair. This stuff is not always obvious from
|
||
|
the start. If you can follow the rules of an average RPG, you are smart
|
||
|
enough to understand a few non-trivial statistical ideas. I'm quite happy
|
||
|
to provide further clarification to the net if the demand is there.
|
||
|
|
||
|
> >From Table 1, it appears that the d4 tested may be "Fair" but another test
|
||
|
> should be done.
|
||
|
|
||
|
I'd say not. A reasonable interpretation of the result is "There is
|
||
|
no reason to doubt that the die is O.K.".
|
||
|
|
||
|
Incidentally, the test statistic of 2.00 obtained in the example is
|
||
|
only 2/3 of what you'd expect with a fair die. The value of 2.00 will
|
||
|
be exceeded almost 60% of the time by a test on a fair die.
|
||
|
|
||
|
|
||
|
---------------------------------------------------------------------
|
||
|
|
||
|
In article <Bxu26A.3F6@watserv1.uwaterloo.ca>,
|
||
|
alongley@cape.UWaterloo.ca (Allan Longley), in response
|
||
|
to Michael Wright, says:
|
||
|
|
||
|
|In article <wright.721879981@latcs1.lat.oz.au|wright@latcs2.lat.oz.au
|
||
|
| (Michael G. Wright) writes:
|
||
|
|>dks@acpub.duke.edu writes:
|
||
|
|>
|
||
|
|>> This is called a chi-square test, and an article with the
|
||
|
|>>procedure and numbers for it appeared way back in Dragon issue
|
||
|
|>>#74... Thank you, Mr. Longley, for reposting it (or did you come
|
||
|
|>>up with it in isolation? =) ) for the benefit of those who don't
|
||
|
|>>have the issue (probably most readers).
|
||
|
|
|
||
|
|Yes, I seen the issue. The chi-square test is a standard statistical test
|
||
|
|for determinig if a data set matches a particular distribution -- so, no, I
|
||
|
|did not come up with the test in isolation, its been around for a lot longer
|
||
|
|than D&D. I don't actually have the issue, so I didn't copy it for the net.
|
||
|
|
|
||
|
|I've been playing with the chi-square test and you know what I found out --
|
||
|
|ALL DICE ARE BIASED!! Well, that's not true -- all except d4's and d6's are
|
||
|
|biased. Of course, this really shouldn't be a surprise. So, I've been
|
||
|
|looking at modifiying the chi-square test for "real world" dice -- more on
|
||
|
|this in a later post.
|
||
|
|
||
|
Allan is correct that the test has been around a lot longer than D&D.
|
||
|
The test, due to Karl Pearson, is nearly 100 years old.
|
||
|
|
||
|
Its no surprise that Allan finds that all real dice are biased:
|
||
|
i) Its impossible to make a truly fair die (obviously). Its just
|
||
|
that most are close enough that we don't care too much. The
|
||
|
chance of getting "close to fair" will decrease with the number
|
||
|
of sides.
|
||
|
|
||
|
ii) Allan's testing method will call more than a quarter of fair
|
||
|
dice (assuming they existed) biased. Even the fairest dice you
|
||
|
could buy have a good chance of being called biased.
|
||
|
|
||
|
|
||
|
|>Actually, I use a program I made to roll dice for stats. Unfortunately, nobody
|
||
|
|>in the party wants to use it, because dice rolls invariably end up better. I
|
||
|
|>think this must be because of the pseudo-randomness of the program.
|
||
|
|>Anyone out there that knows better?
|
||
|
|
|
||
|
|I wouldn't want to use a computer generated die-value while playing AD&D.
|
||
|
|THe thrill of the "rolling die" is part of the game. Also, with reference
|
||
|
|to the above, most players will have a favourite die/dice due to the
|
||
|
|inherent bias found in real dice. The trick is to find the dice that are
|
||
|
|biased beyond reasonable playability.
|
||
|
|
||
|
In response to Michael G Wright:
|
||
|
|
||
|
Michael's discovery that hand-rolled dice often come out better
|
||
|
could occur for a couple of reasons:
|
||
|
|
||
|
i) Players tend to hang on to "favourite" dice that "roll well"
|
||
|
(i.e. come up with good results). So biased dice have some
|
||
|
chance of concentrating into the hands of players. As long as
|
||
|
this isn't too extreme, it probably doesn't matter too much.
|
||
|
(Allan quite correctly identifies this reason).
|
||
|
|
||
|
ii) Players don't really roll randomly. I had a fairly long email
|
||
|
discussion with Sea Wasp on this topic just recently. Even
|
||
|
unconciously, you can pick up the dice in a "non-random" fashion,
|
||
|
so that a good roll will tend to be followed by another good
|
||
|
roll if you don't roll tooo vigorously. You may notice that
|
||
|
after a bad roll players tend to throw harder. This may be more
|
||
|
of a problem, as some players are *much* better at it than others.
|
||
|
If it becomes too noticeable, you may wish to invest in a dice
|
||
|
cup, or mock up a craps-table affair.
|
||
|
|
||
|
Allan's comment that "The trick is to find the dice that are biased
|
||
|
beyond reasonable playability" is spot on. Exactly correct.
|
||
|
Remember it, because I'll come back to it later.
|
||
|
|
||
|
|
||
|
I think the above discussion also answers Paul Kinsler's questions.
|
||
|
|
||
|
[in article <BxvuzM.CsI@bunyip.cc.uq.oz.au>, kinsler@physics.uq.oz.au
|
||
|
(Paul Kinsler) asked for clarification of Allan's comment that all
|
||
|
dice are biased].
|
||
|
|
||
|
---------------------------------------------------------------------
|
||
|
|
||
|
In article <1992Nov12.194015.1602@Princeton.EDU>,
|
||
|
dagolden@phoenix.Princeton.EDU (David Alexandre Golden) writes:
|
||
|
|
||
|
[stuff deleted]
|
||
|
>I once did something along those lines to test whether my DM's die was
|
||
|
>fair. (It would roll 20's a lot. Often on command.)
|
||
|
>
|
||
|
>To test the die, I made a histogram (I believe is the term for it) like this:
|
||
|
>
|
||
|
>x x x x x
|
||
|
>x x x x x x x x x x x x x x x x x x x x
|
||
|
>x x x x x x x x x x x x x x x x x x x x
|
||
|
>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|
||
|
>
|
||
|
>With an "x" being each time that number came up. A totally fair die would
|
||
|
>have a straight line across, assuming it was rolled enough times. The
|
||
|
>question is, what is enough times? I rolled a d20 about 380 times (not bad
|
||
|
>if one person rolls and the other makes an x in the column) and while it
|
||
|
>gave a fairly good idea that the die was biased, the biased numbers had only
|
||
|
>occured a couple times more than the average. (If I remember correctly).
|
||
|
>So I wrote a computer program to do the same thing until the deviation between
|
||
|
>the highest and lowest number of occurances was less than about 15% of the
|
||
|
>average number of occurances. (i.e a reasonably smooth profile). The
|
||
|
>computer required SEVERAL THOUSAND ROLLS to do this.
|
||
|
|
||
|
Your idea of making a histogram is a good one. In fact all the
|
||
|
chi-squared test does is the same as drawing a line across the
|
||
|
histogram where your expected "straight line across" would go,
|
||
|
looking at the deviations from that, squaring (to get all positives)
|
||
|
and adding the squared deviations up and dividing by that expected
|
||
|
number. This gives a single overall measure of deviation from uniformity.
|
||
|
The advantage of looking at the histogram is you see where
|
||
|
the the differences are, but you can't tell how big they "ought" to be
|
||
|
for a fair die. The actual number in the cell will be approximately normally
|
||
|
distributed with mean equal to the expected number in each cell and standard
|
||
|
deviation approximately the square root of the mean. In the above example,
|
||
|
we'd expect Dave to get 19 in each cell, so the standard deviation is about
|
||
|
4.35. That is, we'd expect to get about 2/3 of the cells with counts in
|
||
|
the range 15 to 23, and about a 2/3 chance of all but 1 or 2 of the values
|
||
|
inside the range 11 to 27.
|
||
|
|
||
|
[ some of my own discussion deleted - see the section "Tests based
|
||
|
on the histogram" below]
|
||
|
|
||
|
> ... Still, the point is that I'm skeptical
|
||
|
>that the "fairness" of a die can be determined in only a hundred or so rolls.
|
||
|
>(d4 maybe... d20 no way!)
|
||
|
|
||
|
Well, in fact Allan's suggestion was to use 20 rolls per cell, so he'd use
|
||
|
400 rolls for a d20 and 80 rolls for a d4. But in any case you can never
|
||
|
decide that a die is actually fair. If you do a test and get a result
|
||
|
close to what you expected if the die was fair, you have a lack of evidence
|
||
|
against the hypothesis of fairness (which is the default assumption for
|
||
|
a statistical test of biasedness in a die - the "null hypothesis").
|
||
|
|
||
|
What you get is either a higher degree of evidence against the hypothesis
|
||
|
of fairness (by getting a result that is very unlikely with a fair die),
|
||
|
or a low degree of evidence against fairness. It's like in a court case,
|
||
|
(a criminal case), where the defendant is assumed innocent until proven
|
||
|
guilty (innocence is the null hypothesis), but evidence against the
|
||
|
defendant is presented by the prosecution. The jury then decides either
|
||
|
"guilty" if there is strong enough evidence, or "Not guilty" if there
|
||
|
is not. They don't declare innocence.
|
||
|
|
||
|
So we can't determine "fairness" anyway. The question we need to ask
|
||
|
is: If the die is biased, will a hundred rolls (or whatever number)
|
||
|
be enough for us to have a good chance to pick up that difference,
|
||
|
while at the same time, not "convicting the innocent" too often?
|
||
|
Whether it is enough depends on how big a difference you think it is
|
||
|
important to pick up.
|
||
|
|
||
|
-----------------------------------------------------------------------
|
||
|
|
||
|
|
||
|
In article <1992Nov12.231019.1204@mcs.kent.edu>,
|
||
|
adray@mcs.kent.edu (Adam Dray) writes (in response to Dave Golden):
|
||
|
|
||
|
>In other words, a histogram shows very little. Random doesn't mean
|
||
|
>necessarily that you'll get an even distribution. It just mean the
|
||
|
>probability that you won't get an even distribution is proportional to
|
||
|
>the number of sides on the die, and the number of times you roll it.
|
||
|
|
||
|
I disgree with the first sentence. The final sentence above is wrong.
|
||
|
The more you roll it, the more even the distribution will be, as long
|
||
|
as the die is fair. It doesn't really depend on the number of sides.
|
||
|
(except as far as the negative dependence between cell counts is
|
||
|
reduced for more sides).
|
||
|
|
||
|
>Notes about the fairness of dice:
|
||
|
>
|
||
|
>Sharp-edged dice are better than smooth-edged dice. They're also more
|
||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
Not always, but this may be true more often than not.
|
||
|
|
||
|
>expensive, however. Rounded dice are often inked by coating the
|
||
|
>entire die with ink, then tossing the die in a "tumbler" (similar to
|
||
|
>tumblers for smoothing rocks) until all the die on the outside is
|
||
|
>gone. Thus, the ink is left in the crevices where the numbers are.
|
||
|
>
|
||
|
>Theoretically, the grooves for the numbers can make one side a more
|
||
|
>likely outcome. Official casino dice don't have inset pips.
|
||
|
|
||
|
If you do it right any effect will be swamped by other manufacturing
|
||
|
defects anyway.
|
||
|
|
||
|
[some stuff deleted]
|
||
|
>
|
||
|
>GameScience did tests on other manufacturers' dice. They found
|
||
|
>certain numbers to be more likely. I've heard that the real 100-sided
|
||
|
>die tends to roll certain numbers more often.
|
||
|
|
||
|
It is impossible (both effectively and theoretically) to get a fair 100-
|
||
|
sided die. The practical problems are more important than the theoretical
|
||
|
ones.
|
||
|
|
||
|
>Filing corners off your dice can make certain outcomes more probable.
|
||
|
>Natural wear can do the same thing.
|
||
|
>
|
||
|
>For most people, none of this matters one damn bit. =)
|
||
|
|
||
|
In general, no, it doesn't matter. It's encouraging to see that so
|
||
|
many people (just about all posters on the topic) realise this.
|
||
|
|
||
|
---------------------------------------------------------------------------
|
||
|
|
||
|
Interpreting the test statistic (Allan's "Indicator")
|
||
|
|
||
|
Carry out the calculations as described by Allan*, but use any number of
|
||
|
throws per cell (possible outcome) you like (I'd suggest 10 as a minimum,
|
||
|
because otherwise the tabled distribution is out a bit). The more rolls
|
||
|
you do, the better chance you have of picking up a difference of a given
|
||
|
size. The value of 20 that Allan suggested may well be a reasonable choice
|
||
|
in most circumstances. Allan gives the calculations for two different
|
||
|
numbers of throws (20 and 10 per cell, but in different posts), so you
|
||
|
ought to be able to generalise.
|
||
|
|
||
|
* The calculations given by Allan may no longer be available to you, so
|
||
|
an indication of how to do the calculations is given here:
|
||
|
|
||
|
Roll the die many times, say 20 times per face. Record each result
|
||
|
(I suggest you make up a tally sheet). Calculate the difference between
|
||
|
the number of times each face came up and the expected number (20 in this
|
||
|
case). Square these values and add them. Divide by the expected number
|
||
|
per face. This is your chi-squared statistic.
|
||
|
E.g. d4: Roll 20 times per face = 80 rolls
|
||
|
Face: 1 2 3 4
|
||
|
No times 23 18 15 24
|
||
|
expected 20 20 20 20
|
||
|
difference 3 2 5 4
|
||
|
diff^2 9 4 25 16 Sum = 54, chi-squared value = 54/20 = 2.7
|
||
|
|
||
|
|
||
|
If the result is less than the final column of Allan's table 2 (which
|
||
|
are the tabulated values for a 5% significance level), you shouldn't
|
||
|
worry too much, there is not very strong evidence of bias - in fact 1
|
||
|
in 20 tests on a fair die will score worse than this. If the result is
|
||
|
much bigger than the value you have some cause for concern. A result
|
||
|
bigger than the 1% column below is quite unusual if the die is fair
|
||
|
(a result at least this big only occuring 1% of the time), so it gives
|
||
|
us good reason to suspect bias.
|
||
|
|
||
|
A small table of the chi-squared distribution:
|
||
|
|
||
|
5% 1% df
|
||
|
d4 7.81 11.34 3
|
||
|
d6 11.07 15.09 5
|
||
|
d8 14.07 18.48 7
|
||
|
d10 16.92 21.67 9
|
||
|
d12 19.68 24.72 11
|
||
|
d20 30.14 36.19 19
|
||
|
|
||
|
(these results came from a computer approximation to the chi-squared
|
||
|
distribution. They should be accurate to the figures given.)
|
||
|
|
||
|
If you want a more "cookbook" approach; if the result exceeds the 1%
|
||
|
value, its probably biased. If its between the 1% and 5% values, there
|
||
|
is a moderate degree of evidence that its biased, but it still might be
|
||
|
OK. If its less than the 5% value, you don't have any reason to think
|
||
|
its biased on the basis of the test.
|
||
|
|
||
|
|
||
|
You will find more extensive tables in most elementary statistics books.
|
||
|
(references for the chi-squared and Kolmogorov-Smirnov tests are
|
||
|
at the end of this article).
|
||
|
You look up the df (degress-of-freedom) that are one less than the
|
||
|
number of faces on the die (e.g. d4 -> 3 df).
|
||
|
|
||
|
A note on pronunciation: The Greek letter chi (the capital looks like
|
||
|
an X, and the lower-case has one of the two crossed lines a bit curly)
|
||
|
is pronounced with a hard "ch" like Charisma, and the word rhymes with
|
||
|
pie. Note that mathematical symbols come from *ancient* Greek, so no
|
||
|
arguments from any modern Greeks please.
|
||
|
|
||
|
This will provide a reasonable all-round test for bias in a die.
|
||
|
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
|
||
|
Why you probably don't want to do the chi-squared test:
|
||
|
(at least for d8 and above)
|
||
|
|
||
|
The chi-squared test will pick up any kind of deviation from a purely even
|
||
|
distribution. However, we are much more worried about some kind of deviations
|
||
|
than others. For example, I'd be more interested in knowing that "20" came
|
||
|
up too often on a d20 than knowing "10" came up too often. The first could
|
||
|
affect play substantially, the second probably only a little. We should use a
|
||
|
test with a better chance to pick up the kind of deviations from fairness
|
||
|
that are most important to us (which will trade off with less chance of
|
||
|
picking up deviations we are less concerned with).
|
||
|
|
||
|
Let us consider a more complete example:
|
||
|
|
||
|
Imagine we have two d20's we'd like to test, and that in fact
|
||
|
(but unknown to us) they have the following (percentage) probabilities:
|
||
|
|
||
|
(the rows are: Face number
|
||
|
% prob 1st die
|
||
|
% prob 2nd die )
|
||
|
|
||
|
|
||
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|
||
|
5 4.5 6 3.5 7 2.5 8 1.5 7 5 5 7 1.5 8 2.5 7 3.5 6 4.5 5
|
||
|
1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5 5 5 5 5 6 6 7 7 7 7 8 8
|
||
|
|
||
|
A fair die would have 5% right across, of course. These 2 dice can be obtained
|
||
|
from each other by relabelling the faces. The first die will be reasonable in
|
||
|
play, because, of course, we don't try to roll 'exactly 8' or 'exactly 9',
|
||
|
but 'less than 8' or 'greater than 11'. The first die is never out by
|
||
|
more than one twentieth of the required probablility (e.g. probability of
|
||
|
a 2 or less is 9.5% instead of 10%) in either direction. It has the correct
|
||
|
average, and almost the correct standard deviation (the difference is tiny).
|
||
|
|
||
|
The second die would be very unbalancing in play: it has about a 2/3 chance
|
||
|
(66%) of rolling 11 or higher, and a 20 is more than 6 times as likely as
|
||
|
a 1. The mean is almost 13. The standard deviation is also out, but that's
|
||
|
relatively unimportant.
|
||
|
|
||
|
The chi-squared will rate them as equally bad!
|
||
|
|
||
|
So a good test should be likely to identify the second die, but we might
|
||
|
be prepared to sacrifice some of our ability to pick up the first, since
|
||
|
it will make little practical difference in play. (I said I'd come back to
|
||
|
this point!)
|
||
|
|
||
|
Note that almost any deviation on a d4 will be important (there are only 4
|
||
|
different values), and to a lesser extent a d6. I'd stick with the chi-
|
||
|
squared test on those.
|
||
|
|
||
|
There are many tests that will do what we want. I will present only one
|
||
|
such test*. (This is not to say that a properly applied chi-squared is
|
||
|
not good, just that a test more closely tailored to our specific
|
||
|
question of interest will be even better.)
|
||
|
|
||
|
* two tests if you count "Tests based on histograms", below.
|
||
|
|
||
|
The Kolmogorov-Smirnov test:
|
||
|
|
||
|
Collect data as for the chi-squared test, up to the point where you
|
||
|
start doing calculations.
|
||
|
|
||
|
That is, lay out like this (you could run down rather than across):
|
||
|
|
||
|
|
||
|
Roll: 1 2 3 4 5 ......
|
||
|
Count: 17 19 22 27 24 ......
|
||
|
Expected: 20 20 20 20 20 ......
|
||
|
|
||
|
Now add up your counts and expected counts, writing the partial
|
||
|
totals as you go:
|
||
|
|
||
|
Roll: 1 2 3 4 5 ......
|
||
|
Count: 17 19 22 27 24 ......
|
||
|
Expected: 20 20 20 20 20 ......
|
||
|
Sum Count: 17 36 58 85 109 ......
|
||
|
Sum Exp: 20 40 60 80 100 ......
|
||
|
|
||
|
Now find the differences (without sign):
|
||
|
|
||
|
Sum Count: 17 36 58 85 109 ......
|
||
|
Sum Exp: 20 40 60 80 100 ......
|
||
|
Difference: 3 4 2 5 9 ......
|
||
|
|
||
|
The last difference will be zero, so you don't have to work out the
|
||
|
final column (I still would as a check).
|
||
|
|
||
|
Divide the largest difference (9 is the largest difference above, for
|
||
|
the calculations you can see) by the number of rolls you made altogether.
|
||
|
|
||
|
This is your test statistic. Let's call the value D.
|
||
|
|
||
|
You can look it up in most books on nonparametrics, which will
|
||
|
have tables. However, you would be better to use the table below, for
|
||
|
reasons I'll discuss in a second.
|
||
|
|
||
|
You multiply D by the square root of the number of rolls
|
||
|
(equivalently, divide the largest difference by the square root of
|
||
|
the number of rolls), and compare with:
|
||
|
|
||
|
5% 1% d#
|
||
|
1.08 1.35 4
|
||
|
1.10 1.37 6 These values apply pretty well irrespective of
|
||
|
1.11 1.38 8 the total number of rolls, but I would use at
|
||
|
1.12 1.39 10 least 10 rolls per face.
|
||
|
1.12 1.40 12 Note also that these values come from simulation,
|
||
|
1.14 1.42 20 and are hence not exact. This doesn't really matter.
|
||
|
|
||
|
and interpret as I suggested for the chi-square test.
|
||
|
|
||
|
You may find the following values in tables:
|
||
|
|
||
|
5% 1%
|
||
|
1.36 1.63 (irrespective of the number of sides on the die)
|
||
|
|
||
|
the reason these are larger is that they are based on the assumption that
|
||
|
the distribution the data are from is continuous (effectively, a *very*
|
||
|
large number of faces on the die would give these values). If you use the
|
||
|
textbook values, the test will be conservative (a fair die will reject
|
||
|
slightly less often than the supposed 5% and 1% for the above table), due
|
||
|
to the distribution of values being discrete (d20 generates only integers,
|
||
|
not anything between).
|
||
|
|
||
|
So, for our above example, assume there are no larger differences
|
||
|
than 9, and that we made 400 rolls on a d20 (hence the expected number
|
||
|
in each cell is 20, as above). Then D is 9/400 = .0225, which if you
|
||
|
can get tables you'd look up. We made 400 rolls, so we could use the
|
||
|
table above: the square root of 400 is 20, so D x 20 (= 9/20) = .45.
|
||
|
This is much less than the 5% value, so there is little evidence that
|
||
|
the die is unfair.
|
||
|
|
||
|
There are tests which are probably even more appropriate, but these
|
||
|
two (chi-squared and K-S) will be enough for you to get a good idea
|
||
|
of any suspect dice.
|
||
|
|
||
|
Note: If you suspect a die, and decide to test it, don't use the
|
||
|
rolls that made you suspect it in the test. Generate a new
|
||
|
set. e.g. if you are all recording your rolls as you play,
|
||
|
and one players' results look funny, don't then test those
|
||
|
recorded values - you have to generate a new set.
|
||
|
|
||
|
-------------------------------------------------------------------------
|
||
|
|
||
|
Testing a die based on the histogram of rolls:
|
||
|
|
||
|
The histogram approach can be turned into a test of sorts as follows:
|
||
|
After drawing the histogram, 2 lines can be drawn either side of the
|
||
|
expected (mean) result. If all histogram bars lie within the inner
|
||
|
lines, there is no strong evidence of bias. If any of the bars go outside
|
||
|
the outer lines, there is fairly clear evidence of bias. If one of the
|
||
|
bars lies between the inner and outer lines, then there is some (mild)
|
||
|
evidence of bias, but its is not really clear. You may wish to then
|
||
|
perform a further test on the probability for that individual side,
|
||
|
as described in the next section (Testing an individual face). If
|
||
|
several of the bars lie between the inner and outer lines, we have a
|
||
|
stronger indication of bias.
|
||
|
|
||
|
Where do we draw the lines?
|
||
|
|
||
|
I have worked out values for rolling 20 times for each face (as in the
|
||
|
other examples, 80 times for a d4, 400 times for a d20). The bars on the
|
||
|
histogram must actually go past these values. You could think of these
|
||
|
values as giving "Acceptable Ranges" (literally, 95% and 99% acceptance
|
||
|
regions) for the histogram. A fair die will give histograms with one or
|
||
|
more bars outside these ranges 5% and 1% of the time respectively
|
||
|
(actually just under).
|
||
|
|
||
|
|
||
|
Table for 20
|
||
|
throws per face Approximate formula:
|
||
|
|
||
|
d# 5% 1% Let N be the total number of rolls.
|
||
|
4 8-34 6-37 Let c be the number of faces on the die.
|
||
|
6 9-33 7-36 Let e be the expected number of times
|
||
|
8 9-33 8-35 each face will come up ( e = N/c ).
|
||
|
10 10-32 8-35 Then the lines go at e +/- A x sqrt[e (c-1)/c],
|
||
|
20 11-30 9-32 and the value for 'A' comes from the table below.
|
||
|
All fractions should be rounded up.
|
||
|
e.g. N=160, c=8, e=20 give:
|
||
|
5%: 20 +/- 2.73 sqrt (20 x 7/8) or 8-32
|
||
|
1%: 20 +/- 3.22 sqrt (20 x 7/8) or 7-34
|
||
|
|
||
|
Table to go with
|
||
|
approximate formula:
|
||
|
|
||
|
d# 5% 1%
|
||
|
4 2.49 3.02 Due to being in the
|
||
|
6 2.63 3.14 extreme tails of the
|
||
|
8 2.73 3.22 distribution, combined
|
||
|
10 2.80 3.29 with slight asymmetry,
|
||
|
12 2.86 3.34 the ranges we get are
|
||
|
20 3.02 3.48 sometimes out a bit.
|
||
|
This is not a big deal.
|
||
|
|
||
|
|
||
|
Example: We throw a d20 400 times, and record the results
|
||
|
and from the table above, we draw the inner lines at 11 & 30,
|
||
|
and the outer lines at 9 & 32, as well as a reference line at 20.
|
||
|
(in the histogram below, "." =1 count, ":" =2 counts. The horizontal
|
||
|
lines aren't quite in the correct positions; they ought to be about
|
||
|
a quarter to half a character position lower.)
|
||
|
|
||
|
Counts (34)
|
||
|
35 + > <
|
||
|
|__________________________:__________________________________ (32)
|
||
|
|__________________________:__________________________________ (30)
|
||
|
| :
|
||
|
| :
|
||
|
25 + : .
|
||
|
| . : : :
|
||
|
|__:__._____:_____:________:____________________:__:__:_______ (20)
|
||
|
| : : : . : : : . : : :
|
||
|
| : : : : : : : : . : : : : : : :
|
||
|
15 + : : : : : : : : : : : . : . : : : : :
|
||
|
| : : : : : : : : : : : : : : : : : : : :
|
||
|
|--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- (11)
|
||
|
|--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- ( 9)
|
||
|
| : : : : : : : : : : : : : : : : : : : :
|
||
|
5 + : : : : : : : : : : : : : : : : : : : :
|
||
|
| : : : : : : : : : : : : : : : : : : : :
|
||
|
| : : : : : : : : : : : : : : : : : : : :
|
||
|
`--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
|
||
|
face: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|
||
|
|
||
|
counts: 22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
|
||
|
^^
|
||
|
|
||
|
One value (34) goes outside the 1% values (technically, you could say
|
||
|
goes outside a 99% acceptance region), so it seems our dice is biased.
|
||
|
More particularly, it rolls too many 9's. Whether this will affect a
|
||
|
game very much is another point. (If it had been "1" or "20", however,
|
||
|
perhaps this would result in a large effect on the game).
|
||
|
|
||
|
|
||
|
|
||
|
Testing an individual face:
|
||
|
|
||
|
When you suspect a particular face is coming up with the wrong
|
||
|
frequency, or only wish to test a particular face (e.g. 20 on a
|
||
|
d20), throw as before, but you can compare with a narrower range,
|
||
|
as below. However, you can't use the data that made you suspect
|
||
|
the face in this test, you must generate a new set of rolls.
|
||
|
E.g. If you a histogram and find the results for 7 look odd,
|
||
|
you can't use those numbers in this test. In that case the ranges
|
||
|
given above (for testing an entire histogram) are appropriate.
|
||
|
|
||
|
For the tables below, as for those above, you need to exceed the
|
||
|
range given to call the die 'probably biased'.
|
||
|
|
||
|
When you don't know the direction already (two-tailed test):
|
||
|
(If you are in doubt, use this table rather than the next one)
|
||
|
|
||
|
Table for 20
|
||
|
throws per face Approximate formula (reasonably accurate):
|
||
|
|
||
|
d# 5% 1% Let N be the total number of rolls.
|
||
|
4 13-28 11-30 Let c be the number of faces on the die.
|
||
|
6 12-28 10-31 Let e be the expected number of times
|
||
|
8 12-29 10-31 each face will come up ( e = N/c ).
|
||
|
10 12-29 10-32 Then the lines go at e +/- 1.96 x sqrt[e (c-1)/c],
|
||
|
12 12-29 10-32 (5%) and for 1% at e +/- 2.58 x sqrt[e (c-1)/c].
|
||
|
20 12-29 10-32 All fractions should be rounded up.
|
||
|
e.g. N=160, c=8, e=20 give:
|
||
|
5%: 20 +/- 1.96 sqrt(20 x 7/8) or 12-29 (rounded up)
|
||
|
1%: 20 +/- 2.58 sqrt(20 x 7/8) or 10-31 (rounded up)
|
||
|
|
||
|
|
||
|
|
||
|
When you think a particular face is coming up too often, or if you
|
||
|
think a particular face isn't coming up enough (one-tailed test):
|
||
|
|
||
|
Table for 20
|
||
|
throws per face Approximate formula (reasonably accurate):
|
||
|
|
||
|
d# 5% 1% Let N be the total number of rolls.
|
||
|
4 14/26 11/30 Let c be the number of faces on the die.
|
||
|
6 13/27 11/30 Let e be the expected number of times
|
||
|
8 13/27 11/30 each face will come up ( e = N/c ).
|
||
|
10 13/27 11/30 Then the lines go at e +/- 1.65 x sqrt[e (c-1)/c],
|
||
|
12 13/27 11/30 (5%) and for 1% at e +/- 2.33 x sqrt[e (c-1)/c].
|
||
|
20 13/27 11/31 All fractions should be rounded up.
|
||
|
e.g. N=160, c=8, e=20 give:
|
||
|
5%: 20 +/- 1.65 sqrt(20 x 7/8) or 14-27 (rounded up)
|
||
|
1%: 20 +/- 2.33 sqrt(20 x 7/8) or 11-30 (rounded up)
|
||
|
|
||
|
These values are given as either/or i.e. since you have already specified
|
||
|
a particular direction, you will compare with only the higher values or the
|
||
|
lower values, not both.
|
||
|
|
||
|
|
||
|
Example 1: You decide to test you new d20 to see if it rolls the
|
||
|
correct number of 20's, but you don't believe it to
|
||
|
be biased in a particular direction. You roll 400
|
||
|
times, and 35 times you get a "20". From the "two-tailed"
|
||
|
table above, you can see that's outside the outer (1%) range.
|
||
|
It seems your d20 rolls too many 20's.
|
||
|
|
||
|
Example 2: Another player seems to be rolling a lot of 1's on her d4.
|
||
|
You decide to test it whether 1 comes up too often.
|
||
|
You roll it 80 times, and get the following:
|
||
|
|
||
|
1 2 3 4
|
||
|
13 26 25 16
|
||
|
|
||
|
Since you decided to test if there were too many 1's, you
|
||
|
can only see if the number of 1's exceeds 26, which it does
|
||
|
not. You can't say, after generating the data "Oh, actually,
|
||
|
perhaps it rolls too few ones", or "perhaps it rolls too many
|
||
|
2's" without generating a new set of data for the new hypothesis.
|
||
|
You must never base what you are testing for on what you spy
|
||
|
in the set of data you use in the test. Our only conclusion
|
||
|
on this test: the d4 doesn't roll too many 1's.
|
||
|
|
||
|
You may like to then generate a new set of rolls to see if it
|
||
|
rolls too few 1's.
|
||
|
|
||
|
---------------------------------------------------------------------
|
||
|
As further examples, here are the chi-squared test and Kolmogorov-Smirnov (KS)
|
||
|
tests performed on the same data.
|
||
|
|
||
|
chi squared test:
|
||
|
|
||
|
counts: 22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
|
||
|
diff
|
||
|
from 20: 2 1 2 3 1 2 0 4 14 3 1 5 2 5 6 2 5 6 2 2
|
||
|
diff^2: 4 1 4 9 1 4 0 16 196 9 1 25 4 25 36 4 25 36 4 4
|
||
|
sum diff^2: 408
|
||
|
chi-squared statistic: 408/20 = 20.4
|
||
|
(far less than the 5% value of 30.14)
|
||
|
|
||
|
|
||
|
Kolmogorov-Smirnov test:
|
||
|
|
||
|
(Calculations have been run down the page because I can't fit 20 3 digit
|
||
|
numbers, with spaces and labels across an 80-column screen).
|
||
|
|
||
|
counts sum expected diff
|
||
|
22 22 20 2
|
||
|
21 43 40 3
|
||
|
18 61 60 1
|
||
|
23 84 80 4
|
||
|
19 103 100 3
|
||
|
22 125 120 5
|
||
|
20 145 140 5
|
||
|
16 161 160 1
|
||
|
34 195 180 15 <-- max diff, D, is 15. Well short of significance
|
||
|
17 212 200 12 at the 5% level. e.g. calc D/sqrt(n) = 15/20
|
||
|
19 231 220 11 or .75; where the 5% value from the simulations
|
||
|
15 246 240 6 is 1.14
|
||
|
18 264 260 4
|
||
|
15 279 280 1
|
||
|
14 293 300 7
|
||
|
22 315 320 5
|
||
|
25 340 340 0
|
||
|
24 364 360 4
|
||
|
18 382 380 2
|
||
|
18 400 400 0
|
||
|
|
||
|
----------------------------------------------------------------------------
|
||
|
|
||
|
|
||
|
Conover, W.J. (1980): Practical nonparametric statistics,
|
||
|
2nd Ed., Wiley, New York.
|
||
|
|
||
|
Neave, H.R. and Worthington, P.L.B. (1988): Distribution-free tests,
|
||
|
Unwin Hyman, London.
|