Suppose you want to prove some point. But your little bit of data
doesn't quite prove it. Maybe more data would prove it. Watch out!
Collect more data, run the statistics again, did it work this time?
This is a slippery slope.
Suppose you are a hard worker and you're willing to run twenty tests, and
maybe one or two or three of them show a significant result with
P<0.05. Could that happen with no real difference in your data? Of
course it could. On average it's more than 'could': one of them *will*! This is
the problem of non-replicable results, and of unpublished
non-results. There was a recent study saying most published results
don't replicate when someone else tries them again later. (One wonders
if that study was replicable.) So maybe you have a significant result,
but run the experiment a different way, by someone else, who isn't
going to get so famous for the 'discovery', and it might well come out
differently. How can we minimize that?
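To put a rough number on that twenty-test scenario, here is a
back-of-the-envelope sketch in Python (mine, not part of the original
argument), assuming twenty independent tests with no real effect behind
any of them:

    # Chance of at least one "significant" result among 20 independent
    # tests when there is no real effect anywhere, assuming each test
    # has a 5% false-positive rate.
    alpha = 0.05
    n_tests = 20

    p_at_least_one = 1 - (1 - alpha) ** n_tests    # about 0.64
    expected_false_positives = alpha * n_tests      # exactly 1.0

    print(f"P(at least one P<0.05 by chance): {p_at_least_one:.2f}")
    print(f"Expected false positives: {expected_false_positives:.1f}")

So even with nothing real going on, you should expect roughly one
'significant' result per twenty tests, and better-than-even odds of at
least one.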
Some good practices will help people find truth instead of temporary
glory that becomes fake news.
- One practice is to show all your data. Putting your files on
the internet is not going to cost you anything if your department
already has a server. At a minimum, this means writing down your
counts as you collect them.
- Another is to do your own replication. Collect the data you
need twice: once to make your point statistically, and a
second time to check that the result holds up.
- Another is to graph the P values, and the two frequencies in
the two columns, as your data count increases (a sketch of this
appears after the list). Keep collecting data until the picture looks
pretty stable. Don't just stop at the moment it
crosses the line of P<0.05.
- Another is to decide in advance how much data you'll need to
prove your point, then collect just that much, once. Then others might
think that at least that time it was an honest experiment. But you might
need the kind of graph described in the previous point in order to make
that judgement. So if you're just starting to explore the question, expect
to estimate the relative frequencies in the first run, then pick a
target data count and a P level, then do it again up to that count (a
rough power-style sketch of this also appears after the list).
- Another is to think about one-tailed and two-tailed hypothesis
testing. Two-tailed asks: is there a difference at all? One-tailed
asks: is the difference the one I'm expecting to see? Once you
collect enough data for that, write your data down and don't use Tea
Lady, which is for collecting data as you go; instead use Oyvind
Langsrud's calculator, where you just put in the totals:
https://www.langsrud.com/stat/fisher.htm
Langsrud's page gives both two-tailed and one-tailed results.
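If you'd rather run that Fisher test on your totals in code than in the
browser, scipy's fisher_exact takes the same four cell counts; the
numbers below are invented just for illustration:

    # Fisher's exact test on a 2x2 table of totals (counts are made up
    # for illustration). Requires the scipy library.
    from scipy.stats import fisher_exact

    #            outcome A  outcome B
    table = [[12,  3],    # condition 1
             [ 5, 10]]    # condition 2

    # Two-tailed: is there any difference between the conditions?
    _, p_two_tailed = fisher_exact(table, alternative="two-sided")

    # One-tailed: is the difference in the predicted direction
    # (condition 1 favoring outcome A more than condition 2 does)?
    _, p_one_tailed = fisher_exact(table, alternative="greater")

    print(f"two-tailed P = {p_two_tailed:.4f}")
    print(f"one-tailed P = {p_one_tailed:.4f}")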
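Going back to the earlier point about graphing the P values and the two
frequencies as the counts grow, here is a rough simulation sketch (the
underlying rates are invented) that recomputes Fisher's exact P after
each new pair of observations:

    # Watch P and the two frequencies evolve as data accumulates.
    # The 'true' rates here are invented for the simulation.
    import random
    from scipy.stats import fisher_exact

    random.seed(1)
    true_rate_1, true_rate_2 = 0.6, 0.4

    hits = [0, 0]      # successes in each condition
    totals = [0, 0]    # observations in each condition

    for n in range(1, 201):
        hits[0] += random.random() < true_rate_1   # one new observation
        hits[1] += random.random() < true_rate_2   # per condition
        totals[0] += 1
        totals[1] += 1

        table = [[hits[0], totals[0] - hits[0]],
                 [hits[1], totals[1] - hits[1]]]
        _, p = fisher_exact(table)

        if n % 25 == 0:   # print (or plot) every 25th step
            print(f"n per condition = {n:3d}   "
                  f"freq1 = {hits[0]/totals[0]:.2f}   "
                  f"freq2 = {hits[1]/totals[1]:.2f}   P = {p:.3f}")

If the P value and the two frequencies settle down and stay put, that is
far more convincing than a single dip below the 0.05 line.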
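And for the point about deciding the data count in advance, a rough
power-style simulation (again with invented pilot estimates) asks how
often a study of a given size would reach P<0.05 if the first run's
frequencies are roughly right:

    # Rough power-style simulation: given rate estimates from a first
    # run (invented here), how often does a study of size n per
    # condition reach P < 0.05?
    import random
    from scipy.stats import fisher_exact

    random.seed(2)
    pilot_rate_1, pilot_rate_2 = 0.6, 0.4
    alpha = 0.05

    def estimated_power(n, sims=1000):
        wins = 0
        for _ in range(sims):
            a = sum(random.random() < pilot_rate_1 for _ in range(n))
            b = sum(random.random() < pilot_rate_2 for _ in range(n))
            table = [[a, n - a], [b, n - b]]
            _, p = fisher_exact(table)
            wins += p < alpha
        return wins / sims

    for n in (50, 100, 150, 200):
        print(f"n per condition = {n}: "
              f"estimated power = {estimated_power(n):.2f}")

Pick the smallest count that gives you a comfortable chance of reaching
your chosen P level, collect exactly that much, and stop.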
Yes, there's more! Scientists do their job not just by having ideas
and doing experiments, but also by finding things wrong with other
people's experiments. A good scientist is always trying to find
things wrong with their own experiments. It's not quite as bad as what my
grizzled old Penn phonetics teacher Leigh Lisker once told me: "The less
you say, the less likely you are to be wrong." Well, he was right,
sure, but still I say: just try to say things that are as true as you
can make them, and be open to the process. Be a good scientist.