Suppose you want to prove some point. But your little bit of data
doesn't quite prove it. Maybe more data would prove it. Watch out!

Collect more data, run the statistics again, did it work this time?

This is a slippery slope.

Suppose you are a hard worker, willing to run twenty tests, and one or two or three of them show a significant result with P<0.05. Could that happen with no real difference in your data? Of course it could. On average it's more than 'could': it *will*. With twenty independent tests and no real effect, you should expect about one false positive, and the chance of at least one is roughly 64%. This is the issue of non-replicable results and unpublished non-results. There was a recent study reporting that most published results don't replicate when someone else tries them again later. (One wonders if that study was replicable.) So you may have a significant result, but run the experiment a different way, by someone else, someone who isn't going to get famous for the 'discovery', and it might well come out different. How can we minimize that?
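To see how easily pure noise produces 'significant' results, here is a minimal sketch (my own illustration, not from any particular study): when there is no real effect, each test's P value is roughly uniform on [0, 1], so the chance that at least one of twenty tests lands below 0.05 is 1 − 0.95^20.

```python
import random

# Probability that at least one of 20 independent tests on pure noise
# reaches P < 0.05: 1 - 0.95^20.
p_any = 1 - 0.95 ** 20
print(f"Chance of >=1 'significant' result in 20 null tests: {p_any:.2f}")

# Quick check by simulation. Under the null, a P value is (roughly)
# uniform on [0, 1], so drawing random.random() stands in for one test.
random.seed(1)
trials = 10_000
hits = sum(
    any(random.random() < 0.05 for _ in range(20))
    for _ in range(trials)
)
print(f"Simulated fraction of 20-test batches with a false positive: "
      f"{hits / trials:.2f}")
```

Both numbers come out near 0.64, which is why a hard worker running twenty tests will usually find 'something'.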

Some good practices will help people find truth instead of temporary glory that becomes fake news.

- One practice is to show all your data. Posting your files on the internet costs you nothing if your department already has a server. At minimum, this means writing down your counts as you collect them.
- Another is to do your own replication: collect the data you need twice, once to make your point statistically and a second time to check its reliability.
- Another is to graph the P values, and the two frequencies in the two columns, as your data count increases. Keep collecting data until the picture looks stable. Don't just stop at the moment the P value first crosses P<0.05.
- Another is to decide in advance how much data you'll need to prove your point, then collect exactly that much, once. Then others might at least think that run was an honest experiment. But you may need the graph described above to make that judgement. So if you're just starting to explore the question, expect to estimate the relative frequencies in a first run, then pick a target data count and a P level, then do it again up to that count.
- Another is to think about one-tailed versus two-tailed hypothesis testing. Two-tailed asks: is there a difference at all? One-tailed asks: is the difference the one I'm expecting to see? Once you have collected enough data for that, write your counts down and don't use Tea Lady, which is for collecting data as you go; instead use Oyvind Langsrud's calculator, where you just put in the totals:
https://www.langsrud.com/stat/fisher.htm
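The graphing suggestion above can be sketched in code. This is my own hypothetical illustration: it invents two groups with true rates of 60% and 50%, grows both samples one observation at a time, and tracks the two-sided P value from a two-proportion z-test (a normal approximation) as the counts increase. Early P values bounce around, and only the later, stable ones are worth trusting.

```python
import math
import random

def two_prop_p(x1, n1, x2, n2):
    """Two-sided P value for a two-proportion z-test (normal approximation)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: true rates 60% vs 50%, sampled in parallel.
random.seed(7)
x1 = x2 = 0
trajectory = []
for n in range(1, 401):
    x1 += random.random() < 0.60
    x2 += random.random() < 0.50
    if n >= 10:                      # skip tiny samples
        trajectory.append((n, two_prop_p(x1, n, x2, n)))

# Print every 50th point; in a real analysis you would plot the whole curve.
for n, p in trajectory:
    if n % 50 == 0:
        print(f"n={n:3d}  P={p:.3f}")
```

The point of the plot is exactly what the list item says: stop when the curve settles down, not at the first moment it dips below 0.05.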

Langsrud's calculator reports both two-tailed and one-tailed results.
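For the curious, here is a minimal sketch of what such a totals-based calculator computes (my own implementation of the standard Fisher exact test, not Langsrud's code). It uses the classic lady-tasting-tea table [[3, 1], [1, 3]]: the one-tailed P value sums the hypergeometric probabilities of the observed table and the more extreme tables in the expected direction; the two-tailed P value sums all tables at most as probable as the observed one.

```python
from math import comb

def hypergeom_pmf(a, row1, row2, col1):
    """P(top-left cell = a) for a 2x2 table with fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact(a, b, c, d):
    """One- and two-tailed Fisher exact P values for table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    ks = range(lo, hi + 1)
    pmfs = [hypergeom_pmf(k, row1, row2, col1) for k in ks]
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    # One-tailed: observed table or more extreme in the same direction.
    one = sum(pk for k, pk in zip(ks, pmfs) if k >= a)
    # Two-tailed: every table at most as probable as the observed one.
    two = sum(pk for pk in pmfs if pk <= p_obs + 1e-12)
    return one, two

one, two = fisher_exact(3, 1, 1, 3)
print(f"one-tailed P = {one:.4f}, two-tailed P = {two:.4f}")
# -> one-tailed P = 0.2429, two-tailed P = 0.4857
```

Entering the same four totals into Langsrud's page should give the same numbers; the point is that only the final counts matter, not the order in which you collected them.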