Suppose you want to prove some point. But your little bit of data
doesn't quite prove it. Maybe more data would prove it. Watch out!
Collect more data, run the statistics again, did it work this time?
This is a slippery slope.
Suppose you are a hard worker and you're willing to run twenty tests, and
maybe one or two or three of them show a significant result with
P<0.05. Could that happen with no real difference in your data? Of
course it could. On average it's more than 'could': one of them *will*! This is
the problem of non-replicable results, and of unpublished
non-results. There was a recent study saying most published results
don't replicate when someone else tries them again later. (One wonders
if that study was replicable.) So maybe you have a significant result,
but run the experiment a different way, by someone else, who isn't
going to get so famous for the 'discovery', and it might well come out
differently. How can we minimize that?
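To put a rough number on that twenty-test scenario, here is a
back-of-the-envelope sketch in Python (mine, not part of the original
argument), assuming twenty independent tests with no real effect behind
any of them:

    # Chance of at least one "significant" result among 20 independent
    # tests when there is no real effect anywhere, assuming each test
    # has a 5% false-positive rate.
    alpha = 0.05
    n_tests = 20

    p_at_least_one = 1 - (1 - alpha) ** n_tests    # about 0.64
    expected_false_positives = alpha * n_tests      # exactly 1.0

    print(f"P(at least one P<0.05 by chance): {p_at_least_one:.2f}")
    print(f"Expected false positives: {expected_false_positives:.1f}")

So even with nothing real going on, you should expect roughly one
'significant' result per twenty tests, and better-than-even odds of at
least one.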
Some good practices will help people find truth instead of temporary
glory that becomes fake news.
- One practice is to show all your data. Putting your files on
the internet is not going to cost you anything if your department
already has a server. At a minimum, this means writing down your
counts as you collect them.
- Another is to do your own replication. Collect the data you
need twice: once to make your point statistically, and a
second time to check that the result holds up.
- Another is to graph the P values, and the two frequencies in
the two columns, as your data count increases (a sketch of this
appears after the list). Keep collecting data until the picture looks
pretty stable. Don't just stop at the moment it
crosses the line of P<0.05.
- Another is to decide in advance how much data you'll need to
prove your point, then collect just that much, once. Then others might
think that at least that time it was an honest experiment. But you might
need the kind of graph described in the previous point in order to make
that judgement. So if you're just starting to explore the question, expect
to estimate the relative frequencies in the first run, then pick a
target data count and a P level, then do it again up to that count (a
rough power-style sketch of this also appears after the list).
- Another is to think about one-tailed and two-tailed hypothesis
testing. Two-tailed asks: is there a difference at all? One-tailed
asks: is the difference the one I'm expecting to see? Once you
collect enough data for that, write your data down and don't use Tea
Lady, which is for collecting data as you go; instead use Oyvind
Langsrud's calculator, where you just put in the totals:
https://www.langsrud.com/stat/fisher.htm
Langsrud's page gives both two-tailed and one-tailed results.
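If you'd rather run that Fisher test on your totals in code than in the
browser, scipy's fisher_exact takes the same four cell counts; the
numbers below are invented just for illustration:

    # Fisher's exact test on a 2x2 table of totals (counts are made up
    # for illustration). Requires the scipy library.
    from scipy.stats import fisher_exact

    #            outcome A  outcome B
    table = [[12,  3],    # condition 1
             [ 5, 10]]    # condition 2

    # Two-tailed: is there any difference between the conditions?
    _, p_two_tailed = fisher_exact(table, alternative="two-sided")

    # One-tailed: is the difference in the predicted direction
    # (condition 1 favoring outcome A more than condition 2 does)?
    _, p_one_tailed = fisher_exact(table, alternative="greater")

    print(f"two-tailed P = {p_two_tailed:.4f}")
    print(f"one-tailed P = {p_one_tailed:.4f}")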
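Going back to the earlier point about graphing the P values and the two
frequencies as the counts grow, here is a rough simulation sketch (the
underlying rates are invented) that recomputes Fisher's exact P after
each new pair of observations:

    # Watch P and the two frequencies evolve as data accumulates.
    # The 'true' rates here are invented for the simulation.
    import random
    from scipy.stats import fisher_exact

    random.seed(1)
    true_rate_1, true_rate_2 = 0.6, 0.4

    hits = [0, 0]      # successes in each condition
    totals = [0, 0]    # observations in each condition

    for n in range(1, 201):
        hits[0] += random.random() < true_rate_1   # one new observation
        hits[1] += random.random() < true_rate_2   # per condition
        totals[0] += 1
        totals[1] += 1

        table = [[hits[0], totals[0] - hits[0]],
                 [hits[1], totals[1] - hits[1]]]
        _, p = fisher_exact(table)

        if n % 25 == 0:   # print (or plot) every 25th step
            print(f"n per condition = {n:3d}   "
                  f"freq1 = {hits[0]/totals[0]:.2f}   "
                  f"freq2 = {hits[1]/totals[1]:.2f}   P = {p:.3f}")

If the P value and the two frequencies settle down and stay put, that is
far more convincing than a single dip below the 0.05 line.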
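And for the point about deciding the data count in advance, a rough
power-style simulation (again with invented pilot estimates) asks how
often a study of a given size would reach P<0.05 if the first run's
frequencies are roughly right:

    # Rough power-style simulation: given rate estimates from a first
    # run (invented here), how often does a study of size n per
    # condition reach P < 0.05?
    import random
    from scipy.stats import fisher_exact

    random.seed(2)
    pilot_rate_1, pilot_rate_2 = 0.6, 0.4
    alpha = 0.05

    def estimated_power(n, sims=1000):
        wins = 0
        for _ in range(sims):
            a = sum(random.random() < pilot_rate_1 for _ in range(n))
            b = sum(random.random() < pilot_rate_2 for _ in range(n))
            table = [[a, n - a], [b, n - b]]
            _, p = fisher_exact(table)
            wins += p < alpha
        return wins / sims

    for n in (50, 100, 150, 200):
        print(f"n per condition = {n}: "
              f"estimated power = {estimated_power(n):.2f}")

Pick the smallest count that gives you a comfortable chance of reaching
your chosen P level, collect exactly that much, and stop.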
Yes, there's more! Scientists do their job not just by having ideas
and doing experiments, but also by finding things wrong with other
people's experiments. A good scientist is always trying to find
things wrong with their own experiments. It's not quite as bad as what my
grizzled old Penn phonetics teacher Leigh Lisker once told me: "The less
you say, the less likely you are to be wrong." Well, he was right,
sure, but still I say: just try to say things that are as true as you
can make them, and be open to the process. Be a good scientist.