P Hacking

a scientific sin


Suppose you want to prove some point. But your little bit of data doesn't quite prove it. Maybe more data would prove it. Watch out!

Collect more data, run the statistics again: did it work this time? No? Collect some more and try again.

This is a slippery slope.
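How slippery? Here is a rough sketch, not a definitive analysis, of the collect-and-retest loop described above. Everything in it is an assumption made for illustration: the data are pure noise (normal, mean 0, known standard deviation 1), the test is a simple two-sided z-test, and the names peeking_finds_significance and false_positive_rate, the batch size, and the number of peeks are all made up. The point is only to count how often a stop-when-it-works strategy "finds" an effect that is not there.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def peeking_finds_significance(rng, batch_size=10, max_batches=10, alpha=0.05):
    """Simulate one study on pure noise, testing after every new batch.

    Returns True if any of the peeks gives p < alpha, that is, if the
    researcher would have stopped and declared a 'significant' result.
    """
    total = 0.0
    n = 0
    for _ in range(max_batches):
        # Collect more data (assumed: noise with mean 0 and known sd 1).
        for _ in range(batch_size):
            total += rng.gauss(0.0, 1.0)
            n += 1
        # Run the statistics again: z-test of "mean is 0" on all data so far.
        z = total / math.sqrt(n)
        if two_sided_p(z) < alpha:
            return True  # it "worked" this time, so stop here
    return False


def false_positive_rate(n_experiments=2000, seed=0, **kwargs):
    """Fraction of noise-only studies that reach p < alpha at some peek."""
    rng = random.Random(seed)
    hits = sum(
        peeking_finds_significance(rng, **kwargs) for _ in range(n_experiments)
    )
    return hits / n_experiments


if __name__ == "__main__":
    print("one look only:   ", false_positive_rate(max_batches=1))
    print("up to ten looks: ", false_positive_rate(max_batches=10))
```

With a single look, the rate should come out near the nominal 0.05. With up to ten looks at the same growing pile of data, it should come out well above that, even though there is nothing to find.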

Suppose you are a hard worker, willing to run twenty tests, and maybe one or two or three of them show a significant result with P<0.05. Could that happen with no real difference in your data? Of course it could. More than 'could': on average it *will*! At a threshold of 0.05, twenty tests on data with no real effect are expected to produce about one 'significant' result by chance alone. This is the issue of non-replicable results, and of unpublished non-results. There was a recent study saying most published results don't replicate when someone else tries again later. (One wonders if that study was replicable.) So maybe you have a significant result, but run a different way, by someone else, who maybe isn't going to get so famous for the 'discovery', it might well come out different. How can we minimize that?
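The arithmetic behind the twenty-tests worry can be checked directly. This is a minimal sketch under two loudly stated assumptions: there is no real effect in any of the tests, and the tests are independent of each other (real batteries of tests often are not). The function names are made up for illustration. Under those assumptions each p-value is uniformly distributed between 0 and 1, so the simulation just draws uniform random numbers in place of real tests.

```python
import random


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of chance 'significant' results when nothing is real."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_at_least_one(n_tests=20, alpha=0.05, n_studies=10000, seed=0):
    """Monte Carlo check of prob_at_least_one.

    Assumption: under the null, each p-value is uniform on [0, 1], so a
    uniform random draw stands in for running an actual test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(expected_false_positives(20))      # about 1 per twenty tests
    print(round(prob_at_least_one(20), 2))   # about 0.64
    print(simulate_at_least_one())           # should land close to the line above
```

So under these assumptions a hard worker running twenty tests on pure noise has roughly a two-in-three chance of getting at least one publishable-looking P<0.05.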

Some good practices will help people find truth instead of temporary glory that becomes fake news.

Yes, there's more! Scientists do their job not just by having ideas and doing experiments, but also by finding things wrong with other people's experiments. A good scientist is always trying to find things wrong with their own experiments, too. It's not quite as bad as what my grizzled old Penn phonetics teacher Leigh Lisker said to me: "The less you say, the less likely you are to be wrong." Well, he was right, sure, but still I say, just try to say things that are as true as you can make them, and be open to the process. Be a good scientist.

Copyright © 2000-2020, Thomas C. Veatch. All rights reserved.
Modified: April 18, 2020