SYSTEM_ERROR_505_STATS

By Beth Russell

If data is the gold standard, then why don’t all scientists agree all the time? We like to say the devil is in the details, but it is really in the analysis and (mis)application of data. Scientific errors are rarely due to bad data; misinterpretation of data and misuse of statistical methods are much more likely culprits.

All data are essentially measurements. Imagine that you are trying to figure out where your property and your neighbor’s meet. You might have a rough idea of where the boundary is, but you are going to have to take some measurements to be certain. Those measurements are data. Maybe you decide to step it off and calculate the distance based on the length of your shoe. Your neighbor decides to use a laser range finder. You are both going to be pretty close, but you probably won’t end up in exactly the same place. As long as his range finder is calibrated and your stride length is consistent, both methods are reliable and provide useful data. The only difference is the accuracy.

Are the data good or bad? It depends on how accurate you need to be. Data are neither good nor bad as long as the measurement tool is reliable. If you have a legal dispute, your neighbor will probably win; on the other hand, if you are just trying to figure out where to mow the grass, you’re probably safe stepping it off. Neither data set is bad; they just provide different levels of accuracy.

Accuracy is a major consideration in the next source of error: analysis. Just as it is important to consider your available ingredients and tools when you decide what to make for dinner, it is vital to consider the accuracy, type, and amount of data you have when you choose a method for analysis. The primary analysis methods that science uses to determine whether the available data support a conclusion are statistical methods. These tests estimate how likely data like yours would be if a given assumption were true; they are not proof that a conclusion is correct.

Unfortunately, statistical methods are not one-size-fits-all. The validity of any method depends on the properties of the data and the question being tested. Different statistical tests can lead to widely disparate conclusions. To provide the best available science, it is vital to choose or design the best test for a given question and data set. Even then, two equally valid statistical tests can come to different conclusions, especially if there aren’t very many data or the data are highly variable.
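
To make that last point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two groups of readings are invented, the 0.05 cutoff is only the conventional choice, and the helper names (permutation_p_value, to_ranks) are hypothetical. It runs the same exact permutation test twice on the same small data set, once on the raw values (comparing means) and once on their ranks (a rank-sum style comparison).

```python
from itertools import combinations

def permutation_p_value(a, b, tol=1e-12):
    """Exact two-sided permutation test on the difference in group means.

    Tries every way of splitting the pooled values into groups of the
    original sizes and reports the share of splits whose mean difference
    is at least as large as the one actually observed.
    """
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - tol:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

def to_ranks(a, b):
    """Replace each value by its rank in the pooled sample (assumes no ties)."""
    rank = {value: i + 1 for i, value in enumerate(sorted(a + b))}
    return [rank[v] for v in a], [rank[v] for v in b]

# Invented readings: group_a is mostly lower than group_b,
# but it contains one stray reading (9.0).
group_a = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.0]
group_b = [1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3]

p_mean = permutation_p_value(group_a, group_b)             # test on raw values
p_rank = permutation_p_value(*to_ranks(group_a, group_b))  # test on ranks

print(f"means-based test p-value: {p_mean:.4f}")
print(f"rank-based test p-value:  {p_rank:.4f}")
```

With these invented numbers, the single stray reading swamps the comparison of means, so that test finds nothing below the 0.05 cutoff. The rank-based test still sees that six of the seven readings in the first group are lower than everything in the second, and it does fall below 0.05. Neither test is wrong; they ask slightly different questions of the same data, which is why the choice needs to be justified rather than inherited.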

Here’s the rub… even scientists don’t always understand the analysis methods that they choose. Statistics is a science in itself, and few biologists, chemists, or even physicists are expert statisticians. As the quantity and complexity of data grow, evaluating which analysis method(s) should be used becomes more and more important. Many times a method is chosen for historical reasons: “We’ve always used this method for this type of data because someone did that before.” Errors made by choosing a poor method for the data are sloppy, lazy, bad science.

Better education in statistics will reduce this type of analysis-based error, and open science will make such errors easier to detect. Another thing we can do is support more team science. If a team includes a statistics expert, it is much less likely to make these types of errors. Finally, we need more statistics-literate editors and reviewers. These positions exist to catch errors in the science, and they need to treat the statistics as part of the experiment, not as the final arbiter of success or failure. High-quality peer review, collaboration, and the transparency created by open data are our best defenses against bad science. We need to strengthen them and put a greater emphasis on justifying the choice of analysis methodology in scientific discovery.

SYSTEM_ERROR_505_STATS_FAIL