The American Statistical Association statement on p-values
Monday, March 7, 2016
Richard Morey
There are no statistics that inflame the passions of statisticians and scientists as does the p value. The p value is, informally, a statistic used for assessing whether a "null hypothesis" (e.g., that the difference in performance between two conditions is 0) should be taken seriously. It is simultaneously the most used and most hated statistic in all of science. Use of p values has been called bad science, associated with cult-like and ritualistic behaviour, and pegged as a major cause of the so-called "crisis" that many of the sciences find themselves in.
In response to the controversy over p values, the American Statistical Association has today taken the unprecedented step of releasing a statement regarding a consensus viewpoint on the use of p values. This statement represents the input of the world's top experts on the topic. The whole statement is worth reading, but I'm going to focus on their six "principles" regarding p values, first defining the p value, then describing the principles and adding my own brief commentary.
Formally, the p value is the probability of finding a result as, or more, "discrepant" than the one observed, assuming that some hypothesis is true. Small values represent more discrepant observations. As an example, consider flipping a coin 10 times and suppose we observe a single heads-up flip out of 10. Under the hypothesis that the coin is fair, we would expect five heads-up flips. One head or no heads at all is as discrepant as (or more discrepant than) what we observed, and the same applies to nine heads and ten heads. The probability of obtaining either 0, 1, 9, or 10 heads-up flips is small, assuming the coin is fair: about 0.02.
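The coin-flip figure can be checked directly. The following is a minimal sketch, not part of the ASA statement: the function name two_sided_binomial_p is hypothetical, and it assumes that "discrepancy" means distance from the expected number of heads, which matches the 0, 1, 9, or 10 outcomes listed above.

```python
from math import comb


def two_sided_binomial_p(heads: int, flips: int) -> float:
    """P value for a fair coin (hypothetical helper).

    Assumption: an outcome counts as "as or more discrepant" when its
    head count is at least as far from flips / 2 as the observed count.
    """
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the sequences whose head count is at least as extreme as observed.
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Every sequence of flips is equally likely under a fair coin.
    return extreme_sequences / 2 ** flips


p = two_sided_binomial_p(heads=1, flips=10)
print(round(p, 4))  # 0.0215, i.e. 22 of the 1024 possible sequences
```

The exact value is 22/1024, which rounds to the "about 0.02" quoted above.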
Thus, if we observe one of those outcomes, it seems clear that this represents a discrepancy from what we would expect if the coin were fair, and therefore, in some sense, represents evidence relevant to judging whether the coin is fair or not. The question has always been: how should we use this information? There is a rich, interesting, and ongoing philosophical debate about this question that I do not have space to engage with here. I will simply say that in much of psychology, common practice is to check whether the p value is less than .05 (as it is in the case of 0, 1, 9, or 10 heads out of 10 flips) and, if so, declare the null hypothesis false. This is bad science, as we’ll discuss in the context of the ASA’s six principles.
The American Statistical Association’s six principles regarding p values
1. P-values can indicate how incompatible the data are with a specified statistical model.
This principle arises from the typical definition of a p value and is relatively uncontroversial, with the possible exception of the phrase "how incompatible". The same p value does not necessarily indicate the same incompatibility, particularly for different sample sizes (e.g., a moderate discrepancy with a moderate sample size can yield the same p value as a very tiny discrepancy with a large sample size). But p values are useful for informally assessing the performance of a model and perhaps finding specific weaknesses in it.
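The point about sample size can be made concrete with a small calculation. This is only an illustrative sketch under assumptions that do not come from the statement: a two-sided one-sample z test with a known standard deviation of 1, so that the test statistic is the standardized difference times the square root of the sample size. The function name and the numbers are hypothetical.

```python
from math import erfc, sqrt


def two_sided_z_p(standardized_diff: float, n: int) -> float:
    """Two-sided p value for a one-sample z test (hypothetical helper).

    Assumption: known standard deviation of 1, so z = difference * sqrt(n).
    """
    z = abs(standardized_diff) * sqrt(n)
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2))


# A moderate discrepancy with a modest sample...
p_moderate = two_sided_z_p(standardized_diff=0.5, n=16)
# ...and a tiny discrepancy with a large sample.
p_tiny = two_sided_z_p(standardized_diff=0.02, n=10_000)

print(round(p_moderate, 4), round(p_tiny, 4))  # 0.0455 0.0455
```

Both cases give z = 2 and therefore the same p value, even though the underlying discrepancies differ by a factor of 25.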
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
There's not much to say about this except that it is true. Although p values are often misunderstood as "the probability of the null hypothesis", this simply isn't correct. This misunderstanding can lead to weakly supported conclusions being published.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
One should, in general, beware of arbitrary thresholds. There are two possible issues here: first, the evidence is not substantially different on either side of the threshold. Declaring evidence for an effect when p=.04
