Formally, the p value is the probability of obtaining a result as discrepant as, or more discrepant than, the one observed, assuming that some hypothesis is true. Small values represent more discrepant observations. As an example, consider flipping a coin 10 times, and suppose we observe a single heads-up flip out of 10. Under the hypothesis that the coin is fair, we would expect five heads-up flips. Zero or one head is at least as discrepant as what we observed, and the same applies to nine or ten heads. The probability of obtaining either 0, 1, 9, or 10 heads-up flips is small, assuming the coin is fair: about 0.02.
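That 0.02 figure can be checked directly from the binomial distribution. A minimal sketch using only the Python standard library (my own illustration, not code from the text):

```python
from math import comb

# Probability of exactly k heads in n flips of a fair coin: C(n, k) / 2^n.
def binom_fair(n, k):
    return comb(n, k) / 2**n

# Outcomes at least as discrepant as 1 head in 10 flips: 0, 1, 9, or 10 heads.
p_value = sum(binom_fair(10, k) for k in (0, 1, 9, 10))
print(round(p_value, 4))  # 0.0215
```

The exact value is 22/1024, roughly 0.0215, matching the "about 0.02" quoted above.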
Thus, if we observe one of those outcomes, it seems clear that this represents a discrepancy from what we would expect if the coin were fair, and therefore, in some sense, represents evidence relevant to judging whether the coin is fair or not. The question has always been: how should we use this information? There is a rich, interesting, and ongoing philosophical debate about this question that I do not have space to engage. I will simply say that in much of psychology, common practice is simply to look and see whether the p value is less than .05 (as in the case of 0, 1, 9, or 10 heads out of 10 flips) and, if so, declare the null hypothesis false. This is bad science, as we'll discuss in the context of the ASA's six principles.
The American Statistical Association’s six principles regarding p values
1. P-values can indicate how incompatible the data are with a specified statistical model.
This principle arises from the typical definition of a p value and is relatively uncontroversial, with the possible exception of the phrase "how incompatible". The same p value does not necessarily indicate the same incompatibility, particularly for different sample sizes (e.g., a moderate discrepancy with a moderate sample size can yield the same p value as a very tiny discrepancy with a large sample size). But p values are useful for informally assessing the performance of a model and perhaps finding specific weaknesses in it.
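The sample-size caveat can be made concrete with a one-sample z test, assuming a known standard deviation of 1 (a hypothetical illustration of my own, not an example from the text): a moderate effect of 0.2 with n = 100 and a tiny effect of 0.02 with n = 10,000 yield the same z statistic, and therefore exactly the same p value.

```python
from math import erf, sqrt

def z_test_p(effect, n, sd=1.0):
    """Two-sided p value for a one-sample z test of a zero mean."""
    z = effect * sqrt(n) / sd
    # Standard normal CDF via the error function.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - phi)

p_moderate = z_test_p(0.2, 100)      # moderate discrepancy, moderate n
p_tiny = z_test_p(0.02, 10_000)      # tiny discrepancy, large n
print(p_moderate, p_tiny)  # identical: about 0.0455
```

Both calls produce z = 2, so the identical p values conceal very different practical discrepancies, which is why "how incompatible" cannot be read off a p value alone.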
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
There's not much to say about this except that it is true. Although p values are often misunderstood as "the probability of the null hypothesis," this simply isn't correct. This misunderstanding can lead to weakly supported conclusions being published.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
One should, in general, beware of arbitrary thresholds. There are two possible issues here: first, the evidence is not substantially different on either side of the threshold. Declaring evidence for an effect when p=.04