WBM STATS: Difference Between p-value and Bayes Factor

by W. B. Meitei, PhD

Before exploring the differences between p-values and Bayes factors, it’s important to understand the fundamental distinction between the Frequentist and Bayesian approaches to statistics.

Frequentist vs Bayesian

Statistics is commonly framed within two main perspectives: the Frequentist approach and the Bayesian approach. While Bayesian methods often provide a more intuitive and flexible framework for inference and hypothesis comparison, both approaches have unique philosophies and methodologies. The key difference lies in how they interpret and use probability.

The Frequentist approach relies on classical tools such as p-values, significance levels, statistical power, and confidence intervals. Here, probability is used only to model the sampling process, that is, the variability in the data that would arise under repeated sampling, while the parameters are treated as fixed, unknown quantities. The uncertainty is therefore carried by the data. A recurring concern under the Frequentist approach is whether the "correct model" has been specified and whether the null model is supported by the data.

In contrast, the Bayesian approach uses probability more broadly, to model the sampling process as well as other related uncertainties, including uncertainty about the parameters themselves. Here, we focus on whether the parameters and model are sensible for the particular dataset at hand, and we report credible intervals instead of confidence intervals. Credible intervals (the Bayesian analogues of confidence intervals) are interval estimates based on the posterior distribution. With this approach, we are not obliged to set up a null hypothesis; instead, we can make direct probability statements about the parameter of interest using the observed sample. The p-value, by comparison, is calculated from the sampling distribution, that is, from a hypothetical, infinitely repeated sampling process that we never actually observe. The Bayesian framework also naturally incorporates prior information, such as previous findings, theory, or expert knowledge, which can add valuable context to the analysis.

Moreover, Bayesian methods often perform better with smaller sample sizes, offering robust inference where Frequentist methods might struggle. Overall, the Bayesian approach provides a more comprehensive and probabilistic understanding of both data and uncertainty.
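
To make the idea of a direct probability statement concrete, here is a minimal Python sketch. It is an illustration only, not taken from the original analysis: the data (7 successes in 10 trials), the flat prior on the success probability θ, the grid approximation to the posterior, and all function names are assumptions made for the example.

```python
def grid_posterior(k, n, grid_size=2001):
    """Posterior over theta on an evenly spaced grid, assuming a flat prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior = binomial likelihood x flat prior (a constant).
    weights = [t**k * (1 - t) ** (n - k) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def prob_greater(thetas, probs, threshold):
    """Posterior probability that theta exceeds the threshold."""
    return sum(p for t, p in zip(thetas, probs) if t > threshold)


def credible_interval(thetas, probs, mass=0.95):
    """Equal-tailed credible interval read off the cumulative grid posterior."""
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for t, p in zip(thetas, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1 - tail:
            upper = t
    return lower, upper


# Hypothetical data: 7 successes in 10 trials.
thetas, probs = grid_posterior(k=7, n=10)
p_gt = prob_greater(thetas, probs, 0.5)
lo, hi = credible_interval(thetas, probs)
print(f"P(theta > 0.5 | y) = {p_gt:.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Here `p_gt` is the posterior probability that θ exceeds 0.5, and `(lo, hi)` is an equal-tailed 95% credible interval. Both are statements about the parameter itself given the observed data, under the assumed flat prior.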

p-value vs Bayes factor

p-value:

Fisher popularised the p-value in the context of carefully designed agricultural experiments. While it serves as an intuitively useful measure of evidence against the null hypothesis, it is often misunderstood or misused. Some common errors made when interpreting p-values include:

Interpreting it as the probability that H₀ is (not) true, whereas it only measures how extreme the observed result is under H₀.
Treating it as the probability that the observed result occurred under H₀, whereas it is the probability of observing a result at least as extreme as the one observed, under H₀. This implies that it is based not only on the observed result but also on fictive (never observed) data. For example, to test the significance of regression estimates (beta coefficients), we read the p-value from a t-distribution, which describes hypothetical repeated samples.
Treating it as an absolute measure. A small p-value does not necessarily mean that the difference between two or more characteristics of interest (variables) is large or practically important.
Overlooking that it does not take into account the size of the study (Royall, 1997).

It is often more informative to report a 95% confidence interval alongside, or in place of, the p-value, because the interval conveys both the size of the estimated effect and the precision with which it is estimated.
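
As a rough illustration of the points above, the following Python sketch computes a two-sided p-value and a 95% confidence interval for a mean. It is a sketch under stated assumptions, not a recommended workflow: it assumes a z-test with a known population standard deviation, and the numbers and the function name `z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, level=0.95):
    """Two-sided z-test of H0: mu = mu0, assuming sigma is known."""
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    se = sigma / sqrt(n)
    z = (sample_mean - mu0) / se
    # Probability, under H0, of a |z| at least as large as the one observed.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Confidence interval for the mean at the requested level.
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return z, p_value, ci


# Hypothetical numbers: sample mean 103 from n = 100, sigma = 15, H0: mu = 100.
z, p, ci = z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The p-value integrates over results that were never observed (the tails beyond the observed z), whereas the interval reports the estimate and its precision on the scale of the data.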

Bayes factor:

The Bayes factor is a key concept in Bayesian statistics, used to quantify how strongly the data support one statistical hypothesis over another. It stems from the work of Jeffreys in the early 20th century. The Bayes factor compares the probabilities of the observed data under two competing models or hypotheses, typically a null and an alternative; equivalently, it measures the change from the prior odds to the posterior odds in favour of the null hypothesis. It is the Bayesian counterpart of the likelihood ratio used in the likelihood ratio test. By providing a direct, interpretable measure of evidence from the data, the Bayes factor offers an alternative to traditional p-values, allowing researchers to update their beliefs in light of new information and assess which hypothesis is better supported by the evidence.

Let y represent the observed data and H₀ the null hypothesis to be tested. Then, according to Bayes' theorem,

\[P\left( H_{0}|y \right) = \frac{P\left( y \middle| H_{0} \right)P(H_{0})}{P\left( y \middle| H_{0} \right)P\left( H_{0} \right) + P\left( y \middle| H_{1} \right)P(H_{1})}\]

Here, H₁ is the alternative hypothesis.

Similarly,

\[P\left( H_{1}|y \right) = \frac{P\left( y \middle| H_{1} \right)P(H_{1})}{P\left( y \middle| H_{0} \right)P\left( H_{0} \right) + P\left( y \middle| H_{1} \right)P(H_{1})}\]

Thus,

\[\frac{P\left( H_{0}|y \right)}{P\left( H_{1}|y \right)} = \frac{P\left( y \middle| H_{0} \right)}{P\left( y \middle| H_{1} \right)} \times \frac{P(H_{0})}{P(H_{1})}\]

The term,

\[\frac{P\left( y \middle| H_{0} \right)}{P\left( y \middle| H_{1} \right)}\]

is known as the Bayes factor. Its value ranges from 0 to infinity.

Thus,

posterior odds = Bayes factor × prior odds

Values of the Bayes factor larger than 1 are interpreted as evidence in favour of H₀ (relative to H₁); the larger the value, the stronger the evidence. Conversely, values less than 1 favour H₁.
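
The relation between the Bayes factor, prior odds, and posterior odds can be sketched in Python for a simple binomial example. The setup is an assumption chosen for illustration: H₀ fixes the success probability at θ₀ = 0.5, H₁ places a Uniform(0, 1) prior on θ (under which the marginal probability of k successes in n trials is 1/(n + 1)), and the function names are hypothetical.

```python
from math import comb


def bf01_binomial(k, n, theta0=0.5):
    """Bayes factor P(y | H0) / P(y | H1) for k successes in n trials.

    H0: theta = theta0 (point null).
    H1: theta ~ Uniform(0, 1), giving marginal likelihood 1 / (n + 1).
    """
    p_y_h0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    p_y_h1 = 1 / (n + 1)
    return p_y_h0 / p_y_h1


def posterior_prob_h0(bf01, prior_prob_h0=0.5):
    """Posterior P(H0 | y) from posterior odds = Bayes factor x prior odds."""
    prior_odds = prior_prob_h0 / (1 - prior_prob_h0)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical data sets of n = 100 trials each.
for k in (50, 60, 70):
    bf = bf01_binomial(k, 100)
    print(f"k = {k}: BF01 = {bf:.4g}, P(H0 | y) = {posterior_prob_h0(bf):.4f}")
```

With equal prior probabilities for the two hypotheses the prior odds equal 1, so the posterior odds equal the Bayes factor itself; a different `prior_prob_h0` rescales the posterior odds without changing the Bayes factor.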

According to Jeffreys, the evidence provided by a Bayes factor favouring H₀ against H₁ can be classified as follows:

"decisive" if Bayes factor > 100
"very strong" if 32 < Bayes factor ≤ 100
"strong" if 10 < Bayes factor ≤ 32
"substantial" if 3.2 < Bayes factor ≤ 10
"not worth more than a bare mention" if 1 < Bayes factor ≤ 3.2

A more precise version of Jeffreys' scale, for the Bayes factor favouring H₀ against H₁ and H₁ against H₀, as provided in Jeffreys' book "Theory of Probability", is given below.

Bayes factor (BF) favouring H₀ against H₁			Bayes factor (BF) favouring H₁ against H₀
BF	log₁₀(BF)	Strength of evidence	BF	log₁₀(BF)	Strength of evidence
1 to 10^(1/2)	0 to 1/2	Not worth more than a bare mention	1 to 10^(-1/2)	0 to -1/2	Not worth more than a bare mention
10^(1/2) to 10^1	1/2 to 1	Substantial	10^(-1/2) to 10^(-1)	-1/2 to -1	Substantial
10^1 to 10^(3/2)	1 to 3/2	Strong	10^(-1) to 10^(-3/2)	-1 to -3/2	Strong
10^(3/2) to 10^2	3/2 to 2	Very strong	10^(-3/2) to 10^(-2)	-3/2 to -2	Very strong
> 10^2	> 2	Decisive	< 10^(-2)	< -2	Decisive

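Jeffreys' scale can be sketched as a small lookup on log₁₀(BF), covering both directions of evidence. This is an illustrative sketch only: the function name is hypothetical, and treating each upper bound as inclusive is an assumption about how to handle values that fall exactly on a cut point.

```python
from math import log10

# Jeffreys' labels for |log10(BF)| in half-unit steps. Upper bounds are
# treated as inclusive here, which is an assumption about the boundaries.
_LABELS = [
    (0.5, "not worth more than a bare mention"),
    (1.0, "substantial"),
    (1.5, "strong"),
    (2.0, "very strong"),
]


def jeffreys_label(bf01):
    """Return (favoured hypothesis, strength label) for BF = P(y|H0)/P(y|H1)."""
    if bf01 <= 0:
        raise ValueError("a Bayes factor must be positive")
    log_bf = log10(bf01)
    if log_bf == 0:
        return None, "no evidence either way"
    favoured = "H0" if log_bf > 0 else "H1"
    strength = abs(log_bf)
    for upper, label in _LABELS:
        if strength <= upper:
            return favoured, label
    return favoured, "decisive"


for bf in (150, 5, 1, 0.05):
    print(bf, jeffreys_label(bf))
```

Working on the log₁₀ scale makes the symmetry explicit: a Bayes factor of 0.05 and its reciprocal 20 receive the same strength label, favouring H₁ and H₀ respectively.
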
A note of caution: the interpretation of the Bayes factor is not universal. Some calibration is required, together with a sensitivity analysis using different priors, to reduce the risk of misleading conclusions. The table below summarises how various researchers, including Jeffreys, have classified the strength of evidence against the null hypothesis (H₀) when the Bayes factor is less than or equal to 1.

Bayes factor	Strength of evidence against H₀
Bayes factor	Jeffreys' scale	Goodman scale	Held & Ott scale
1 to 1/3	Bare mention		Weak
1/3 to 1/10	Substantial	Weak to moderate	Moderate
1/10 to 1/30	Strong	Moderate to strong	Substantial
1/30 to 1/100	Very strong	Strong	Strong
1/100 to 1/300	Decisive	Very strong	Very strong
<1/300			Decisive
Jeffreys actually used the slightly different cut points 1/10^(a/2), a = 1, 2, 3, 4, whereas Goodman specified evidence categories “weak,” “moderate,” “moderate to strong,” and “strong to very strong” for Bayes factors of 1/5, 1/10, 1/20, and 1/100, respectively, which we have modified and aligned with our cut points.
*Table Source*: Held & Ott (2018)
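
The sensitivity analysis recommended above can be sketched in Python for a binomial example. The setup is an assumption made for illustration: H₀ fixes θ at 0.5, H₁ places a symmetric Beta(a, a) prior on θ, the data (60 successes in 100 trials) are hypothetical, and the function names are not from any library.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_beta_prior(k, n, a, b, theta0=0.5):
    """BF01 for H0: theta = theta0 against H1: theta ~ Beta(a, b).

    The binomial coefficient is common to both marginal likelihoods
    and cancels in the ratio, so it is omitted.
    """
    log_p_y_h0 = k * log(theta0) + (n - k) * log(1 - theta0)
    log_p_y_h1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    return exp(log_p_y_h0 - log_p_y_h1)


# Hypothetical data: 60 successes in 100 trials, under several priors for H1.
results = {a: bf01_beta_prior(60, 100, a, a) for a in (0.5, 1.0, 2.0, 10.0)}
for a, bf in results.items():
    print(f"Beta({a}, {a}) prior under H1: BF01 = {bf:.3f}")
```

If the Bayes factor moves across a category boundary of the chosen scale as the prior changes, the conclusion depends on the prior and should be reported with that caveat.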

Note: This blog post highlights the key conceptual differences between the p-value and the Bayes factor. For more information, please refer to the material in the reading list.

Suggested Reading:

Assaf, A. G., & Tsionas, M. (2018). Bayes factors vs p-values. Tourism Management, 67, 17-31.
Held, L., & Ott, M. (2018). On p-values and Bayes factors. Annual Review of Statistics and Its Application, 5(1), 393-419.
Lesaffre, E., & Lawson, A. B. (2012). Bayesian Biostatistics. John Wiley & Sons.
Royall, R. (1997). Statistical evidence: a likelihood paradigm. Routledge.
Taboga, M. (2021). Jeffreys' scale. Fundamentals of Statistics.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.

Suggested Citation: Meitei, W. B. (2020). Difference between p-value and Bayes factor. WBM STATS.

Friday, 14 February 2020
