Sunday, 10 August 2025

Jeffreys-Lindley-Bartlett's Paradox: Unravelling a Foundational Split in Statistics

by W. B. Meitei, PhD


The Jeffreys-Lindley-Bartlett paradox stands as one of the most startling and debated phenomena in the theory of statistical inference, challenging assumptions about how evidence is assessed in the Bayesian versus frequentist schools. This paradox reveals scenarios where the conclusion from a frequentist hypothesis test ("reject the null hypothesis") directly conflicts with the output of Bayesian inference, which may decisively favour the null hypothesis, even when analysing the same data set.

Background and origins

Harold Jeffreys introduced the mathematical and philosophical basis of what became the paradox in the 1930s and 1940s, showing that the strength of evidence required to reject a null hypothesis should increase with sample size, which clashes with the common frequentist practice of holding the significance level fixed.

Dennis Lindley famously formalised and publicised the problem as a paradox in a 1957 paper, giving it prominence and clarity in statistical discussions.

M. S. Bartlett extended and clarified the issue, demonstrating that choosing a "non-informative" or very diffuse prior in the Bayesian setup could, counterintuitively, make the evidence overwhelmingly favour the null hypothesis, regardless of how compelling the data against it appeared.

Explaining the paradox

Suppose we are testing a point null hypothesis (e.g., "the mean is equal to zero") using a large sample.

Frequentist inference: Keeping the significance level fixed (e.g., 5%), we will reject the null hypothesis for any deviation from zero if we collect enough data, since even minuscule differences become "statistically significant" as the sample grows.

Bayesian inference: When using a diffuse prior for the alternative, the Bayes factor, the core Bayesian evidence metric, may incline towards supporting the null hypothesis, despite the frequentist's significant finding. This occurs because the marginal likelihood for the alternative hypothesis (which averages the likelihood over the entire, wide-ranging prior) gets "diluted," while all prior mass for the null is concentrated at the hypothesis value.
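To see the conflict numerically, here is a minimal sketch, assuming a normal model with known unit variance, a point null θ = 0, and a hypothetical N(0, 1) prior on θ under the alternative; it uses the standard closed-form Bayes factor for this setup (written out in the next section). The test statistic is held at a nominally significant z = 2.5 while the sample size grows:

```python
# Minimal sketch: a fixed "significant" z-statistic with growing n.
# Assumptions (illustrative, not from the post): X_i ~ N(theta, 1),
# H0: theta = 0, and under H1 a diffuse prior theta ~ N(0, tau^2), tau = 1.
import math
from scipy.stats import norm

def bf01(z, n, tau=1.0, sigma=1.0):
    """Closed-form Bayes factor BF01 (null over alternative) for a
    point-null normal-mean test with a N(0, tau^2) prior under H1."""
    r = n * tau**2 / sigma**2        # prior-to-sampling variance ratio, times n
    return math.sqrt(1 + r) * math.exp(-(z**2 / 2) * r / (1 + r))

z = 2.5                              # fixed, nominally "significant" statistic
p = 2 * norm.sf(z)                   # two-sided p-value, ~0.0124 at every n
for n in (100, 10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}   p = {p:.4f}   BF01 = {bf01(z, n):,.1f}")
# BF01 climbs from ~0.5 (mild evidence against H0) to ~440 in favour of H0:
# the same data that reject H0 at the 5% level come to support it decisively.
```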

Mathematical roots

The crux of the paradox is how the Bayes factor is sensitive to:

  1. Sample size (n): As n → ∞, the likelihood under the null becomes sharply peaked. The Bayes factor will favour the null unless the prior for the alternative is highly concentrated around values supported by the data, which seldom holds for broadly specified or "non-informative" priors.
  2. Variance of prior: If the prior for the alternative hypothesis is made increasingly broad (variance → ∞), Bayesian evidence in favour of the null approaches certainty regardless of the observed data, a phenomenon specifically described by Bartlett. Both sensitivities appear explicitly in the closed-form expression sketched below.
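For the textbook setup behind these claims (a normal sample with known variance σ², a point null H0: θ = 0, and a N(0, τ²) prior on θ under the alternative; standard illustrative assumptions, not the only possible ones), the Bayes factor for the null has a well-known closed form. Writing z = √n·x̄/σ for the usual standardised statistic:

```latex
% Marginal distributions of the sample mean under the two hypotheses:
%   H0:  xbar ~ N(0, sigma^2 / n)
%   H1:  xbar ~ N(0, tau^2 + sigma^2 / n)   (the prior averaged out)
\[
  \mathrm{BF}_{01}
  = \frac{p(\bar{x} \mid H_0)}{p(\bar{x} \mid H_1)}
  = \sqrt{1 + \frac{n\tau^2}{\sigma^2}}
    \exp\!\left( -\frac{z^2}{2} \cdot
      \frac{n\tau^2/\sigma^2}{1 + n\tau^2/\sigma^2} \right),
  \qquad z = \frac{\sqrt{n}\,\bar{x}}{\sigma}.
\]
% Fix z, let n -> infinity:         BF01 grows like sqrt(n)   (Lindley's effect).
% Fix the data, let tau -> infinity: BF01 -> infinity          (Bartlett's effect).
```

Holding z fixed, the square-root factor dominates as n → ∞, so the null wins (Lindley); holding the data fixed, the same factor diverges as τ → ∞, so the null again wins by default (Bartlett).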

Imagine tossing a coin a million times. We observe 501,000 heads, a small but "significant" deviation from perfect fairness (under a fair coin the expected count is 500,000 with a standard deviation of 500, so the excess is two standard errors):

Frequentist result: the two-sided p-value is about 0.045, below the conventional 0.05 threshold; the null ("fair coin") is rejected.

Bayesian result: with a very flat (uninformative) prior for the coin's bias, the Bayes factor turns out to support the null overwhelmingly, at odds of roughly 100 to 1; the prior mass for "biased coins" is spread so thin that the observed data are not enough to outweigh the null.
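A short sketch of this calculation, assuming (for illustration) a uniform Beta(1, 1) prior on the coin's bias under the alternative, so that the marginal likelihood of k heads in n tosses is 1/(n + 1):

```python
# Coin example: n = 1,000,000 tosses, k = 501,000 heads.
# Under a uniform Beta(1, 1) prior on the bias, P(k | H1) = 1/(n + 1),
# so BF01 = P(k | fair coin) / P(k | H1) = Binom(k; n, 1/2) * (n + 1).
from scipy.stats import binom, norm

n, k = 1_000_000, 501_000

# Frequentist side: normal-approximation two-sided p-value.
z = (k - n / 2) / (n / 4) ** 0.5     # z = 2.0
p_value = 2 * norm.sf(abs(z))        # ~0.0455: reject "fair coin" at the 5% level

# Bayesian side: Bayes factor for H0 (fair) versus H1 (uniform prior on bias).
bf01 = binom.pmf(k, n, 0.5) * (n + 1)    # ~108: strong support for the fair coin

print(f"z = {z:.2f}, p = {p_value:.4f}, BF01 = {bf01:.1f}")
```

The same tosses that are "significant" at the 5% level give odds of roughly 100 to 1 in favour of the fair coin.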

Philosophical and practical implications

  • Prior selection is crucial: The paradox demonstrates that using broad or improper priors in Bayesian hypothesis testing can yield practically absurd model selection, motivating the adoption of weakly-informative or problem-specific priors.
  • Rethinking evidence and significance: The conflict highlights foundational issues in how statistical evidence is conceptualised, warning against mechanical use of p-values or "default" priors.
  • Frequentist significance thresholds: The paradox raises concerns about the routine practice of using fixed significance levels regardless of sample size, suggesting a need for adjusting thresholds as data accumulates.

This paradox, far from being a mere technical curiosity, continues to influence both statistical theory and the broader philosophy of evidence, serving as a catalyst for methodological innovation and critical thinking in scientific research.

Recent proposed solutions to the Jeffreys-Lindley-Bartlett paradox

Recent work on this problem focuses both on the choice of priors in Bayesian testing and on re-examining foundational aspects of hypothesis-testing frameworks. Here are key modern approaches:

  • Weakly informative or sample-dependent priors: Instead of using non-informative or overly diffuse priors (which cause the paradox), researchers now recommend "weakly informative" priors that grow more informative as sample size increases, or that adapt to the data context. Sample size-dependent prior strategies have been explicitly developed to bridge the Bayesian and frequentist divide, reducing the conflict as datasets become large.
  • Cake priors: A notable recent innovation, "cake priors," are a new class of priors designed to circumvent the paradox. These allow for diffuse priors while ensuring statistical inferences remain sensible. Cake priors lead to Bayesian tests that are Chernoff-consistent (zero type I and II errors asymptotically) and can often be interpreted as penalised likelihood ratio tests. Their design ensures that the paradox occurs only with vanishing probability in large samples.
  • Changing significance thresholds with sample size: In the frequentist sphere, it has been proposed that significance levels (α) should decrease as sample size grows, rather than remain fixed, as maintaining a constant α is a root cause of the paradox. This approach, sometimes called "almost sure hypothesis testing," results in a sequence of hypothesis tests that make only a finite number of errors as the sample size grows, thereby resolving the paradox in practice. (A toy illustration follows this list.)
  • Re-evaluating the point-null hypothesis: Some theorists argue that the paradox is less about statistical machinery and more about the unrealistic nature of testing perfect point hypotheses. Instead, they suggest framing nulls as intervals or using shrinkage priors that concentrate prior mass more sensibly, addressing both Type I error and Bayesian evidence alignment.
  • Objective and loss-based prior selection: Methods based on Kullback-Leibler divergence or self-information loss try to assign prior weights objectively, even allowing the prior for the alternative hypothesis to be set in a way closely linked to classical significance levels, avoiding pathologies as variances go to infinity.
  • Handling improper priors: Recent analysis shows that Bartlett’s paradox does not necessarily apply to all improper priors. Some classes of improper priors (like Stein's shrinkage prior) lead to well-defined Bayes factors, provided their diffusion and measure rate are properly controlled. However, pathologies can still arise, so caution is emphasised, and regularisation rules are suggested.
  • Hybrid and robust Bayesian methods: Ongoing research also investigates hybrid techniques, combining Bayesian insights with frequentist calibration or robust Bayesian model averaging, to mitigate the effects of prior mis-specification and large-sample divergences.
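To illustrate the shrinking-threshold idea on the coin example, here is a toy sketch; the particular rule α_n = min(0.05, 1/√n) is a hypothetical choice made for illustration, not a rule taken from the literature cited above:

```python
# Illustrative sample-size-dependent significance level.
# ASSUMPTION: alpha_n = min(0.05, 1/sqrt(n)) is a toy shrinking rule chosen
# to show the mechanism; the cited literature develops principled versions.
import math
from scipy.stats import norm

def critical_z(n, alpha0=0.05):
    alpha_n = min(alpha0, 1 / math.sqrt(n))   # threshold shrinks with n
    return norm.isf(alpha_n / 2)              # two-sided critical value

for n in (100, 10_000, 1_000_000):
    alpha_n = min(0.05, 1 / math.sqrt(n))
    print(f"n = {n:>9,}   alpha_n = {alpha_n:.4f}   "
          f"|z| must exceed {critical_z(n):.2f}")
# At n = 1,000,000 the critical value is ~3.29, so the coin example's z = 2.0
# no longer counts as significant -- in line with the Bayes factor's verdict.
```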

In summary, the main thrust of recent solutions is smarter, context-aware prior specification (including cake and weakly informative priors), dynamic thresholding of significance in frequentist testing, and new theoretical frameworks that blend benefits from both frequentist and Bayesian paradigms, making the paradox less of a practical challenge and more of a caution against naive use of statistical defaults.

The paradox's impact on the practical interpretation of p-values and Bayes factors

The Jeffreys-Lindley-Bartlett paradox profoundly impacts how p-values and Bayes factors should be interpreted in practical data analysis, especially as sample sizes grow large:

a. Practical interpretation of p-values

  • Overstated evidence: The paradox demonstrates that small p-values may not always imply strong evidence against the null hypothesis, particularly in large datasets. As the sample size increases, even trivial differences from the null can yield extremely small p-values, potentially leading to exaggerated claims of discovery when the true effect is negligible (see the sketch after this list).
  • Context sensitivity: Researchers are increasingly cautioned that p-values must be interpreted in light of context, effect size, sample size, and prior belief, not just as a mechanical threshold for significance.
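A brief sketch of the first point, assuming a normal model with unit variance and a fixed, practically negligible true mean of 0.001 (numbers chosen purely for illustration):

```python
# A trivially small true effect becomes arbitrarily "significant" as n grows.
# Assumptions (illustrative): X_i ~ N(0.001, 1); expected z is sqrt(n) * 0.001.
import math
from scipy.stats import norm

effect, sigma = 0.001, 1.0           # negligible true deviation from the null
for n in (10_000, 1_000_000, 100_000_000):
    z = math.sqrt(n) * effect / sigma    # expected test statistic
    p = 2 * norm.sf(z)                   # expected two-sided p-value
    print(f"n = {n:>12,}   expected z = {z:6.2f}   p ~ {p:.2e}")
# By n = 1e8 the expected z is 10 and p is around 1e-23, although the effect
# (one thousandth of a standard deviation) has no practical importance.
```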

b. Practical interpretation of Bayes factors

  • Prior dependence: The paradox exposes how, when a diffuse or poorly chosen prior is used for the alternative, the Bayes factor can favour the null hypothesis even when the observed data deviate strongly from it. Bayes factors must therefore be interpreted with careful attention to the choice and justification of priors.
  • Calibration required: The interpretation of Bayes factors is not universal; they must be calibrated to the problem at hand, with sensitivity analyses across different priors reducing the risk of misleading conclusions (a brief sketch follows this list).
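As a compact sensitivity sketch, here is the closed-form Bayes factor from earlier evaluated at a fixed data set (assumed values z = 2 and n = 10⁶, with σ = 1) across an arbitrary grid of prior scales:

```python
# Prior-sensitivity analysis: one data set, several prior scales.
# Assumptions (illustrative): normal model, sigma = 1, fixed z = 2.0, n = 1e6;
# the grid of prior standard deviations is arbitrary, not a recommendation.
import math

def bf01(z, n, tau, sigma=1.0):
    r = n * tau**2 / sigma**2
    return math.sqrt(1 + r) * math.exp(-(z**2 / 2) * r / (1 + r))

z, n = 2.0, 1_000_000
for tau in (0.01, 0.1, 1.0, 10.0):
    print(f"prior sd tau = {tau:6.2f}   ->   BF01 = {bf01(z, n, tau):10.1f}")
# The evidence for H0 swings from ~1.4 to ~1,350 across priors -- exactly why
# a Bayes factor should always be reported alongside a sensitivity analysis.
```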

In essence, the paradox serves as a cautionary tale: neither p-values nor Bayes factors alone provide definitive evidence; their interpretation is nuanced, context-dependent, and should be embedded in a broader inferential framework.



Suggested Readings:

  1. Jeffreys, H. (1948). Theory of Probability. The Clarendon Press, Oxford.
  2. Bartlett, M. S. (1957). A comment on D. V. Lindley's statistical paradox. Biometrika, 44(3-4), 533-534.
  3. Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1/2), 187-192.
  4. Wagenmakers, E. J., & Ly, A. (2023). History and nature of the Jeffreys–Lindley paradox. Archive for History of Exact Sciences, 77(1), 25-72.
  5. Press, S. J. (2003). Bayesian hypothesis testing. In Subjective and Objective Bayesian Statistics: Principles, Models, and Applications. John Wiley and Sons, Inc.
  6. Lavine, M., & Schervish, M. J. (1999). Bayes factors: What they are and what they are not. The American Statistician, 53(2), 119-122.

Suggested Citation: Meitei, W. B. (2025). Jeffreys-Lindley-Bartlett's Paradox: Unravelling a Foundational Split in Statistics. WBM STATS.
