A conflict(?) between Frequentists and Bayesians: The Jeffreys-Lindley Paradox
The Jeffreys-Lindley paradox is an apparently puzzling problem in statistical inference. It is often seen that the frequentist and Bayesian approaches to testing a point null hypothesis, i.e. a simple hypothesis, lead to divergent results, especially when the sample size is large, and for different choices of the prior distribution of the parameter under study.
Statement of the paradox:
The paradox can be understood in the general setting as follows. Let $x$ denote the observation or the data obtained from the experiment under study.
- A test of significance for the null hypothesis $H_0 : \theta = \theta_0$ gets rejected at level of significance $\alpha$.
- The posterior probability of $H_0$, given the data $x$, is very high even for a small prior probability of $H_0$.
Suppose we compare different sets of observations with varying sample sizes $n$, all of which produce equally significant p-values (say, 0.01) when the frequentist test of significance is performed. Then, as the sample size $n$ increases, the Bayesian approach reveals that the data increasingly support the null hypothesis. Thus the Bayesian approach accepts a null hypothesis which the frequentist approach rejects; a small numerical sketch of this behaviour follows.
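To make this concrete, here is a minimal Python sketch of that comparison. All specifics are assumed purely for illustration (they are not from Lindley's paper): $\theta_0 = 0$, known $\sigma = 1$, prior mass $c = 1/2$ on the null with the remainder uniform on $[-5, 5]$, and the two-sided p-value held fixed at 0.01 while $n$ grows.

```python
import numpy as np
from scipy import stats

# Assumed illustrative setup: H0: theta = 0, known sigma = 1, prior mass
# c = 1/2 on the null, the remaining mass uniform on I = [a, b] = [-5, 5].
theta0, sigma, c = 0.0, 1.0, 0.5
a, b = -5.0, 5.0
z = stats.norm.ppf(1 - 0.01 / 2)   # two-sided critical value for p = 0.01

for n in [10, 100, 10_000, 1_000_000]:
    se = sigma / np.sqrt(n)
    xbar = theta0 + z * se         # data placed exactly at the p = 0.01 boundary
    # Likelihood of the sufficient statistic xbar under the point null.
    lik0 = stats.norm.pdf(xbar, loc=theta0, scale=se)
    # Marginal likelihood under the uniform alternative: the integral of the
    # normal kernel over I reduces, by symmetry in (theta, xbar), to a CDF
    # difference divided by the interval length.
    lik1 = (stats.norm.cdf(b, loc=xbar, scale=se)
            - stats.norm.cdf(a, loc=xbar, scale=se)) / (b - a)
    post0 = c * lik0 / (c * lik0 + (1 - c) * lik1)
    print(f"n = {n:>9,d}   P(theta = theta0 | data) = {post0:.4f}")
```

With these assumed numbers the posterior probability of the null rises from about 0.31 at $n = 10$ to above 0.99 at $n = 10^6$, even though every data set is "equally significant" at $p = 0.01$.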
Lindley discussed this paradox in the context of Gaussian models. The formal statement of the paradox is as follows:
In a Gaussian model $N(\theta, \sigma^2)$ with $\sigma^2$ known, assume the point null hypothesis $H_0 : \theta = \theta_0$ and any regular proper prior distribution on $\theta$ (assigning positive prior probability to $\theta_0$, so that its posterior probability is well defined). Then, for any testing level $\alpha$, we can find a sample size $n(\alpha)$ and independent, identically distributed data $x_1, x_2, \dots, x_{n(\alpha)}$ such that:
- The sample mean $\bar{x}$ is significantly different from $\theta_0$ at level $\alpha$;
- The posterior probability $P(\theta = \theta_0 \mid x_1, \dots, x_{n(\alpha)})$ is at least as big as $1 - \alpha$.
A naive numerical search for such an $n(\alpha)$ is sketched below.
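As a rough illustration of how such an $n(\alpha)$ can be found, the following Python sketch performs a naive search under one assumed instance of the prior, namely the Lindley-type prior used in the next section ($c = 1/2$, $\sigma = 1$, interval length $M = 10$), using the closed-form approximation derived there.

```python
import numpy as np
from scipy import stats

# Assumed ingredients: level alpha, prior mass c at theta0, known sigma,
# and interval length M for the uniform part of the prior.
alpha, c, sigma, M = 0.05, 0.5, 1.0, 10.0
z = stats.norm.ppf(1 - alpha / 2)   # two-sided critical value
e = np.exp(-z**2 / 2)               # likelihood term at the significance boundary

# Naive linear search for the smallest n with posterior >= 1 - alpha.
n = 1
while c * e / (c * e + (1 - c) * (sigma / M) * np.sqrt(2 * np.pi / n)) < 1 - alpha:
    n += 1
print(f"alpha = {alpha}: P(theta = theta0 | data) >= {1 - alpha} first holds at n = {n}")
```

With these assumptions the threshold is reached at a little over a thousand observations: for data just significant at the 5% level with that sample size, the posterior probability of the null already exceeds 0.95.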
Mathematical justification:
We proceed to show the apparent discrepancy between the two approaches in the case of the Gaussian model, as originally stated by Lindley.
Let $x_1, x_2, \dots, x_n$ be a random sample from a normal distribution with mean $\theta$ and known variance $\sigma^2$. Let the prior probability that $\theta = \theta_0$, the value under the null hypothesis, be $c$. Suppose that the remainder of the prior probability, $1 - c$, is distributed uniformly over some interval $I$ containing $\theta_0$, and let $M$ denote the length of $I$. We shall deal with situations where $\bar{x}$, the arithmetic mean of the observations, which is a minimal sufficient statistic for $\theta$, lies well within the interval $I$.

The posterior probability that $\theta = \theta_0$, in the light of the sample drawn, is given by

\[
P(\theta = \theta_0 \mid \bar{x}) = \frac{c \exp\left\{-\dfrac{n(\bar{x} - \theta_0)^2}{2\sigma^2}\right\}}{c \exp\left\{-\dfrac{n(\bar{x} - \theta_0)^2}{2\sigma^2}\right\} + \dfrac{1-c}{M} \displaystyle\int_I \exp\left\{-\dfrac{n(\bar{x} - \theta)^2}{2\sigma^2}\right\} d\theta}.
\]

By virtue of the assumption that $\bar{x}$ lies well within $I$, the integral may be extended over the whole real line with negligible error, giving $\int_{-\infty}^{\infty} \exp\{-n(\bar{x}-\theta)^2/2\sigma^2\}\, d\theta = \sigma\sqrt{2\pi/n}$, so that

\[
P(\theta = \theta_0 \mid \bar{x}) \approx \frac{c \exp\left\{-\dfrac{n(\bar{x} - \theta_0)^2}{2\sigma^2}\right\}}{c \exp\left\{-\dfrac{n(\bar{x} - \theta_0)^2}{2\sigma^2}\right\} + \dfrac{1-c}{M}\, \sigma\sqrt{\dfrac{2\pi}{n}}}.
\]
Now suppose that the value of $\bar{x}$ is such that, on performing the usual significance test for the mean $\theta$ of a normal distribution with known variance, the result is just significant at the $100\alpha\%$ point. That is, $\bar{x} = \theta_0 + z_{\alpha}\sigma/\sqrt{n}$, where $z_{\alpha}$ is a number dependent on $\alpha$ only and can be found from tables of the normal distribution function. Putting this value for $\bar{x}$, we have the following value for the posterior probability that $\theta = \theta_0$:

\[
P(\theta = \theta_0 \mid \bar{x}) \approx \frac{c\, e^{-z_{\alpha}^2/2}}{c\, e^{-z_{\alpha}^2/2} + \dfrac{1-c}{M}\, \sigma\sqrt{\dfrac{2\pi}{n}}}.
\]

(Note that $\sigma/\sqrt{n}$ tends to zero as $n$ increases, so that $\bar{x} = \theta_0 + z_{\alpha}\sigma/\sqrt{n}$ will lie well within the interval $I$ for sufficiently large $n$.)
We observe that as $n \to \infty$ the term $\frac{1-c}{M}\,\sigma\sqrt{2\pi/n}$ in the denominator vanishes while $c\, e^{-z_{\alpha}^2/2}$ stays fixed, so that $P(\theta = \theta_0 \mid \bar{x}) \to 1$, i.e. the Bayesian approach will be increasingly inclined to accept the null hypothesis as the sample size $n$ increases while the p-value remains constant, leading to the paradox.
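As a quick numerical check of this limit, the closed-form approximation can be tabulated against $n$ (assumed values as in the earlier sketches: $c = 1/2$, $\sigma = 1$, $M = 10$, two-sided p-value fixed at 0.05):

```python
import numpy as np
from scipy import stats

# Same assumed setup as earlier: c = 1/2, sigma = 1, interval length M = 10,
# two-sided p-value held fixed at alpha = 0.05 as n grows.
c, sigma, M, alpha = 0.5, 1.0, 10.0, 0.05
z = stats.norm.ppf(1 - alpha / 2)
e = np.exp(-z**2 / 2)

for n in [10, 10**3, 10**5, 10**7]:
    post = c * e / (c * e + (1 - c) * (sigma / M) * np.sqrt(2 * np.pi / n))
    print(f"n = {n:>10,d}   p-value = {alpha:.2f}   P(theta = theta0 | xbar) = {post:.4f}")
```

The p-value column is constant by construction, while the posterior climbs from roughly 0.65 at $n = 10$ towards 1, which is precisely the paradox.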
Reasons behind the paradox:
We briefly discuss the reasons behind this apparent paradox:
- For consistent tests used in the frequentist approach, the power of the test converges to 1 as the sample size increases. This means that even small deviations from the null hypothesis are regarded as significant, resulting in a small p-value; this in itself is not paradoxical, since any good test should be consistent.
- The frequentist and Bayesian approaches answer two fundamentally different questions, and the results obtained from the two approaches are misinterpreted. A small p-value obtained from the frequentist test indicates that the deviation from the null hypothesis is significant, but it does not take the alternative hypothesis into account, so it cannot conclude that the alternative hypothesis is more plausible in the light of the given sample. A small p-value simply indicates that the data do not support the null hypothesis. The Bayesian approach, on the other hand, compares the posterior odds of the competing null and alternative hypotheses. It is to be understood that the null value to be tested is fundamentally different from the other values in the parameter space; we perform such tests only when the null value of the parameter holds particular interest for us. Now if the prior under $H_0$ is concentrated on a single point value and the prior under $H_1$ is very diffuse, so that the null value of the parameter is a better fit to the data than most, but not necessarily all, of the values in the parameter space, then the Bayesian approach concludes that the null is a better fit to the data than the alternative; a numerical illustration of this diffuseness effect is sketched below.
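The following sketch illustrates the diffuseness point numerically. Holding the data fixed (two-sided $p = 0.01$ at an assumed $n = 10{,}000$, with $\theta_0 = 0$ and $\sigma = 1$), the Bayes factor $B_{01}$ in favour of the point null grows roughly in proportion to the width $M$ of the uniform prior placed on the alternative.

```python
import numpy as np
from scipy import stats

# Assumed illustrative setup: fixed data just significant at p = 0.01
# with n = 10,000, theta0 = 0, sigma = 1; only the width M of the uniform
# alternative prior varies.
theta0, sigma, n = 0.0, 1.0, 10_000
z = stats.norm.ppf(1 - 0.01 / 2)
se = sigma / np.sqrt(n)
xbar = theta0 + z * se

lik0 = stats.norm.pdf(xbar, loc=theta0, scale=se)   # likelihood under H0
for M in [1.0, 10.0, 100.0]:
    # Marginal likelihood under H1 with theta ~ Uniform(theta0 - M/2, theta0 + M/2);
    # the integral again reduces to a CDF difference divided by M.
    lik1 = (stats.norm.cdf(theta0 + M / 2, loc=xbar, scale=se)
            - stats.norm.cdf(theta0 - M / 2, loc=xbar, scale=se)) / M
    print(f"M = {M:>6.1f}   Bayes factor B01 = {lik0 / lik1:.1f}")
```

A more diffuse alternative spreads its prior mass over many values of $\theta$ that fit the data poorly, so the same "significant" data can favour the point null more and more strongly.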
References:
- Jeffreys, Harold (1939). Theory of Probability. Oxford University Press. MR 0000924.
- Lindley, D.V. (1957). "A statistical paradox". Biometrika. 44 (1–2): 187–192. doi:10.1093/biomet/44.1-2.187. JSTOR 2333251.
- Spanos, Aris (2013). "Who should be afraid of the Jeffreys-Lindley paradox?". Philosophy of Science. 80 (1): 73–93. doi:10.1086/668875.