The German Tank Problem: Frequentist vs. Bayesian Approach
The Historical Problem:
During World War II, the Western Allies wanted to estimate the rate at which German tanks were being produced from only a small (statistically speaking) sample of data. Each manufactured German tank or piece of weaponry was stamped with a serial number. Using the serial numbers of damaged or captured German tanks, the Allies were able to estimate the total number of tanks and other machinery in the German arsenal.
Allied mathematicians could collect only a limited sample of German tank serial numbers, but used the maximum of that sample to estimate the population maximum. The statistical estimates proved far more accurate than those based on conventional intelligence gathering, which tended to wildly overestimate the number of tanks produced each month.
The problem can be approached using either frequentist inference or Bayesian inference, leading to different results. Estimating the population maximum based on a single sample yields divergent results, whereas estimation based on multiple samples is a practical estimation question whose answer is simple (especially in the frequentist setting) but not obvious (especially in the Bayesian setting). Here we consider the single-sample problem only.
The Frequentist and the Bayesian Approaches:
In the classical or frequentist (parametric) approach, we have a parameter or population characteristic of interest, which we regard as a fixed but unknown constant. Our objective is to estimate the parameter and to infer the value, or the range of values, that the parameter can possibly take, based on a random sample drawn from the population of interest. Here we consider a simple random sample of size n drawn without replacement from the population of tank serial numbers.
Alternatively, in the Bayesian framework, the unknown parameter is treated as a random quantity and is assigned a prior distribution quantifying our degree of belief, or prior information, regarding the parameter. Based on the sample data we modify or update our prior belief and obtain a posterior distribution, which is the distribution of the parameter conditional on the sample data.
Posterior information = prior information + information from sample data.
We summarize the posterior information using statistics such as the posterior mean or the posterior median, and provide measures of accuracy using the posterior standard deviation or the quartile deviation, respectively. We perform interval estimation using Highest Posterior Density (HPD) intervals.
We shall further calculate the errors of the estimates for the two approaches and compare them.
The Frequentist Approach:
Assume X denotes the serial number on a randomly selected destroyed or captured tank. It is assumed that
$$P(X = x) = \frac{1}{N}, \qquad x = 1, 2, \ldots, N,$$
i.e. X follows a discrete uniform distribution on $\{1, 2, \ldots, N\}$, where N is the unknown total number of tanks. Let $X_1, \ldots, X_n$ denote the sampled serial numbers and $M = \max(X_1, \ldots, X_n)$ the sample maximum.
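For concreteness, the sampling can be simulated in R as follows (a minimal sketch; the true value N = 1000 and the sample size n = 10 are arbitrary illustrative choices, and the variable names are our own):

set.seed(1)            # for reproducibility
N <- 1000              # true population maximum (unknown in practice)
n <- 10                # sample size
x <- sample(1:N, n)    # simple random sample drawn without replacement
m <- max(x)            # the sample maximum M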
Point Estimation:
The UMVUE for N is given by
$$\hat{N} = M\left(1 + \frac{1}{n}\right) - 1.$$
Now,
$$\operatorname{Var}(\hat{N}) = \frac{(N - n)(N + 1)}{n(n + 2)}.$$
An estimate of this variance is obtained by substituting $\hat{N}$ for N, giving the estimated standard error
$$\widehat{\operatorname{se}}(\hat{N}) = \sqrt{\frac{(\hat{N} - n)(\hat{N} + 1)}{n(n + 2)}}.$$
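Continuing the R sketch above, the point estimate and its estimated standard error are one-liners:

N.hat <- m * (1 + 1/n) - 1                                  # UMVUE of N
se.hat <- sqrt((N.hat - n) * (N.hat + 1) / (n * (n + 2)))   # estimated standard error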
Testing of Hypothesis:
We want to test
$$H_0 : N = N_0 \quad \text{against} \quad H_1 : N \neq N_0.$$
Under $H_0$ the sample maximum satisfies $P(M \le m) = \binom{m}{n}\big/\binom{N_0}{n}$.
We reject $H_0$ at level $\alpha$ if $M > N_0$ (impossible under $H_0$) or if $\binom{M}{n}\big/\binom{N_0}{n} \le \alpha$, i.e. if the observed maximum is implausibly small under $H_0$.
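As a hedged R illustration of this exact test (the hypothesized value N0 and the level 0.05 are our own choices):

N0 <- 1200                               # hypothesized value of N
p.val <- choose(m, n) / choose(N0, n)    # P(M <= observed m | N = N0)
reject <- (m > N0) || (p.val <= 0.05)    # reject H0 at level 0.05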
Interval Estimation:
Since $P(M \le m \mid N) = \binom{m}{n}\big/\binom{N}{n} \approx (m/N)^n$ for large N, we have $P(\alpha^{1/n} \le M/N \le 1) \approx 1 - \alpha$, and the (approximately) shortest $100(1 - \alpha)\%$ confidence interval for N based on M is
$$\left(M,\; M\,\alpha^{-1/n}\right).$$
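In R, continuing the variables above, the interval is immediate:

alpha <- 0.05
ci <- c(m, m * alpha^(-1/n))    # approximate shortest 95% confidence interval for N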
The Bayesian Approach:
Various choices can be imagined as the prior distribution for N:
• An improper uniform prior on all positive integers, i.e.
$$p(N) \propto 1 \quad \text{for } N = 1, 2, \ldots,$$
and 0 otherwise.
• A proper uniform distribution with an upper limit k for N, i.e.
$$p(N) = \frac{1}{k} \quad \text{for } N = 1, 2, \ldots, k,$$
and 0 otherwise.
Under the improper uniform prior, combining the likelihood $P(M = m \mid N) = \binom{m - 1}{n - 1}\big/\binom{N}{n}$, $N \ge m$, with the prior yields the posterior
$$p(N \mid m) = \frac{n - 1}{n} \cdot \frac{\binom{m - 1}{n - 1}}{\binom{N}{n}}, \qquad N = m, m + 1, \ldots,$$
which, being proportional to $(N - n)!/N!$, is a shifted factorial distribution.
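The posterior can be evaluated numerically in R; the sketch below truncates the infinite support at a large value, which is our own computational choice:

N.max <- 10000       # truncation point for numerical work
support <- m:N.max
log.post <- log(n - 1) - log(n) + lchoose(m - 1, n - 1) - lchoose(support, n)
post <- exp(log.post)    # posterior pmf p(N | m) on N = m, ..., N.max
sum(post)                # close to 1 if the truncation point is large enough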
The posterior distribution is extremely positively skewed. The posterior mode is at
$$N = m,$$
the observed sample maximum. The posterior mean is
$$E(N \mid m) = \frac{(m - 1)(n - 1)}{n - 2} \quad \text{for } n > 2,$$
and the posterior variance is
$$\operatorname{Var}(N \mid m) = \frac{(n - 1)(m - 1)(m - n + 1)}{(n - 3)(n - 2)^2} \quad \text{for } n > 3.$$
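As a quick sanity check, the numerical posterior mean from the R sketch above can be compared with the closed form (assuming n > 2):

post.mean <- sum(support * post)    # numerical posterior mean
(m - 1) * (n - 1) / (n - 2)         # closed-form posterior mean; the two should agree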
Since the posterior distribution is extremely positively skewed, quantile measures are more appropriate than moment measures for summarizing the posterior information. So the appropriate estimate of N in this case is the posterior median, and the measure of accuracy of the estimate is the posterior quartile deviation.
Posterior Quantiles and Highest Posterior Density (HPD) Intervals:
We now turn to the problem of calculating posterior quantiles. Let $N_q$ be the q-quantile of the posterior, i.e. the smallest integer satisfying
$$P(N \le N_q \mid m) \ge q.$$
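A hedged R sketch of the quantile, quartile-deviation, and HPD computations (not the original Höhle–Held code):

cdf <- cumsum(post)                                   # posterior cdf P(N <= Nq | m)
q.post <- function(q) support[min(which(cdf >= q))]   # smallest integer with cdf >= q
post.median <- q.post(0.5)                            # Bayesian point estimate of N
qd <- (q.post(0.75) - q.post(0.25)) / 2               # posterior quartile deviation
# The posterior pmf is decreasing in N on its support (mode at N = m), so the
# 95% HPD interval runs from m up to the 0.95 posterior quantile.
hpd <- c(m, q.post(0.95))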
Results obtained:
We choose N to be 1000. We obtain the required simulated data and demonstrate these computations using R. The R code used for the simulations and computations has been adapted from the paper by Höhle and Held (2006).
We performed the necessary computations for sample sizes n = 5, 10, 20, 50, 75 and 100. The results are summarized in the following table:
Comparison of the two approaches:
Here we see that both the frequentist and Bayesian estimates (using the improper uniform prior) are close to the true value of the parameter on average. Also, the variability in the estimates decreases, as expected, in both approaches as the sample size increases.
However, we observe that the Bayesian approach always gives a lower estimate than the corresponding frequentist estimate. Further, the variability in the estimates, measured by the standard error in the frequentist approach and by the posterior quartile deviation in the Bayesian approach, is smaller for the Bayesian estimates. Thus the point estimates provided by the Bayesian approach are better, at least in this case.
When comparing the interval estimates provided by the two approaches, it should be kept in mind that the interpretations of the results differ between the two. In the frequentist approach, a 95% shortest-length confidence interval ensures that the interval, on average, covers the true value of the parameter in 95% of repetitions of the underlying random experiment.
In the Bayesian approach, the 95% HPD interval gives the range of the posterior distribution of the parameter covering 95% of the total probability, such that every point in the interval has a higher mass/density than any point outside the interval, thus giving a range of plausible values of the parameter.
The interval estimates of N can be compared between the two approaches in terms of the lengths of the intervals. It is observed that the frequentist approach provides shorter intervals in this case.
Comparison of errors:
The errors (= estimated value - true value) for the different sample sizes in the two approaches are compared through the following graph:
We observe that the errors are almost the same for the two approaches, since the points lie very close to the line y = x.
Comparison with other priors:
We perform similar computations using the proper uniform prior (choosing the upper limit k to be 2000, a reasonable choice) and the negative binomial prior. We refer to the paper by Höhle and Held (2006) for details.
The results obtained using these priors, compared against the estimates obtained using the improper uniform prior, are summarized in the following table:

It can be seen that these priors perform worse than the improper uniform prior in this case in terms of point estimation.
Other applications:
- The same formula was used to estimate the number of iPhones sold: it was estimated that Apple had sold around 9.1 million iPhones by the end of September 2008.
- It was also used to estimate the total number of taxicabs in New York City.
References:
- Höhle, M.; Held, Leonhard (2006). "Bayesian Estimation of the Size of a Population" (PDF). Technical Report, SFB 386, No. 399, Department of Statistics, University of Munich. Retrieved 2016-04-17.
- Goodman, L. A. (1954). "Some Practical Techniques in Serial Number Analysis". Journal of the American Statistical Association. American Statistical Association. 49 (265): 97–112. doi:10.2307/2281038. JSTOR 2281038.
- Johnson, R. W. (Summer 1994). "Estimating the Size of a Population". Teaching Statistics. 16 (2): 50–52. doi:10.1111/j.1467-9639.1994.tb00688.x.
- Ruggles, R.; Brodie, H. (1947). "An Empirical Approach to Economic Intelligence in World War II". Journal of the American Statistical Association. 42: 72–91.

