During the Second World War, the Allies critically needed estimates of the number of tanks Germany had deployed. They used two methods to obtain this information: espionage and statistical analysis. The statistical approach proved far more successful. Assuming the tanks are numbered sequentially starting from 1, if we observe $k$ serial numbers from an unknown total of $N$ tanks and the largest observed number is $M$, then the best linear unbiased estimator of $N$ is $M(1+1/k)-1$. This is now known as the German Tank Problem. Suppose one wishes to estimate a rival's productivity by inspecting captured or destroyed tanks, each bearing a unique serial number. In many situations the original German Tank Problem is insufficient. Typically there are $l>1$ factories, and tanks produced by different factories may carry serial numbers in disjoint, often widely separated ranges rather than a single sequence starting at 1. We wish to estimate total tank production across all factories. We construct an efficient procedure for estimating total productivity and prove that it estimates $N$ effectively when $\log l/\log k$ is sufficiently small, and that it is robust to both large and small gaps between factories. In the final section, we show that information about the gaps yields a far better estimator, one that remains effective even with a small number of samples. When the number of samples is small compared to the number of gaps, the mean squared error of this new estimator is several orders of magnitude smaller than that of the estimator that assumes no such information. This quantifies how important it is to hide gap information if one wishes to conceal one's productivity from a rival.
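To make the classical single-factory estimator concrete, here is a minimal Python sketch of $\hat N = M(1+1/k)-1$, a Monte Carlo check that it is unbiased when serials are drawn uniformly without replacement from $1,\dots,N$, and a hypothetical two-factory sample showing how a large gap between serial ranges breaks it. The function names and the factory ranges are illustrative assumptions; this is not the multi-factory procedure described above.

```python
import random


def german_tank_estimate(serials):
    """Single-factory estimate N_hat = M * (1 + 1/k) - 1.

    Assumes the serials are distinct and were drawn uniformly without
    replacement from 1..N (the classical German Tank setting).
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    if len(set(serials)) != k:
        raise ValueError("serial numbers must be distinct")
    m = max(serials)
    return m * (1 + 1 / k) - 1


def mean_estimate(n_true, k, trials=20_000, seed=0):
    """Average the estimator over many simulated samples of size k from 1..n_true.

    If the estimator is unbiased, the result should be close to n_true.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_true + 1), k)
        total += german_tank_estimate(sample)
    return total / trials


if __name__ == "__main__":
    # Single factory: k = 4, M = 60, so N_hat = 60 * 1.25 - 1 = 74.0.
    print(german_tank_estimate([19, 40, 42, 60]))

    # Monte Carlo sanity check of unbiasedness for N = 250, k = 5.
    print(mean_estimate(250, 5))

    # Hypothetical two-factory case: serials 1..100 and 10001..10100
    # (200 tanks in total). Treating them as one range starting at 1
    # wildly overestimates N: here M = 10090, k = 5, so N_hat is about 12107.
    print(german_tank_estimate([3, 57, 88, 10_012, 10_090]))
```

In the hypothetical two-factory setup, any sample containing a serial from the upper range pushes the estimate past 10,000 even though only 200 tanks exist, which is the failure mode that motivates treating the factories' serial ranges separately.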