During World War II the German army used tanks to devastating advantage. The Allies needed accurate estimates of German tank production and deployment, and they used two approaches to obtain them: spies and statistics. This note describes the statistical approach. Assuming the tanks are labeled consecutively starting at 1, if we observe $k$ distinct serial numbers from an unknown number $N$ of tanks, with maximum observed value $m$, then the best estimate for $N$ (the minimum-variance unbiased estimator) is $m(1 + 1/k) - 1$. This is now known as the German Tank Problem, and it is a terrific example of the applicability of mathematics and statistics in the real world. The first part of the paper reproduces known results, specifically deriving this estimate and comparing its effectiveness to that of the spies. The second part presents a result we have not found in print elsewhere: the generalization to the case where the smallest serial number is not necessarily 1. We explain in detail why we are able to obtain such clean, closed-form expressions for the estimates, and conclude with an appendix showing how to use this problem to teach regression and how statistics can help us find functional relationships.
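A minimal sketch of the estimator $m(1 + 1/k) - 1$ in Python, assuming the $k$ serial numbers are drawn uniformly at random without replacement from $1, \dots, N$. The function names `estimate_total` and `exact_mean_estimate` are hypothetical, introduced only for illustration. The second function averages the estimate over every possible sample for a small $N$ using exact rational arithmetic, which shows that the estimator is unbiased: since $E[m] = k(N+1)/(k+1)$, the average estimate is exactly $N$.

```python
from fractions import Fraction
from itertools import combinations


def estimate_total(serials):
    """Estimate N as m * (1 + 1/k) - 1 (hypothetical helper).

    Assumes the serials are distinct draws without replacement from 1..N.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one serial number")
    if len(set(serials)) != k:
        raise ValueError("serial numbers must be distinct")
    m = max(serials)
    return m * (1 + 1 / k) - 1


def exact_mean_estimate(true_n, k):
    """Average the estimate over all size-k samples from 1..true_n, exactly."""
    samples = list(combinations(range(1, true_n + 1), k))
    total = sum(Fraction(max(s)) * (1 + Fraction(1, k)) - 1 for s in samples)
    return total / len(samples)


if __name__ == "__main__":
    # Observed serials 19, 40, 42, 60: m = 60, k = 4, so 60 * 1.25 - 1 = 74.0
    print(estimate_total([19, 40, 42, 60]))
    # Unbiasedness check for N = 10, k = 3: prints 10
    print(exact_mean_estimate(10, 3))
```

Exact enumeration is only practical for small $N$ and $k$; for larger values one would instead average the estimate over many random samples (for example with `random.sample`).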