Statistical samples under (1) Unconfounded growth preserve estimators' ability to determine the independent effects of their individual variables on any measurement (and therefore lead to fair and interpretable black-box predictions). Samples under (2) Externally-Valid growth preserve their ability to make predictions that generalize across out-of-sample variation. The first promotes predictions that generalize over populations, the second over their shared exogenous factors. We illustrate these theoretical patterns in the full and spatially localized American census from 1840 to 1940, with samples ranging from the street level to the national level for 60 thousand different locations. The resulting Binomial-Exponential sample size requirements for generalizability over space reveal connections among the Shapley value, U-Statistics (Unbiased Statistics), and Hyperbolic Geometry in spatial systems.