Probing the Effect of Selection Bias on Generalization: A Thought Experiment

Learned systems for visual recognition and cognition impress in part because, although they are trained on datasets many orders of magnitude smaller than the full population of possible images, they generalize well enough to be applied to new, previously unseen data. Since training datasets typically represent a small sample of a domain, the possibility of bias in their composition is very real. But what are the limits of generalization given such bias, and up to what point might it be sufficient for a real task? Although many have examined generalization, this question may require examining the data itself. Here, we focus on the characteristics of the training data that may play a role. Other disciplines have grappled with these problems, most interestingly epidemiology, where experimental bias is a critical concern. The range and nature of data biases seen clinically are quite relevant to learned vision systems. One obvious way to deal with bias is to ensure a large enough training set, but this may be infeasible in many domains. Another approach is to perform a statistical analysis of the actual training set to determine whether all aspects of the domain are fairly captured. This too is difficult, in part because the full set of variables might not be known, or perhaps not even knowable. Here, we try a different approach in the tradition of the thought experiment, whose most famous instance may be Schrödinger's Cat. There are many types of bias, as will be seen, but we focus on only one: selection bias. The point of the thought experiment is not to demonstrate problems with all learned systems. Rather, it may serve as a simple theoretical tool for probing bias in data collection, highlighting deficiencies that might then deserve extra attention in either data collection or system development.
