A common challenge across all areas of machine learning is that training data is not distributed like test data, due to natural shifts, "blind spots," or adversarial examples; such test examples are referred to as out-of-distribution (OOD) test examples. We consider a model where one may abstain from predicting, at a fixed cost. In particular, our transductive abstention algorithm takes labeled training examples and unlabeled test examples as input, and provides predictions with optimal prediction loss guarantees. The loss bounds match standard generalization bounds when test examples are i.i.d. from the training distribution, but add an additional term that is the cost of abstaining times the statistical distance between the training and test distributions (or the fraction of adversarial examples). For linear regression, we give a polynomial-time algorithm based on Celis-Dennis-Tapia optimization algorithms. For binary classification, we show how to efficiently implement it using a proper agnostic learner (i.e., an empirical risk minimizer) for the class of interest. Our work builds on a recent abstention algorithm of Goldwasser, Kalais, and Montasser (2020) for transductive binary classification.
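As a schematic illustration of the stated guarantee (notation ours, not from the abstract: abstention cost $\gamma$, training distribution $P$, test distribution $Q$, and $\epsilon$ the usual i.i.d. generalization error for the hypothesis class), the loss bound takes the form

\[
  \mathbb{E}\bigl[\mathrm{loss}(\hat{h})\bigr] \;\le\; \epsilon \;+\; \gamma \cdot \lVert P - Q \rVert_{\mathrm{TV}},
\]

where $\lVert P - Q \rVert_{\mathrm{TV}}$ denotes the statistical (total variation) distance between the training and test distributions; in the adversarial setting, this second term is instead $\gamma$ times the fraction of adversarial test examples.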