Machine learning algorithms typically assume that training and test samples come from the same distribution, i.e., that they are in-distribution. In open-world scenarios, however, streaming big data can be Out-Of-Distribution (OOD), rendering these algorithms ineffective. Prior solutions to the OOD challenge seek to identify features that are invariant across the training domains, under the assumption that such invariant features should also perform reasonably well in the unlabeled target domain. By contrast, this work is interested in domain-specific features, which include both invariant features and features unique to the target domain. We propose a simple yet effective approach that relies on correlations in general, regardless of whether the features are invariant. Our approach uses the most confidently predicted samples identified by an OOD base model (teacher model) to train a new model (student model) that effectively adapts to the target domain. Empirical evaluations on benchmark datasets show that performance improves over the state of the art (SOTA) by approximately 10-20%.
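The abstract only outlines the teacher-student procedure, but the confidence-based self-training loop it describes can be sketched in PyTorch as below. This is a minimal sketch under assumed details: the function names (`select_confident_samples`, `train_student`), the confidence threshold, and all hyperparameters are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def select_confident_samples(teacher, unlabeled_loader, threshold=0.9, device="cpu"):
    """Run the frozen teacher over unlabeled target-domain batches and keep
    the samples whose top softmax probability exceeds the threshold, paired
    with the teacher's predicted classes as pseudo-labels."""
    teacher.eval()
    kept_inputs, pseudo_labels = [], []
    with torch.no_grad():
        for x in unlabeled_loader:  # assumed to yield input tensors only
            x = x.to(device)
            probs = F.softmax(teacher(x), dim=1)
            conf, preds = probs.max(dim=1)
            mask = conf >= threshold  # confidence-based selection
            kept_inputs.append(x[mask].cpu())
            pseudo_labels.append(preds[mask].cpu())
    return torch.cat(kept_inputs), torch.cat(pseudo_labels)

def train_student(student, inputs, pseudo_labels, epochs=10, lr=1e-3, batch_size=64):
    """Fit the student on the teacher-selected pseudo-labeled samples with a
    standard cross-entropy objective."""
    loader = DataLoader(TensorDataset(inputs, pseudo_labels),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(student(x), y)
            loss.backward()
            opt.step()
    return student
```

One design trade-off worth noting: raising the confidence threshold yields cleaner pseudo-labels but fewer target-domain samples for the student, so in practice the threshold would have to balance pseudo-label accuracy against coverage of the target distribution.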