Covariate shift is a common problem when predictive models are applied to real-world data. This paper proposes addressing it by minimizing the Maximum Mean Discrepancy (MMD) statistic between the training and test sets in the feature input space, the feature representation space, or both. We design three techniques, which we call MMD Representation, MMD Mask, and MMD Hybrid, for the scenarios in which only a distribution shift exists, only a missingness shift exists, or both types of shift exist, respectively. We find that integrating an MMD loss component helps models use the features that generalize best and avoid dangerous extrapolation as much as possible for each test sample. Models treated with this MMD approach show better performance, calibration, and extrapolation on the test set.
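For readers unfamiliar with the statistic, the following is a minimal sketch of how a squared MMD between a training batch and a test batch might be computed. It is illustrative only: the Gaussian (RBF) kernel, the biased estimator, the fixed `bandwidth`, and the function names `rbf_kernel` and `mmd2_biased` are assumptions made here, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y.

    Assumption: RBF kernel with one fixed bandwidth; the paper may use another choice.
    """
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 5))                  # hypothetical training features
    test_same = rng.normal(size=(200, 5))              # same distribution as train
    test_shifted = rng.normal(loc=2.0, size=(200, 5))  # mean-shifted covariates
    print("no shift:  ", mmd2_biased(train, test_same))
    print("with shift:", mmd2_biased(train, test_shifted))
```

The shifted test set should yield a clearly larger value than the unshifted one. In a training setting, a quantity of this kind would presumably be added to the task loss, computed on raw inputs, on learned representations, or on missingness masks, depending on which type of shift is targeted.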