Dataset shift refers to the situation where the data used to train a machine learning model differs from the data the model encounters in operation. Although several types of shift occur naturally, existing shift detectors are usually designed to address only one specific type. We propose a simple yet powerful technique for ensembling complementary shift detectors while tuning the significance level of each detector's statistical test to the dataset. This enables more robust shift detection, capable of addressing the different types of shift, which is essential in real-life settings where the precise shift type is often unknown. We validate the approach with a large-scale, statistically sound benchmark study of various synthetic shifts applied to real-world structured datasets.
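To make the idea of ensembling detectors with dataset-specific significance levels concrete, the following is a minimal sketch, not the method evaluated in the paper. Everything in it is an assumption made for illustration: the two toy detectors (`mean_stat`, `spread_stat`), the helper `calibrate_threshold`, the class `EnsembleDetector`, the equal (Bonferroni-style) split of the overall significance level, and the use of bootstrap resampling of the reference data to turn each per-detector level into a decision threshold.

```python
import random
import statistics


def mean_stat(ref, new):
    """Toy detector sensitive to a change in location (hypothetical)."""
    return abs(statistics.fmean(new) - statistics.fmean(ref))


def spread_stat(ref, new):
    """Toy detector sensitive to a change in scale (hypothetical)."""
    return abs(statistics.pstdev(new) - statistics.pstdev(ref))


def calibrate_threshold(stat, ref, n_new, alpha, n_rounds=500, rng=None):
    """Estimate the (1 - alpha) quantile of `stat` when there is no shift.

    No-shift batches are simulated by resampling the reference data
    with replacement, so the threshold depends on the dataset itself.
    """
    rng = rng or random.Random(0)
    null = sorted(stat(ref, rng.choices(ref, k=n_new)) for _ in range(n_rounds))
    idx = min(n_rounds - 1, int((1 - alpha) * n_rounds))
    return null[idx]


class EnsembleDetector:
    """Flags a shift if any member detector exceeds its own threshold."""

    def __init__(self, stats, ref, n_new, alpha=0.05, seed=0):
        rng = random.Random(seed)
        # Assumption: split the overall level equally across detectors.
        per_detector_alpha = alpha / len(stats)
        self.ref = ref
        self.stats = stats
        self.thresholds = [
            calibrate_threshold(s, ref, n_new, per_detector_alpha, rng=rng)
            for s in stats
        ]

    def detect(self, new):
        return any(
            s(self.ref, new) > t for s, t in zip(self.stats, self.thresholds)
        )


if __name__ == "__main__":
    rng = random.Random(1)
    reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
    detector = EnsembleDetector([mean_stat, spread_stat], reference, n_new=200)

    shifted_mean = [rng.gauss(1.0, 1.0) for _ in range(200)]
    shifted_scale = [rng.gauss(0.0, 3.0) for _ in range(200)]
    print("mean shift flagged:", detector.detect(shifted_mean))
    print("scale shift flagged:", detector.detect(shifted_scale))
```

The sketch only illustrates why complementary detectors help: the location detector is largely blind to a pure change in scale, and the scale detector to a pure change in location, while their union reacts to both. A real system would replace the toy statistics with established two-sample tests and would choose the per-detector levels by a more careful tuning procedure than an equal split.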