We propose a method based on optimal transport for identifying and characterizing distribution shifts in classification datasets. It lets the user determine the extent to which each class is affected by the shift, and it retrieves corresponding pairs of samples that give insight into the nature of the shift. We illustrate its use on examples of synthetic and natural shifts. Although the results we present are preliminary, we hope they will inspire future work on interpretable methods for analyzing distribution shifts.
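
To make the idea concrete, here is a minimal sketch of how a class-wise, transport-based shift report could be computed. It is an illustration under stated assumptions, not necessarily the formulation used in this work: both datasets are assumed to be labeled, each class is assumed to have the same number of samples in the source and target sets (so that optimal transport between uniform empirical measures reduces to an assignment problem), and the ground cost is assumed to be the squared Euclidean distance. The function name `classwise_shift` and the report fields `mean_cost` and `pairs` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def classwise_shift(X_src, y_src, X_tgt, y_tgt):
    """Per-class transport cost and matched sample pairs (illustrative sketch).

    Assumes equal per-class sample counts in source and target, so the
    optimal transport plan between uniform empirical measures is a
    one-to-one matching.
    """
    report = {}
    for c in np.intersect1d(y_src, y_tgt):
        src_idx = np.flatnonzero(y_src == c)
        tgt_idx = np.flatnonzero(y_tgt == c)
        # Assumed ground cost: squared Euclidean distance between samples.
        cost = cdist(X_src[src_idx], X_tgt[tgt_idx], metric="sqeuclidean")
        # Minimum-cost one-to-one matching between the two sample sets.
        rows, cols = linear_sum_assignment(cost)
        report[c.item()] = {
            # Average cost of moving a source sample to its matched target.
            "mean_cost": float(cost[rows, cols].mean()),
            # (source index, target index) pairs in the original arrays.
            "pairs": list(zip(src_idx[rows].tolist(), tgt_idx[cols].tolist())),
        }
    return report


# Synthetic example: only class 1 is translated in the target dataset.
rng = np.random.default_rng(0)
n = 50
X_src = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(5, 1, (n, 2))])
y = np.repeat([0, 1], n)
X_tgt = X_src + rng.normal(0, 0.1, X_src.shape)  # small noise on every sample
X_tgt[y == 1] += np.array([3.0, 0.0])            # shift applied to class 1 only

report = classwise_shift(X_src, y, X_tgt, y)
for label, entry in report.items():
    print(label, entry["mean_cost"], entry["pairs"][:3])
```

In this sketch the per-class mean cost plays the role of "how much each class is affected", and the matched index pairs are the corresponding samples one would inspect side by side to understand the shift. For unequal class sizes or very large datasets, a general optimal transport solver would be needed in place of the assignment step.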