Factorizable joint shift (FJS) was recently proposed as a type of dataset shift whose characteristics can be fully estimated from feature observations on the test dataset by a method called joint importance aligning. For the multinomial (multiclass) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning and, at the same time, point out that factorizable joint shift is not fully identifiable if no class label information on the test dataset is available and no additional assumptions are made. Other results of the paper include correction formulae for the posterior class probabilities under both general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.
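To make the idea of a posterior correction concrete, the following is a minimal sketch and not the paper's own formulae, which are not reproduced in this abstract. It assumes that under factorizable joint shift the density ratio between target and source factorizes as g(x) h(y); the feature factor g(x) then cancels when conditioning on x, so the target posterior is proportional to h(y) times the source posterior. The function name `correct_posteriors`, the class weights `h`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def correct_posteriors(source_posteriors, class_weights):
    """Reweight source posteriors by class-wise factors h(y) and renormalize.

    Assumption (illustrative): target/source density ratio = g(x) * h(y),
    so target P(y | x) is proportional to h(y) * source P(y | x).
    """
    p = np.asarray(source_posteriors, dtype=float)  # shape (n_samples, n_classes)
    h = np.asarray(class_weights, dtype=float)      # shape (n_classes,)
    unnormalized = p * h                            # broadcast h over the rows
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


# Hypothetical source posteriors for two test points and three classes.
source_posteriors = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.6, 0.3]])

# Special case: pure prior probability shift, where h(y) is the ratio of
# target to source class priors (hypothetical values).
source_priors = np.array([0.5, 0.3, 0.2])
target_priors = np.array([0.2, 0.3, 0.5])
prior_shift_weights = target_priors / source_priors

target_posteriors = correct_posteriors(source_posteriors, prior_shift_weights)
print(target_posteriors)
```

With all class weights equal to one the function returns the source posteriors unchanged, which corresponds to the covariate shift special case. In practice the weights h(y) would have to be estimated, and as the abstract notes they are not fully identifiable from unlabeled test data without additional assumptions.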