Machine learning systems may encounter unexpected problems when the data distribution changes in the deployment environment. A major reason is that certain combinations of domains and labels are not observed during training but appear in the test environment. Although various invariance-based algorithms can be applied, we find that the performance gain is often marginal. To formally analyze this issue, we provide a unique algebraic formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement. These algebraic requirements naturally lead to a simple yet effective method, referred to as equivariant disentangled transformation (EDT), which augments the data based on the algebraic structure of the labels and constrains the learned transformation to satisfy the equivariance and disentanglement requirements. Experimental results demonstrate that invariance alone may be insufficient, and that it is important to exploit the equivariance structure in the combination shift problem.
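To make the equivariance and disentanglement requirements concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the idea in a toy setting: a cyclic group acts on class labels, the latent code is split into a domain factor and a class factor, and unseen domain-label combinations are synthesized by applying label-group elements to observed examples. All names here (`toy_encoder`, `act_on_input`, `act_on_latent`) are hypothetical and chosen only for illustration.

```python
# Toy sketch of equivariance, disentanglement, and label-structure-based
# augmentation under combination shift. Not the paper's EDT implementation.
import numpy as np

K_CLASSES, N_DOMAINS = 4, 3          # hypothetical label structure: Z_4 classes, 3 domains

def act_on_input(g, x):
    """Group element g in Z_K acts on the class one-hot block of x;
    the domain block is left untouched (the action concerns labels only)."""
    dom, cls = x[:N_DOMAINS], x[N_DOMAINS:]
    return np.concatenate([dom, np.roll(cls, g)])

def toy_encoder(x):
    """A hand-made 'encoder' that is equivariant and disentangled by
    construction: the domain and class blocks map to separate latent factors."""
    return x[:N_DOMAINS].copy(), x[N_DOMAINS:].copy()   # (z_domain, z_class)

def act_on_latent(g, z):
    """The same group acts on the latent class factor only (disentanglement)."""
    z_dom, z_cls = z
    return z_dom, np.roll(z_cls, g)

# Equivariance check: encoding a transformed input equals transforming the code,
# i.e. toy_encoder(g . x) == g . toy_encoder(x) for every group element g.
x = np.concatenate([np.eye(N_DOMAINS)[1], np.eye(K_CLASSES)[2]])
for g in range(K_CLASSES):
    lhs = toy_encoder(act_on_input(g, x))
    rhs = act_on_latent(g, toy_encoder(x))
    assert all(np.allclose(a, b) for a, b in zip(lhs, rhs))

# Augmentation idea: synthesize unseen (domain, class) combinations by applying
# label-group elements to an observed example while keeping its domain fixed.
augmented = [act_on_input(g, x) for g in range(K_CLASSES)]
print(len(augmented), "synthetic combinations generated from one observed example")
```

In this sketch the encoder is disentangled and equivariant by construction; in practice such properties would have to be learned or enforced, which is where the algebraic formulation above becomes relevant.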