Severe class imbalance poses a major challenge for real-world applications, especially when the accurate classification and generalization of minority classes are of primary interest. In computer vision, learning from long-tailed datasets is a recurring theme, particularly for natural image datasets. Existing solutions mostly rely on sampling or weighting adjustments to alleviate the pathological imbalance, or impose inductive biases that prioritize non-spurious associations. In contrast, we take a novel perspective that promotes sample efficiency and model generalization based on the invariance principles of causality. Our proposal posits a meta-distributional scenario in which the data-generating mechanism is invariant across the label-conditional feature distributions. Such a causal assumption enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if the respective feature distributions show apparent disparities. This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes. Our development is orthogonal to existing extreme classification techniques and can thus be seamlessly integrated with them. The utility of our proposal is validated on an extensive set of synthetic and real-world computer vision tasks against state-of-the-art (SOTA) solutions.
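For context, the sampling and weighting adjustments that the abstract attributes to existing solutions can be sketched as below. This is a minimal illustration of those generic baselines only, not of the proposed causal data inflation procedure, whose details the abstract does not give. The function names, the toy label counts, and the particular weighting formula (total count divided by the number of classes times the class count) are assumptions chosen for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a per-class loss weight inversely proportional to class frequency.

    Hypothetical helper: weight_c = N / (K * n_c), where N is the number of
    samples, K the number of observed classes, and n_c the count of class c.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


def oversample_to_balance(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest one.

    Hypothetical helper: minority classes are topped up by drawing their own
    indices with replacement, so no new information is created.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for i, y in enumerate(labels):
        indices_by_class.setdefault(y, []).append(i)
    target = max(len(ix) for ix in indices_by_class.values())
    balanced = []
    for ix in indices_by_class.values():
        balanced.extend(ix)                                   # keep every original sample
        balanced.extend(rng.choices(ix, k=target - len(ix)))  # duplicate minority samples
    return balanced


# Toy long-tailed label set (assumed counts): 900 / 90 / 10 samples per class.
labels = [0] * 900 + [1] * 90 + [2] * 10
weights = inverse_frequency_weights(labels)
balanced = oversample_to_balance(labels)
```

Both adjustments only reweight or repeat the minority samples that already exist, which may help explain why the abstract looks instead to transferring knowledge from the dominant classes.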