Long-tailed class imbalance is an inescapable issue in many real-world classification problems. Existing long-tailed classification methods focus on the single-domain setting, where all examples are drawn from the same distribution. However, real-world scenarios often involve multiple domains with distinct imbalanced class distributions. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. Towards that goal, we introduce TALLY, which produces invariant predictors via balanced augmentation of hidden representations across domains and classes. Built upon a proposed selective balanced sampling strategy, TALLY achieves this by mixing the semantic representation of one example with the domain-associated nuisances of another, producing a new representation for use as data augmentation. To improve the disentanglement of semantic representations, TALLY further utilizes a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on four long-tailed variants of classical domain generalization benchmarks and two real-world imbalanced multi-domain datasets. The results indicate that TALLY consistently outperforms other state-of-the-art methods under both subpopulation shift and domain shift.
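To make the augmentation concrete, below is a minimal sketch of the idea described above: hidden representations are assumed to decompose additively into a domain-invariant class prototype (the semantics) plus a residual (the domain-associated nuisance), and a new representation is formed by pairing the semantics of one example with the nuisance of another. The function name `tally_style_augment` and the additive decomposition are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def tally_style_augment(z_i: torch.Tensor, z_j: torch.Tensor,
                        prototypes: torch.Tensor,
                        y_i: int, y_j: int) -> torch.Tensor:
    """Recombine the semantics of example i with the nuisances of example j.

    z_i, z_j   : hidden representations of two sampled examples, shape [d]
    prototypes : domain-invariant class prototypes (per-class means averaged
                 over domains), shape [num_classes, d]
    y_i, y_j   : integer class labels of the two examples
    Returns a new representation carrying label y_i, usable as augmentation.
    """
    semantic = prototypes[y_i]         # domain-invariant semantic content of class y_i
    nuisance = z_j - prototypes[y_j]   # residual treated as domain-associated nuisance
    return semantic + nuisance         # augmented representation, labeled y_i

# Toy usage: 10 classes, 128-dim features drawn at random
d, num_classes = 128, 10
prototypes = torch.randn(num_classes, d)
z_i, z_j = torch.randn(d), torch.randn(d)
z_aug = tally_style_augment(z_i, z_j, prototypes, y_i=3, y_j=7)
assert z_aug.shape == (d,)
```

In practice the two examples would be drawn by the selective balanced sampling strategy mentioned above, so that rare classes and rare domains both appear in the recombined pairs.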