Real-world data often exhibit imbalanced label distributions. Existing studies on data imbalance focus on single-domain settings, i.e., samples are from the same data distribution. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances from other domains. We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. We first develop the domain-class transferability graph, and show that such transferability governs the success of learning in MDLT. We then propose BoDA, a theoretically grounded learning strategy that tracks the upper bound of transferability statistics, and ensures balanced alignment and calibration across imbalanced domain-class distributions. We curate five MDLT benchmarks based on widely-used multi-domain datasets, and compare BoDA to twenty algorithms that span different learning strategies. Extensive and rigorous experiments verify the superior performance of BoDA. Further, as a byproduct, BoDA establishes new state-of-the-art on Domain Generalization benchmarks, highlighting the importance of addressing data imbalance across domains, which can be crucial for improving generalization to unseen domains. Code and data are available at: https://github.com/YyzHarry/multi-domain-imbalance.