Significant effort has recently been devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed problems, due to the tendency of complex models to overfit. In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails. Under this hypothesis, better performance is achieved by keeping the model simple and instead improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, inspired by the state-of-the-art literature on long-tailed recognition. This is based on an iterative decoupled training scheme, denoted Decoupled Training for Devil in the Tails (DT2). DT2 employs a novel sampling approach, Alternating Class-Balanced Sampling (ACBS), to capture the interplay between the long-tailed entity and predicate distributions of visual relations. Results show that, with an extremely simple architecture, DT2-ACBS significantly outperforms much more complex state-of-the-art methods on scene graph generation tasks. This suggests that the development of sophisticated models must be considered in tandem with the long-tailed nature of the problem.
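The class-balanced sampling that ACBS builds on can be illustrated with a minimal sketch: pick a class uniformly at random, then pick an instance uniformly within that class, so tail classes are seen as often as head classes. This is a generic illustration under our own assumptions, not the paper's implementation; the helper name `class_balanced_sample` is hypothetical.

```python
import random
from collections import Counter

def class_balanced_sample(labels, k, rng=None):
    """Draw k indices from `labels` (one label per instance).

    Each draw first selects a class uniformly, then an instance
    uniformly within that class, so the expected number of draws
    per class is the same regardless of class frequency.
    """
    rng = rng or random.Random()
    # Group instance indices by their class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    classes = list(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(k)]

# Example: a long-tailed label set (90 head, 9 mid, 1 tail instance).
labels = ["head"] * 90 + ["mid"] * 9 + ["tail"] * 1
idx = class_balanced_sample(labels, 3000, random.Random(0))
print(Counter(labels[i] for i in idx))  # roughly 1000 draws per class
```

In a scheme like DT2, such a sampler would alternate between balancing the entity distribution and the predicate distribution across training stages, rather than balancing a single label set as in this sketch.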