Reference Expression Segmentation (RES) and Reference Expression Generation (REG) are mutually inverse tasks that can naturally be trained jointly. Although recent work has explored such joint training, the mechanism by which RES and REG benefit each other remains unclear. In this paper, we propose a unified mutual supervision framework that enables the two tasks to improve each other. Our mutual supervision works in two directions. On the one hand, Disambiguation Supervision uses the measure of expression unambiguity provided by RES to improve the language generation of REG. On the other hand, Generation Supervision uses expressions automatically generated by REG to scale up the training of RES. This unified mutual supervision improves both tasks by addressing their respective bottlenecks. Extensive experiments show that our approach significantly outperforms all existing methods on the REG and RES tasks under the same setting, and detailed ablation studies demonstrate the effectiveness of every component of our framework.
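The two supervision directions can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration: the scene data, the function names, the word-overlap scorer standing in for RES, and the attribute enumerator standing in for REG are all hypothetical, and the abstract does not specify how either signal enters training. The sketch assumes that Disambiguation Supervision acts as a reward-like ranking of candidate expressions and that Generation Supervision keeps generated expressions above an unambiguity threshold as pseudo-labelled RES training pairs.

```python
import math

# Hypothetical toy scene: each object is a set of attribute words.
SCENE = {
    "obj0": {"red", "cup", "left"},
    "obj1": {"red", "cup", "right"},
    "obj2": {"blue", "plate", "right"},
}


def res_scores(expression, scene):
    """Toy stand-in for RES: softmax over word overlap with each object."""
    words = set(expression.split())
    logits = {name: len(words & attrs) for name, attrs in scene.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {name: math.exp(v) / z for name, v in logits.items()}


def unambiguity(expression, target, scene):
    """Probability the toy RES assigns to the intended target object."""
    return res_scores(expression, scene)[target]


def reg_candidates(target, scene):
    """Toy stand-in for REG: enumerate expressions from the target's attributes."""
    attrs = sorted(scene[target])
    pairs = [f"{a} {b}" for i, a in enumerate(attrs) for b in attrs[i + 1:]]
    return attrs + pairs + [" ".join(attrs)]


def disambiguation_supervision(target, scene):
    """RES -> REG: rank candidate expressions by RES-measured unambiguity."""
    scored = [(unambiguity(e, target, scene), e) for e in reg_candidates(target, scene)]
    return sorted(scored, reverse=True)


def generation_supervision(scene, threshold=0.5):
    """REG -> RES: keep the best generated expression per object as a pseudo label."""
    pseudo_pairs = []
    for target in scene:
        score, expression = disambiguation_supervision(target, scene)[0]
        if score >= threshold:  # assumed filter; drops ambiguous expressions
            pseudo_pairs.append((expression, target))
    return pseudo_pairs


if __name__ == "__main__":
    for score, expression in disambiguation_supervision("obj0", SCENE):
        print(f"{score:.3f}  {expression}")
    print(generation_supervision(SCENE))
```

In this toy scene, "cup" scores low for obj0 because it matches obj1 equally well, while "left" scores higher because only obj0 has that attribute. This is the kind of signal a segmentation model could pass back to a generator. In an actual system both stand-ins would be learned models, and the pseudo-labelled pairs would extend the RES training set rather than being printed.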