The per-pixel cross-entropy loss (CEL) has been widely used in structured output prediction tasks as a spatial extension of generic image classification. However, its i.i.d. assumption neglects the structural regularity present in natural images. Various attempts have been made to incorporate structural reasoning, mostly through structure priors applied cooperatively, where co-occurring patterns are encouraged. We instead approach the problem from the opposite angle and propose a new framework for training such structured prediction networks via an adversarial process: we train a structure analyzer that provides the supervisory signal, the adversarial structure matching loss (ASML). The structure analyzer is trained to maximize ASML, that is, to exaggerate recurring structural mistakes, which usually occur among co-occurring patterns. Conversely, the structured output prediction network is trained to reduce those mistakes and thus learns to distinguish fine-grained structures. As a result, training structured output prediction networks with ASML reduces contextual confusion among objects and improves boundary localization. We demonstrate that ASML outperforms its counterpart CEL, especially with respect to context and boundaries, on figure-ground segmentation and semantic segmentation tasks with various base architectures, such as FCN, U-Net, DeepLab, and PSPNet.
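The contrast between the two kinds of loss can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. `per_pixel_cross_entropy` averages independent per-pixel terms, which is exactly the i.i.d. treatment described above. The hypothetical `structure_matching_loss` instead assumes a feature-matching form: it sums distances between the activations of an analyzer applied to the whole prediction map and to the one-hot ground truth. The analyzer here (`analyzer_features`, a small stack of random linear layers with ReLU) is an assumed stand-in for the learned structure analyzer. In the adversarial setting, its weights would be updated to increase this loss while the prediction network is updated to decrease it.

```python
import numpy as np


def per_pixel_cross_entropy(probs, labels, eps=1e-12):
    """Mean per-pixel cross-entropy.

    probs:  (H, W, C) softmax outputs.
    labels: (H, W) integer class ids.
    Every pixel contributes an independent term (the i.i.d. assumption).
    """
    h, w, _ = probs.shape
    rows, cols = np.indices((h, w))
    picked = probs[rows, cols, labels]  # probability assigned to the true class
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))


def one_hot(labels, num_classes):
    """Convert (H, W) class ids to an (H, W, C) one-hot map."""
    return np.eye(num_classes)[labels]


def analyzer_features(maps, weights):
    """Hypothetical structure analyzer: linear layers + ReLU applied to the
    whole flattened map, returning every intermediate activation."""
    feats = []
    x = maps.reshape(-1)
    for w in weights:
        x = np.maximum(w @ x, 0.0)
        feats.append(x)
    return feats


def structure_matching_loss(probs, labels, weights):
    """Assumed feature-matching loss: sum over analyzer layers of the L2
    distance between features of the prediction and of the ground truth.
    The analyzer would be trained to maximize this value and the prediction
    network to minimize it (the adversarial min-max)."""
    gt = one_hot(labels, probs.shape[-1])
    fp = analyzer_features(probs, weights)
    fg = analyzer_features(gt, weights)
    return float(sum(np.linalg.norm(a - b) for a, b in zip(fp, fg)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=(4, 4))
    # Analyzer weights for a 4x4x3 map (48 inputs); shapes are arbitrary.
    weights = [rng.standard_normal((32, 48)), rng.standard_normal((16, 32))]
    uniform = np.full((4, 4, 3), 1.0 / 3.0)
    perfect = one_hot(labels, 3)
    for name, p in [("uniform", uniform), ("perfect", perfect)]:
        print(name,
              per_pixel_cross_entropy(p, labels),
              structure_matching_loss(p, labels, weights))
```

Because the analyzer sees the entire map at once, its features (and hence the matching loss) can respond to joint configurations of labels, such as which classes appear next to each other. A per-pixel loss cannot express that dependency by construction.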