Deep segmentation neural networks require large training datasets with pixel-wise annotations, which are expensive to obtain in practice. Mixed supervision can mitigate this difficulty: a small fraction of the data carries complete pixel-wise annotations, while the rest is less supervised, e.g., only a handful of pixels are labeled. In this work, we propose a dual-branch architecture in which the upper branch (teacher) receives strong annotations, while the bottom branch (student) is driven by limited supervision and guided by the upper branch. In conjunction with a standard cross-entropy over the labeled pixels, our novel formulation integrates two important terms: (i) a Shannon entropy loss defined over the less-supervised images, which encourages confident student predictions at the bottom branch; and (ii) a Kullback-Leibler (KL) divergence, which transfers knowledge from the predictions of the strongly supervised branch to the less-supervised branch and guides the entropy (student-confidence) term to avoid trivial solutions. We show that the synergy between the entropy and KL divergence terms yields substantial improvements in performance. Furthermore, we discuss a link between Shannon-entropy minimization and standard pseudo-mask generation, and argue that the former should be preferred over the latter for leveraging information from unlabeled pixels. Through a series of quantitative and qualitative experiments, we show the effectiveness of the proposed formulation in segmenting the left-ventricle endocardium in MRI. We demonstrate that our method significantly outperforms other strategies for semantic segmentation within a mixed-supervision framework. More interestingly, and in line with recent observations in classification, we show that the branch trained with reduced supervision largely outperforms the teacher.
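As an illustration of how the three terms described above could be combined for the student branch, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the weights `lambda_ent` and `lambda_kl`, the per-pixel averaging, and the direction of the KL term (student relative to teacher) are all assumptions made for this example; the abstract does not specify them.

```python
import math

EPS = 1e-12  # numerical floor to avoid log(0)


def softmax(logits):
    """Turn one pixel's class logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy of one pixel against its integer class label."""
    return -math.log(probs[label] + EPS)


def shannon_entropy(probs):
    """Shannon entropy of one pixel's prediction (low = confident)."""
    return -sum(p * math.log(p + EPS) for p in probs)


def kl_divergence(p, q):
    """KL(p || q) between two per-pixel class distributions."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def student_loss(student_logits, teacher_logits, labels,
                 lambda_ent=1.0, lambda_kl=1.0):
    """Hypothetical student-branch loss for one weakly labeled image.

    student_logits, teacher_logits: one list of class logits per pixel.
    labels: one entry per pixel, an int class or None if unlabeled.
    lambda_ent, lambda_kl: assumed weights, not taken from the paper.
    """
    n_pixels = len(student_logits)
    ce_sum, ent_sum, kl_sum, n_labeled = 0.0, 0.0, 0.0, 0
    for s_logit, t_logit, y in zip(student_logits, teacher_logits, labels):
        s, t = softmax(s_logit), softmax(t_logit)
        if y is not None:  # cross-entropy only over the labeled pixels
            ce_sum += cross_entropy(s, y)
            n_labeled += 1
        ent_sum += shannon_entropy(s)   # encourages confident predictions
        kl_sum += kl_divergence(s, t)   # assumed direction: KL(student || teacher)
    ce = ce_sum / max(n_labeled, 1)
    return ce + lambda_ent * ent_sum / n_pixels + lambda_kl * kl_sum / n_pixels
```

In this sketch, the entropy term alone would be minimized by any confident prediction, including a wrong or degenerate one; the KL term ties the student to the teacher's predictions, which is one way to read the abstract's remark that it guides the entropy term away from trivial solutions.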