Contemporary segmentation methods are usually based on deep fully convolutional networks (FCNs). However, layer-by-layer convolutions with gradually growing receptive fields are not good at capturing long-range context such as lane markers in a scene. In this paper, we address this issue by designing a distillation method that exploits label structure when training the segmentation network. The intuition is that the ground-truth lane annotations themselves exhibit internal structure. We broadcast these structure hints throughout a teacher network, i.e., we train a teacher network that consumes a lane label map as input and attempts to replicate it as output. The attention maps of the teacher network are then adopted to supervise the student segmentation network. The teacher network, with label structure information embedded, knows distinctly where the convolution layers should pay visual attention. The proposed method is named Label-guided Attention Distillation (LGAD). It turns out that the student network learns significantly better with LGAD than when learning alone. As the teacher network is discarded after training, our method does not increase inference time. Note that LGAD can be easily incorporated into any lane segmentation network.
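To make the distillation step concrete, below is a minimal PyTorch sketch, assuming activation-based attention maps (channel-wise sum of squared activations, as in standard attention transfer). The function names, the layer pairing, and the use of an MSE matching loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse channels into a spatial attention map via the
    # channel-wise sum of squared activations, then L2-normalize
    # so that maps from different layers are comparable in scale.
    amap = feat.pow(2).sum(dim=1, keepdim=True)   # (B, 1, H, W)
    return F.normalize(amap.flatten(1), dim=1)    # (B, H*W)

def lgad_loss(student_feats, teacher_feats):
    # Hypothetical distillation term: match the student's attention
    # maps to those of the (frozen) label-fed teacher, layer by layer.
    loss = 0.0
    for s, t in zip(student_feats, teacher_feats):
        if s.shape[-2:] != t.shape[-2:]:
            # Resize teacher features if spatial sizes differ.
            t = F.interpolate(t, size=s.shape[-2:],
                              mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(attention_map(s), attention_map(t))
    return loss
```

In this reading, the teacher (a label autoencoder) is trained first and then frozen; during student training its per-layer features supply `teacher_feats`, and `lgad_loss` is added to the ordinary segmentation loss. At inference time only the student runs, which is why the distillation term adds no cost.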