Identifying unexpected objects on roads in semantic segmentation (e.g., a dog on the road) is crucial in safety-critical applications. Existing approaches either use images of unexpected objects from external datasets or require additional training (e.g., retraining the segmentation network or training an extra network), which demands considerable labor or lengthens inference time. One possible alternative is to use the prediction scores of a pre-trained network, such as the max logits (i.e., the maximum values among classes before the final softmax layer), to detect such objects. However, the distributions of max logits differ significantly across predicted classes, which degrades the performance of identifying unexpected objects in urban-scene segmentation. To address this issue, we propose a simple yet effective approach that standardizes the max logits in order to align the different distributions and reflect the relative meaning of a max logit within its predicted class. Moreover, we consider local regions from two different perspectives, based on the intuition that neighboring pixels share similar semantic information. In contrast to previous approaches, our method neither uses external datasets nor requires additional training, which makes it widely applicable to existing pre-trained segmentation models. This straightforward approach achieves a new state-of-the-art performance on the publicly available Fishyscapes Lost & Found leaderboard by a large margin.
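The class-wise standardization of max logits described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(C, H, W)` logit layout, the `eps` term, and the choice to estimate per-class statistics from a set of in-distribution predictions are all assumptions made here for illustration. The local-region processing mentioned in the abstract is not covered.

```python
import numpy as np


def max_logits(logits):
    """Return the per-pixel max logit and predicted class.

    logits: array of shape (C, H, W), pre-softmax scores (assumed layout).
    """
    return logits.max(axis=0), logits.argmax(axis=0)


def class_stats(max_logit, pred, num_classes):
    """Estimate the mean and std of max logits for each predicted class.

    Assumption: the inputs come from in-distribution data, so the
    statistics describe "typical" max logits for each class.
    Classes that never appear keep mean 0 and std 1.
    """
    mean = np.zeros(num_classes)
    std = np.ones(num_classes)
    for c in range(num_classes):
        vals = max_logit[pred == c]
        if vals.size > 0:
            mean[c] = vals.mean()
            std[c] = vals.std()
    return mean, std


def standardized_max_logits(logits, mean, std, eps=1e-8):
    """Standardize each pixel's max logit with its predicted class's stats.

    eps is a hypothetical guard against division by zero.
    """
    ml, pred = max_logits(logits)
    return (ml - mean[pred]) / (std[pred] + eps), pred


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 8, 8))
    logits[0] *= 10.0  # class 0 produces max logits on a much larger scale
    ml, pred = max_logits(logits)
    mean, std = class_stats(ml, pred, num_classes=3)
    sml, _ = standardized_max_logits(logits, mean, std)
    print(sml.shape)
```

After standardization, a low score means the max logit is low relative to other pixels predicted as the same class, so scores become comparable across classes; a negated score could then serve as a per-pixel anomaly score.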