We present our submission to the semantic segmentation contest of the Robust Vision Challenge held at ECCV 2020. The contest requires submitting the same model to seven benchmarks from three different domains. Our approach is based on the SwiftNet architecture with pyramidal fusion. We address inconsistent taxonomies with a single-level 193-dimensional softmax output. We strive to train with large batches in order to stabilize optimization of a hard recognition problem, and to favour smooth evolution of batchnorm statistics. We achieve this by implementing a custom backward step through log-sum-prob loss, and by using small crops before freezing the population statistics. Our model ranks first on the RVC semantic segmentation challenge as well as on the WildDash 2 leaderboard. This suggests that pyramidal fusion is competitive not only for efficient inference with lightweight backbones, but also in large-scale setups for multi-domain application.
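To make the log-sum-prob loss concrete, the following minimal sketch computes, for a single pixel, the negative log of the total softmax probability assigned to a set of universal classes, together with a closed-form gradient with respect to the logits. It assumes that a dataset label maps to one or more classes of the universal taxonomy. The function names, the toy 4-class taxonomy, and the `allowed` index list are illustrative assumptions rather than the submission's actual implementation, which operates on 193-way dense predictions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_sum_prob_loss(logits, allowed):
    """Return -log(sum of softmax probabilities over the `allowed` indices).

    `allowed` is a hypothetical, non-empty list of universal-class indices
    that are compatible with the dataset label of this pixel.
    """
    return logsumexp(logits) - logsumexp([logits[i] for i in allowed])


def log_sum_prob_grad(logits, allowed):
    """Closed-form gradient of the loss with respect to each logit.

    grad_i = p_i - q_i, where p is the softmax over all classes and q is
    the softmax renormalized over the allowed classes (zero elsewhere).
    """
    lse_all = logsumexp(logits)
    lse_sub = logsumexp([logits[i] for i in allowed])
    allowed_set = set(allowed)
    grad = []
    for i, z in enumerate(logits):
        p = math.exp(z - lse_all)
        q = math.exp(z - lse_sub) if i in allowed_set else 0.0
        grad.append(p - q)
    return grad


# Toy example: 4 universal classes, label compatible with classes 0 and 1.
toy_logits = [2.0, 0.5, -1.0, 0.0]
toy_allowed = [0, 1]
toy_loss = log_sum_prob_loss(toy_logits, toy_allowed)
toy_grad = log_sum_prob_grad(toy_logits, toy_allowed)
```

When `allowed` contains a single index, the loss reduces to the standard cross-entropy, so under these assumptions the same code path can serve labels that map to exactly one universal class as well as labels that map to several.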