Most dense recognition methods make a separate decision at each pixel. This approach still delivers competitive performance in standard closed-set setups with small taxonomies. However, important applications in the wild typically require strong open-set performance and large numbers of known classes. We show that these two demanding setups benefit greatly from mask-level predictions, even with non-finetuned baseline models. Moreover, we propose an alternative formulation of dense recognition uncertainty that effectively reduces false positive responses at semantic borders. The proposed formulation further improves over a very strong baseline and sets a new state of the art in dense anomaly detection without training on negative data. Our contributions also improve performance in a recent open-set panoptic setup. In-depth experiments confirm that our approach succeeds due to implicit aggregation of pixel-level cues into mask-level predictions.