Many visual scene understanding problems are now addressed by dense prediction networks. But pixel-wise dense annotations are very expensive (e.g., for scene parsing) or even impossible to obtain (e.g., for intrinsic image decomposition), motivating us to leverage cheap point-level weak supervision. However, existing pointly-supervised methods still use the same architectures designed for full supervision. In stark contrast, we propose a new paradigm that makes predictions for point coordinate queries, inspired by the recent success of implicit representations such as distance or radiance fields. Accordingly, the method is named dense prediction fields (DPFs). DPFs generate expressive intermediate features for continuous sub-pixel locations, thus allowing outputs of arbitrary resolution, and they are naturally compatible with point-level supervision. We showcase the effectiveness of DPFs on two substantially different tasks: high-level semantic parsing and low-level intrinsic image decomposition, where supervision comes in the form of single-point semantic categories and two-point relative reflectance, respectively. Benchmarked on three large-scale public datasets, PASCALContext, ADE20K, and IIW, DPFs set new state-of-the-art performance on all of them by significant margins. Code can be accessed at https://github.com/cxx226/DPF.
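To make the query-based formulation concrete, the following is a minimal PyTorch-style sketch of the idea, not the authors' implementation: the class name `PointQueryHead`, the bilinear feature sampling, and all shapes are illustrative assumptions. A backbone feature map is sampled at continuous (x, y) coordinates and a small MLP maps each per-point feature to a prediction, so sparse point labels can supervise exactly the queried outputs and predictions can be rendered at any resolution.

```python
# Minimal sketch of the point-query idea behind dense prediction fields (DPFs).
# Hypothetical names and shapes; the real DPF architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointQueryHead(nn.Module):
    """Predicts an output (e.g., class logits or reflectance) for continuous
    (x, y) query coordinates given a backbone feature map."""
    def __init__(self, feat_dim: int, out_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, feat_map: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W) backbone features; coords: (B, N, 2) in [-1, 1].
        grid = coords.unsqueeze(2)                      # (B, N, 1, 2) for grid_sample
        sampled = F.grid_sample(feat_map, grid, mode="bilinear", align_corners=False)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)  # (B, N, C) per-point features
        return self.mlp(torch.cat([sampled, coords], dim=-1))  # (B, N, out_dim)

# Usage: queries are continuous sub-pixel locations, so point-level labels
# (e.g., a single-point category) supervise exactly the queried predictions.
feat = torch.randn(1, 64, 32, 32)          # toy feature map from some encoder
coords = torch.rand(1, 100, 2) * 2 - 1     # 100 random query points in [-1, 1]
logits = PointQueryHead(feat_dim=64, out_dim=60)(feat, coords)  # e.g., 60 classes
```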