Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although CNNs have excelled in many vision tasks, they are still limited in capturing long-range structured relationships, as they typically consist of layers of local kernels. A fully-connected graph is beneficial for such modelling; however, its computational overhead is prohibitive. We propose a dynamic graph message passing network that significantly reduces the computational complexity compared with related work that models a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on three different tasks and backbone architectures. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. The project website is http://www.robots.ox.ac.uk/~lz/dgmn/
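To make the idea of message passing over sampled nodes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 1-D chain of nodes rather than an image grid, and the function name `sampled_message_passing`, the projections `W_offset` and `W_affinity`, and the `base_offsets` pattern are all hypothetical names introduced for illustration. Each node predicts shifts for its K sample positions from its own features, predicts affinities over those K samples only, and aggregates their features. The node-dependent filter weights mentioned in the abstract are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sampled_message_passing(h, W_offset, W_affinity, base_offsets):
    """One round of message passing over K sampled neighbours per node.

    h:            (N, C) node features on a 1-D chain of N nodes
    W_offset:     (C, K) hypothetical projection predicting a per-node shift
    W_affinity:   (C, K) hypothetical projection predicting per-node affinities
    base_offsets: (K,)   fixed integer offsets around each node
    """
    N, _ = h.shape
    # Input-conditioned sampling: each node shifts its K base offsets
    # by an amount predicted from its own features.
    delta = np.rint(h @ W_offset).astype(int)               # (N, K)
    idx = np.arange(N)[:, None] + base_offsets[None, :] + delta
    idx = np.clip(idx, 0, N - 1)                            # keep indices in range
    # Node-dependent affinities, normalised over the K sampled nodes only.
    A = softmax(h @ W_affinity, axis=1)                     # (N, K)
    # Gather the sampled features (N, K, C) and aggregate them per node.
    messages = (A[:, :, None] * h[idx]).sum(axis=1)         # (N, C)
    return h + messages, idx, A


rng = np.random.default_rng(0)
N, C, K = 64, 8, 4
h = rng.normal(size=(N, C))
out, idx, A = sampled_message_passing(
    h,
    W_offset=rng.normal(size=(C, K)),
    W_affinity=rng.normal(size=(C, K)),
    base_offsets=np.array([-2, -1, 1, 2]),
)
print(out.shape)  # (64, 8)
print("pairwise terms:", N * N, "fully connected vs", N * K, "sampled")
```

In this sketch the affinity matrix has N × K entries instead of the N × N entries a fully-connected graph would need, which is the source of the reduced cost the abstract describes. The actual savings in the paper depend on its own sampling scheme and architecture.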