Lack of texture often causes ambiguity in matching, and handling this ambiguity is an important challenge in optical flow estimation. Some methods insert stacked transformer modules so that the network can use global information from the cost volume for estimation. However, global information aggregation often incurs serious memory and time costs during training and inference, which hinders model deployment. We draw inspiration from the traditional local region constraint and design the local similarity aggregation (LSA) and the shifted local similarity aggregation (SLSA). The aggregation of the cost volume is implemented with lightweight modules that act on the feature maps. Experiments on the final pass of Sintel show that our approach requires lower cost while maintaining competitive performance.
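To make the idea of local aggregation concrete, the following is a minimal sketch of similarity-weighted aggregation over a small window. It is an illustration under stated assumptions, not the LSA or SLSA module itself: the function name `local_similarity_aggregate`, the `radius` parameter, the scaled dot-product similarity, and the softmax weighting are all hypothetical choices made here, and the shifted variant is not modelled. Each pixel's cost vector is replaced by a weighted average of the cost vectors in its neighbourhood, with weights derived from feature similarity, so a textureless pixel can borrow evidence from similar nearby pixels without attending to the whole image.

```python
import math


def local_similarity_aggregate(features, cost, radius=1):
    """Aggregate per-pixel cost vectors over a (2*radius+1)^2 window.

    Hypothetical sketch, not the published LSA module.
    features: H x W x C nested lists (feature map of the first image).
    cost:     H x W x D nested lists (per-pixel matching costs).
    Returns a new H x W x D nested list.
    """
    height, width = len(features), len(features[0])
    depth = len(cost[0][0])
    # Assumed similarity: dot product scaled by 1/sqrt(C).
    scale = 1.0 / math.sqrt(len(features[0][0]))
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect in-bounds neighbours and their similarity scores.
            neighbours, scores = [], []
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    dot = sum(a * b for a, b in zip(features[y][x], features[ny][nx]))
                    neighbours.append((ny, nx))
                    scores.append(dot * scale)
            # Softmax over the window (shifted by the max for stability).
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            # Weighted average of the neighbours' cost vectors.
            agg = [0.0] * depth
            for (ny, nx), w in zip(neighbours, weights):
                for d in range(depth):
                    agg[d] += (w / total) * cost[ny][nx][d]
            row.append(agg)
        out.append(row)
    return out


# Usage: a 1 x 3 image with identical features, so every neighbour in the
# window receives the same weight and the result is a plain local mean.
feats = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]
costs = [[[0.0], [3.0], [6.0]]]
smoothed = local_similarity_aggregate(feats, costs, radius=1)
```

Because the window holds at most (2*radius+1)^2 entries, the work per pixel is constant, in contrast to global aggregation, where each pixel attends to every other pixel.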