Deep-learning-based 3D stereo networks outperform 2D networks and conventional stereo methods. However, this improvement comes at the cost of increased computational complexity, which makes these networks impractical for real-world applications. Specifically, these networks rely on 3D convolutions as their main workhorse to refine and regress disparities. In this work, we first show that the 3D convolutions in stereo networks consume up to 94% of overall network operations and act as a major bottleneck. Next, we propose a set of "plug-&-run" separable convolutions to reduce the number of parameters and operations. When integrated with existing state-of-the-art stereo networks, these convolutions lead to up to a 7x reduction in the number of operations and up to a 3.5x reduction in parameters without compromising performance. In fact, in the majority of cases they improve performance.
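To make the source of the savings concrete, the following minimal sketch compares the parameter and multiply-accumulate (MAC) counts of a standard 3D convolution with those of a separable one. It assumes the common depthwise-plus-pointwise factorization, which is not necessarily the exact variant proposed in this work; the function names, channel counts, and cost-volume size are hypothetical and chosen only for illustration.

```python
def conv3d_cost(c_in, c_out, k, d, h, w):
    """Parameters and MACs of a standard 3D convolution.

    Assumes no bias, stride 1, and 'same' padding, so the output
    volume has the same d x h x w size as the input.
    """
    params = c_in * c_out * k ** 3
    macs = params * d * h * w
    return params, macs


def separable_conv3d_cost(c_in, c_out, k, d, h, w):
    """Parameters and MACs of a depthwise + pointwise 3D convolution.

    Assumption: one k x k x k filter per input channel (depthwise),
    followed by a 1 x 1 x 1 convolution that mixes channels (pointwise).
    """
    depthwise_params = c_in * k ** 3
    pointwise_params = c_in * c_out
    params = depthwise_params + pointwise_params
    macs = params * d * h * w
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 32 -> 32 channels, 3x3x3 kernel,
    # acting on a 48 x 64 x 128 (disparity x height x width) cost volume.
    layer = dict(c_in=32, c_out=32, k=3, d=48, h=64, w=128)
    std_params, std_macs = conv3d_cost(**layer)
    sep_params, sep_macs = separable_conv3d_cost(**layer)
    print(f"standard : {std_params} params, {std_macs} MACs")
    print(f"separable: {sep_params} params, {sep_macs} MACs")
    print(f"per-layer reduction: {std_params / sep_params:.1f}x")
```

For this hypothetical layer the factorization cuts both parameters and MACs by roughly 14.6x. This is a per-layer figure under the stated assumptions; it is not directly comparable to the network-level reductions reported above, which also include layers that are left unchanged.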