The complexity-precision trade-off of an object detector is a critical problem for resource-constrained vision tasks. Previous works have emphasized detectors implemented with efficient backbones. This work instead investigates how proposal processing by the detection head affects this trade-off. We hypothesize that improved detection efficiency requires a paradigm shift towards the unequal processing of proposals, assigning more computation to good proposals than to poor ones. This makes better use of the available computational budget, enabling higher accuracy for the same FLOPs. We formulate this as a learning problem whose goal is to assign operators to proposals in the detection head so that the total computational cost is constrained and precision is maximized. The key finding is that this matching can be learned as a function that maps each proposal embedding to a one-hot code over operators. Although this function induces a complex dynamic network routing mechanism, it can be implemented by a simple MLP and learned end-to-end with off-the-shelf object detectors. This 'dynamic proposal processing' (DPP) approach is shown to outperform state-of-the-art end-to-end object detectors (DETR, Sparse R-CNN) by a clear margin at a given computational complexity.
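The core mechanism, an MLP that maps each proposal embedding to a one-hot code over operators under a cost constraint, can be sketched in a few lines. The following is a minimal, hypothetical illustration in plain Python, not the authors' implementation: the per-operator costs in `OPERATOR_COSTS`, the `OperatorRouter` class and its layer sizes, the softmax-weighted `expected_cost` surrogate, and the hinge-style `budget_penalty` are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical per-operator costs (e.g. relative FLOPs of a light, medium and heavy head stage).
OPERATOR_COSTS = [1.0, 2.0, 4.0]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def linear(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    # y_j = sum_i x_i * w[i][j] + b_j
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


class OperatorRouter:
    """Hypothetical two-layer MLP mapping a proposal embedding to a one-hot code over operators."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_ops: int, seed: int = 0):
        rng = random.Random(seed)  # random weights stand in for learned ones
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(embed_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(num_ops)] for _ in range(hidden_dim)]
        self.b2 = [0.0] * num_ops

    def logits(self, embedding: Sequence[float]) -> List[float]:
        return linear(relu(linear(embedding, self.w1, self.b1)), self.w2, self.b2)

    def one_hot(self, embedding: Sequence[float]) -> List[int]:
        scores = self.logits(embedding)
        k = max(range(len(scores)), key=scores.__getitem__)  # index of the chosen operator
        return [1 if j == k else 0 for j in range(len(scores))]


def route(router: OperatorRouter, embeddings, costs=OPERATOR_COSTS) -> Tuple[List[List[int]], float]:
    # Hard routing: each proposal runs exactly one operator; total cost sums the chosen ones.
    codes = [router.one_hot(e) for e in embeddings]
    total = sum(c * cost for code in codes for c, cost in zip(code, costs))
    return codes, total


def expected_cost(router: OperatorRouter, embeddings, costs=OPERATOR_COSTS) -> float:
    # Soft surrogate: operator costs weighted by the router's softmax probabilities.
    return sum(p * c for e in embeddings for p, c in zip(softmax(router.logits(e)), costs))


def budget_penalty(cost: float, budget: float) -> float:
    # Hypothetical hinge penalty: zero while within budget, linear above it.
    return max(0.0, cost - budget)


# Usage: route five random proposal embeddings through the router.
router = OperatorRouter(embed_dim=8, hidden_dim=16, num_ops=len(OPERATOR_COSTS))
rng = random.Random(1)
proposals = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
codes, total = route(router, proposals)
print(codes, total, budget_penalty(total, budget=10.0))
```

At inference, the argmax one-hot code selects a single operator per proposal, so the head's cost is simply the sum of the selected operators' costs. Because argmax is not differentiable, learning the router end-to-end requires some relaxation, such as the soft expected cost above or a Gumbel-softmax estimator; the abstract does not say which one DPP uses.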