Anytime inference requires a model to make a progression of predictions that might be halted at any time. Prior research on anytime visual recognition has mostly focused on image classification. We propose the first unified and end-to-end approach for anytime dense prediction. A cascade of "exits" is attached to the model to make multiple predictions. We redesign the exits to account for the depth and spatial resolution of the features at each exit. To reduce total computation and make full use of prior predictions, we develop a novel spatially adaptive approach that avoids further computation on regions where early predictions are already sufficiently confident. Our full method, named anytime dense prediction with confidence (ADP-C), achieves the same level of final accuracy as the base model while significantly reducing total computation. We evaluate our method on Cityscapes semantic segmentation and MPII human pose estimation: ADP-C enables anytime inference without sacrificing accuracy, reducing the total FLOPs of its base models by 44.4% and 59.1%, respectively. We compare with anytime inference by deep equilibrium networks and feature-based stochastic sampling, showing that ADP-C dominates both across the accuracy-computation curve. Our code is available at https://github.com/liuzhuang13/anytime .
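The confidence-based spatially adaptive idea can be sketched in a few lines: an early exit's per-pixel prediction is frozen wherever its confidence exceeds a threshold, and only the remaining pixels are updated by deeper exits. This is a minimal NumPy illustration, not the authors' implementation; the function names and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np

def confidence_mask(probs, threshold=0.9):
    """Boolean mask of pixels whose early-exit prediction is already
    confident enough to freeze (True = skip further computation there)."""
    # probs: (H, W, C) per-pixel class probabilities from an early exit
    return probs.max(axis=-1) >= threshold

def merge_predictions(early_pred, late_pred, mask):
    """Keep the early label where it was confident; elsewhere take the
    deeper exit's label."""
    return np.where(mask, early_pred, late_pred)

# Toy example: 2 classes on a 2x2 "image".
probs = np.array([[[0.95, 0.05], [0.60, 0.40]],
                  [[0.10, 0.90], [0.55, 0.45]]])
mask = confidence_mask(probs, threshold=0.9)   # confident in the left column
early = probs.argmax(-1)                       # early-exit labels
late = np.array([[1, 1], [1, 1]])              # hypothetical deeper-exit labels
final = merge_predictions(early, late, mask)
```

In the full method, the mask would also gate the convolutions of later exits so that frozen regions genuinely cost no compute, rather than merely being overwritten at the end.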