Convolutional Neural Networks (ConvNets) are trained offline on the limited data available and may therefore suffer substantial accuracy loss when deployed in the field, where unseen input patterns received under unpredictable external conditions can mislead the model. Test-Time Augmentation (TTA) techniques aim to alleviate this common side effect at inference time by first running multiple feed-forward passes on a set of altered versions of the same input sample, and then computing the final outcome through a consensus of the aggregated predictions. Unfortunately, implementing TTA on embedded CPUs introduces latency penalties that limit its adoption in edge applications. To tackle this issue, we propose AdapTTA, an adaptive implementation of TTA that controls the number of feed-forward passes dynamically, depending on the complexity of the input. Experimental results on state-of-the-art ConvNets for image classification deployed on a commercial ARM Cortex-A CPU demonstrate that AdapTTA achieves remarkable latency savings, from 1.49X to 2.21X, and hence a higher frame rate than static TTA, while preserving the same accuracy gain.
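To make the adaptive mechanism concrete, the following is a minimal sketch of an adaptive TTA loop. It assumes a confidence-threshold early-exit rule as the proxy for input complexity; the function names, the `confidence_threshold` parameter, and the averaging-based consensus are illustrative assumptions, not the exact policy defined in the paper.

```python
# Hypothetical sketch of adaptive test-time augmentation (AdapTTA-style early exit).
# The confidence-based stopping rule and all names are assumptions for illustration.
import numpy as np

def adaptive_tta(model, image, augmentations, confidence_threshold=0.9):
    """Run TTA passes one at a time and stop early on 'easy' inputs.

    model:         callable returning class probabilities for a single image
    augmentations: ordered list of functions, each producing an altered view
    """
    accumulated = None
    n_passes = 0
    for augment in augmentations:
        n_passes += 1
        probs = model(augment(image))                  # one feed-forward pass
        accumulated = probs if accumulated is None else accumulated + probs
        consensus = accumulated / n_passes             # aggregate by averaging
        # Early exit: if the aggregated prediction is already confident enough,
        # skip the remaining passes to save latency on the embedded CPU.
        if consensus.max() >= confidence_threshold:
            break
    return int(np.argmax(consensus)), n_passes
```

In this sketch, simple inputs terminate after one or two passes while ambiguous ones consume the full augmentation budget, which is how the latency saving over static TTA (a fixed number of passes for every input) would arise.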