Unmanned Aerial Vehicle (UAV)-based video text spotting has been extensively used in civil and military domains. The limited battery capacity of UAVs motivates us to develop an energy-efficient video text spotting solution. In this paper, we first revisit RCNN's crop & resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by UAVs. To reduce energy consumption, we further propose a multi-stage image processor that takes the redundancy, continuity, and mixed degradation of videos into account. Lastly, the model is pruned and quantized before being deployed on a Raspberry Pi. Our proposed energy-efficient video text spotting solution, dubbed E^2VTS, outperforms all previous methods by achieving a competitive tradeoff between energy efficiency and performance. All our code and pre-trained models are available at https://github.com/wuzhenyusjtu/LPCVC20-VideoTextSpotting.
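To make the final compression step concrete, the following is a minimal sketch, not the authors' exact pipeline, of how magnitude pruning followed by int8 quantization could be applied with standard PyTorch utilities before on-device deployment; the toy backbone and the 50% sparsity level are illustrative assumptions.

```python
# Sketch (assumed, not the authors' exact pipeline): prune then quantize a model
# for CPU inference on an embedded device such as a Raspberry Pi.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for the text-spotting backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# 1. Remove 50% of the smallest-magnitude weights in conv/linear layers.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2. Dynamically quantize linear layers to int8 for CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity-check a forward pass, then save the compressed weights for deployment.
_ = quantized(torch.randn(1, 3, 32, 32))
torch.save(quantized.state_dict(), "model_int8.pt")
```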