In this paper, we study the impact of motion blur, a common quality flaw in real-world images, on a state-of-the-art two-stage image captioning solution, and observe that its performance degrades as blur intensity increases. We investigate techniques to improve the robustness of the solution to motion blur by applying training data augmentation at either or both of its stages, i.e., object detection and captioning, and observe improved results. In particular, augmenting both stages reduces the CIDEr-D degradation at high motion blur intensity from 68.7 to 11.7 on the MS COCO dataset, and from 22.4 to 6.8 on the VizWiz dataset.
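As an illustration of what motion blur augmentation of training images can look like, the following is a minimal sketch using only NumPy. It is not the implementation used in this work: the linear (straight-line) blur kernel, the function names `motion_blur_kernel`, `apply_motion_blur`, and `augment`, the kernel sizes, and the augmentation probability `p` are all assumptions chosen for the example.

```python
import numpy as np


def motion_blur_kernel(size: int, angle_deg: float = 0.0) -> np.ndarray:
    """Return a normalized size x size line kernel oriented at angle_deg.

    Assumption: blur is modeled as uniform motion along a straight line.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    kernel = np.zeros((size, size), dtype=np.float64)
    center = size // 2
    theta = np.deg2rad(angle_deg)
    # Rasterize a line through the kernel center by sampling points along it.
    for t in np.linspace(-center, center, 4 * size):
        row = int(round(center - t * np.sin(theta)))
        col = int(round(center + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Filter an H x W or H x W x C image with the kernel, using edge padding."""
    k = kernel.shape[0]
    pad = k // 2
    img = image.astype(np.float64)
    squeeze = img.ndim == 2
    if squeeze:
        img = img[:, :, None]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    # Accumulate shifted copies of the image, weighted by the kernel entries.
    for i in range(k):
        for j in range(k):
            if kernel[i, j] != 0.0:
                out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out[:, :, 0] if squeeze else out


def augment(image: np.ndarray, rng: np.random.Generator,
            p: float = 0.5, sizes=(3, 7, 11)) -> np.ndarray:
    """With probability p, blur the image with a random kernel size and angle.

    The values of p and sizes are hypothetical, not taken from the paper.
    """
    img = image.astype(np.float64)
    if rng.random() >= p:
        return img
    size = int(rng.choice(sizes))
    angle = float(rng.uniform(0.0, 180.0))
    return apply_motion_blur(img, motion_blur_kernel(size, angle))
```

In this sketch, a larger kernel size stands in for a higher blur intensity. The same `augment` function could be applied to the training images of the object detection stage, the captioning stage, or both.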