We propose a simple yet effective image captioning framework that assesses the quality of an input image and informs the user of the reasons for any flaws it contains. Our framework first determines image quality and then generates captions only for images judged to be of high quality. If image quality is low, the user is told which flaws were detected and asked to retake the image, and this cycle repeats until the input image is judged to be of high quality. As a component of the framework, we trained and evaluated a low-quality image detection model that jointly learns whether an image is difficult to recognize and which individual flaws it contains, and we demonstrated that this model can explain the reasons for flaws with sufficiently high scores. We also evaluated captioning on a dataset from which our framework had removed low-quality images and found improvements on all four common metrics (BLEU-4, METEOR, ROUGE-L, and CIDEr), confirming that general-purpose image captioning capability improves. Our framework could assist visually impaired users, who have difficulty judging image quality.
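The capture, check, and retake cycle described above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the quality model is assumed to return a probability that the image is unrecognizable plus per-flaw probabilities, and the names `QualityAssessment`, `detected_flaws`, `capture_until_usable`, the flaw labels, the 0.5 threshold, and the `max_attempts` safeguard are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, TypeVar

Image = TypeVar("Image")


@dataclass
class QualityAssessment:
    # Hypothetical model output: probability that the image content is unrecognizable.
    p_unrecognizable: float
    # Hypothetical per-flaw probabilities, e.g. {"blur": 0.9}; flaw names are illustrative.
    flaw_scores: Dict[str, float]


def detected_flaws(assessment: QualityAssessment, threshold: float = 0.5) -> List[str]:
    """Return the names of flaws whose predicted probability reaches the threshold."""
    return sorted(name for name, p in assessment.flaw_scores.items() if p >= threshold)


def capture_until_usable(
    capture: Callable[[], Image],
    assess: Callable[[Image], QualityAssessment],
    caption: Callable[[Image], str],
    notify: Callable[[List[str]], None],
    threshold: float = 0.5,
    max_attempts: int = 5,
) -> Optional[str]:
    """Request retakes until an image is judged high quality, then caption it.

    max_attempts is an added safeguard (assumption); returns None if no
    usable image is obtained within that many attempts.
    """
    for _ in range(max_attempts):
        image = capture()
        assessment = assess(image)
        if assessment.p_unrecognizable < threshold:
            # High-quality image: only now is the captioning model invoked.
            return caption(image)
        # Low-quality image: report the detected flaws so the user can retake.
        notify(detected_flaws(assessment, threshold))
    return None
```

Keeping the quality model, captioner, and user notification as injected callables means the loop itself stays independent of any particular detection or captioning network.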