We demonstrate that state-of-the-art optical character recognition (OCR) based on deep learning is vulnerable to adversarial images. Minor modifications to images of printed text, which do not change the meaning of the text to a human reader, cause the OCR system to "recognize" a different text where certain words chosen by the adversary are replaced by their semantic opposites. This completely changes the meaning of the output produced by the OCR system and by the NLP applications that use OCR for preprocessing their inputs.