It is widely held that visual representations (e.g., photographs and illustrations) do not depict negation, such as the negation expressed by the sentence "the train is not coming". We challenge this view empirically by analyzing real-world visual representations, namely comic (manga) illustrations. In an experiment based on an image captioning task, we showed participants comic illustrations and asked them to explain what they could read from them. The collected data showed that some comic illustrations can depict negation without the aid of sequences (multiple panels) or conventional devices (special symbols). Illustrations of this type were then used in further experiments on classifying images into those containing negation and those not containing negation. While this image classification was easy for humans, data-driven machines, i.e., deep learning models (CNNs), had difficulty achieving the same high performance. Given these findings, we argue that some comic illustrations evoke background knowledge and can thus depict negation with purely visual elements.