This paper argues for adopting annotation practices for multimodal datasets that recognize and represent the inherently perspectivized nature of multimodal communication. To support this claim, we present a set of annotation experiments in which FrameNet annotation is applied to the Multi30k and Flickr30k Entities datasets. We measure the cosine similarity between the semantic representations obtained by annotating both pictures and captions for semantic frames. Our findings indicate that (i) the frame-semantic similarity between captions of the same picture produced in different languages is sensitive to whether a caption is a translation of another caption, and (ii) the frame annotation of a picture is sensitive to whether the picture is annotated in the presence of a caption.
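As a rough illustration of the similarity measure, the following is a minimal sketch that assumes each picture or caption annotation is reduced to a bag of evoked FrameNet frame labels, then compares two such frame-count vectors with cosine similarity. The bag-of-frames representation, the helper names `frame_vector` and `cosine_similarity`, and the example frame lists are all hypothetical; the study's actual semantic representations may be constructed differently.

```python
from collections import Counter
from math import sqrt


def frame_vector(frames):
    """Hypothetical bag-of-frames vector: frame label -> number of annotated occurrences."""
    return Counter(frames)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frame-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for frames absent from b, so only a's frames need visiting.
    dot = sum(count * b[frame] for frame, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative annotations only, not data from the experiments.
caption_en = frame_vector(["People", "Self_motion", "Locale"])
caption_de = frame_vector(["People", "Self_motion", "Vehicle"])
print(round(cosine_similarity(caption_en, caption_de), 3))  # prints 0.667
```

Two captions sharing two of three frames score 2/3 here; under this assumed representation, a translated caption that keeps the same frames would score closer to 1.0 than an independently written one that frames the scene differently.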