Data quality is critical for multimedia tasks, yet recent work has identified various types of systematic flaws in image benchmark datasets. In particular, the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias in turn leads to poor performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology that provides guidelines for the labeling process, thereby indirectly introducing the intended semantics into machine learning (ML) models. Specifically, we propose an iterative refinement-based annotation method that optimizes data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
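The abstract does not spell out the annotation procedure, so the following is only a minimal sketch of one way an iterative refinement over a classification hierarchy could work. It assumes that each node carries a set of distinguishing visual properties and that an annotator's observations are given as a set of property names. The `Node` class, the `annotate` function, and all labels and property names are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A class in the hierarchy (hypothetical structure, assumed for illustration)."""
    label: str
    # Visual properties that distinguish this node from its siblings.
    differentia: frozenset = frozenset()
    children: list = field(default_factory=list)


def annotate(root, observed):
    """Refine a label step by step, starting from the root of the hierarchy.

    At each level, move to the single child whose distinguishing visual
    properties are all present in `observed`. Stop when no child matches or
    when more than one matches (the visual evidence is ambiguous).
    Returns the list of labels from the root to the most specific class reached.
    """
    path = [root.label]
    node = root
    while True:
        matches = [c for c in node.children if c.differentia <= observed]
        if len(matches) != 1:
            break
        node = matches[0]
        path.append(node.label)
    return path


# Toy hierarchy with made-up classes and properties.
guitar = Node("guitar", frozenset({"has_neck", "has_sound_hole"}))
violin = Node("violin", frozenset({"has_neck", "has_f_holes"}))
string_instrument = Node(
    "string instrument", frozenset({"has_strings"}), [guitar, violin]
)
drum = Node("drum", frozenset({"has_membrane"}))
root = Node("musical instrument", frozenset(), [string_instrument, drum])

observed = frozenset({"has_strings", "has_neck", "has_f_holes"})
print(annotate(root, observed))
```

In this sketch the label is only as specific as the visual evidence allows: if just `has_strings` is observed, the refinement stops at "string instrument" rather than committing to a more specific class.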