We discuss two kinds of semantics relevant to Computer Vision (CV) systems: Visual Semantics and Lexical Semantics. Visual semantics focuses on how humans build concepts when using vision to perceive a target reality, whereas lexical semantics focuses on how humans build concepts of the same target reality through the use of language. The lack of coincidence between visual and lexical semantics, in turn, has a major impact on CV systems in the form of the Semantic Gap Problem (SGP). The paper illustrates this lack of coincidence with extensive examples and introduces a general, domain-agnostic methodology to enforce alignment between visual and lexical semantics.