Lexical Semantics is concerned with how words encode mental representations of the world, i.e., concepts. We call concepts of this type classification concepts. In this paper, we focus on Visual Semantics, namely on how humans build concepts representing what they perceive visually. We call concepts of this second type substance concepts. As shown in the paper, these two types of concepts are different and, furthermore, the mapping between them is many-to-many. We provide a theory and an algorithm for building substance concepts that are in one-to-one correspondence with classification concepts, thus paving the way to the seamless integration of natural language descriptions and visual perception. This work builds upon three main intuitions: (i) substance concepts are modeled as visual objects, namely sequences of similar frames, as perceived in multiple encounters; (ii) substance concepts are organized into a visual subsumption hierarchy based on the notions of Genus and Differentia; (iii) human feedback is exploited not to name objects but, rather, to align the hierarchy of substance concepts with that of classification concepts. The learning algorithm is implemented for the base case of a hierarchy of depth two. The experiments, though preliminary, show that the algorithm manages to acquire the notions of Genus and Differentia with reasonable accuracy, despite seeing a small number of examples and receiving supervision on a fraction of them.
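To make intuitions (i) and (ii) more concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes that each frame is already given as a feature vector, that similarity is cosine similarity, and that fixed thresholds decide when two frames belong to the same visual object and when two visual objects share a Genus or differ only by their Differentia. All names (`segment_into_visual_objects`, `place`, `genus_thr`, `species_thr`) and all threshold values are hypothetical, and the human-feedback step (iii) is not modeled.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_into_visual_objects(frames, threshold=0.9):
    """Split a frame sequence into runs of consecutive similar frames.

    Assumption: a 'visual object' is approximated by a maximal run in which
    each frame is similar (>= threshold) to the frame before it.
    """
    objects = []
    for frame in frames:
        if objects and cosine(objects[-1][-1], frame) >= threshold:
            objects[-1].append(frame)
        else:
            objects.append([frame])
    return objects


def mean_vector(frames):
    """Component-wise mean of a non-empty list of frame vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def place(hierarchy, obj, genus_thr=0.7, species_thr=0.95):
    """Place a visual object in a depth-two hierarchy (assumed rule).

    - similar enough to an existing child: same concept, new encounter;
    - similar to a genus but to none of its children: new child,
      i.e. same Genus, different Differentia;
    - otherwise: new genus with one child.
    Prototypes are fixed at first encounter to keep the sketch short.
    """
    proto = mean_vector(obj)
    for genus in hierarchy:
        if cosine(genus["proto"], proto) >= genus_thr:
            for species in genus["children"]:
                if cosine(species["proto"], proto) >= species_thr:
                    species["encounters"].append(obj)
                    return genus, species
            species = {"proto": proto, "encounters": [obj]}
            genus["children"].append(species)
            return genus, species
    species = {"proto": proto, "encounters": [obj]}
    genus = {"proto": proto, "children": [species]}
    hierarchy.append(genus)
    return genus, species


# Toy usage with 2-D "frames" (purely synthetic values).
frames = [[1, 0], [1, 0.05], [1, 0.6], [1, 0.65], [0, 1], [0.05, 1]]
visual_objects = segment_into_visual_objects(frames)
hierarchy = []
for vo in visual_objects:
    place(hierarchy, vo)
```

In this toy run, the first two visual objects are close enough to share a genus but not close enough to be merged, so they become two children of the same node, while the third starts a new genus. In a fuller treatment the thresholds would presumably be learned or adjusted through supervision rather than fixed by hand.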