Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge bases, e.g., (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically, and the extent to which models implicitly reflect topological structure that is grounded in the world, such as perceptual structure, is unknown. To explore this question, we conduct a thorough case study on color. Specifically, we employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric. Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence. Analyzing the differences in alignment across the color spectrum, we find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones, suggesting an intriguing connection to findings from recent work on efficient communication in color naming. Further analysis suggests that differences in alignment are, in part, mediated by collocationality and differences in syntactic usage, posing questions as to the relationship between color perception and usage and context.
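The abstract does not name its two evaluation methods, so the following is only a minimal sketch of one plausible way to measure structural alignment, assuming a rank-correlation comparison of pairwise distances. Perceptual distance is taken as Euclidean distance in CIELAB (the CIE76 color difference), and distance between text-derived representations is taken as cosine distance; both choices are assumptions. The CIELAB coordinates are approximate values for four sRGB colors, and the `EMB` vectors are invented placeholders, not model outputs or data from the study.

```python
import math
from itertools import combinations


def delta_e76(lab1, lab2):
    """Euclidean distance between two CIELAB points (CIE76 color difference)."""
    return math.dist(lab1, lab2)


def cosine_distance(u, v):
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def alignment_score(lab_points, embeddings):
    """Correlate pairwise perceptual distances with pairwise embedding distances."""
    pairs = list(combinations(range(len(lab_points)), 2))
    perceptual = [delta_e76(lab_points[i], lab_points[j]) for i, j in pairs]
    textual = [cosine_distance(embeddings[i], embeddings[j]) for i, j in pairs]
    return spearman(perceptual, textual)


# Approximate CIELAB coordinates (L*, a*, b*) for sRGB red, yellow, green, blue.
LAB = [(53.2, 80.1, 67.2), (97.1, -21.6, 94.5), (87.7, -86.2, 83.2), (32.3, 79.2, -107.9)]

# Hypothetical text-derived vectors for the same four color terms (placeholders).
EMB = [(0.9, 0.1, 0.2), (0.7, 0.6, 0.1), (0.2, 0.9, 0.1), (0.1, 0.2, 0.9)]

score = alignment_score(LAB, EMB)
```

A score near 1 would indicate that color terms that are far apart perceptually are also far apart in the text-derived space. With only four colors the value is not meaningful in itself; the sketch is meant to show the shape of the computation, not to reproduce any reported result.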