Semantic dimensions of sound play a central role in understanding the nature of auditory sensory experience, as well as the broader relation between perception, language, and meaning. Accordingly, and given the recent proliferation of large language models (LLMs), we asked whether such models exhibit an organisation of perceptual semantics similar to that observed in humans. Specifically, we prompted ChatGPT, a chatbot based on a state-of-the-art LLM, to rate musical instrument sounds on a set of 20 semantic scales. We elicited multiple responses in separate chats, analogous to having multiple human raters. ChatGPT generated semantic profiles that correlated only partially with human ratings, yet showed robust agreement along well-known psychophysical dimensions of musical sounds such as brightness (bright-dark) and pitch height (deep-high). Exploratory factor analysis suggested the same dimensionality but a different spatial configuration of the latent factor space between chatbot and human ratings. Unexpectedly, the chatbot showed a degree of internal variability comparable in magnitude to that of human ratings. Our work highlights the potential of LLMs to capture salient dimensions of human sensory experience.