This study constructs machine learning algorithms that are trained to classify samples using sound symbolism, and then reports on an experiment designed to compare their performance with that of human participants. Random forests are trained on the names of Pokemon, which are fictional video game characters, and on their evolutionary status. Pokemon undergo evolution when certain in-game conditions are met, and evolution changes their appearance, abilities, and names. In Experiment 1, we train three random forests using the sounds that make up the Japanese, Chinese, and Korean names of Pokemon to classify Pokemon into pre-evolution and post-evolution categories. We then train a fourth random forest using the results of an elicitation experiment in which Japanese participants named previously unseen Pokemon. In Experiment 2, we reproduce those random forests with name length added as a feature and compare their performance against that of humans in a classification experiment in which Japanese participants classified the names elicited in Experiment 1 into pre- and post-evolution categories. Experiment 2 reveals an overfitting issue in Experiment 1, which we resolve using a novel cross-validation method. The results show that the random forests are efficient learners of systematic sound-meaning correspondence patterns and can classify samples with greater accuracy than the human participants.
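As a rough illustration of how a name might be turned into input for such a classifier, the sketch below counts sound segments in a name and optionally appends name length, as in Experiment 2. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the segment inventory `SEGMENTS`, the function `name_to_features`, and the toy names are all hypothetical, and treating each romanized letter as one sound is a simplification.

```python
from collections import Counter

# Hypothetical inventory of sound segments (one letter = one sound here);
# the study's actual feature set is not specified in this abstract.
SEGMENTS = list("aeioubdgkmnprstz")


def name_to_features(name, include_length=False):
    """Turn a romanized name into a fixed-length vector of segment counts."""
    counts = Counter(name.lower())
    vector = [counts[segment] for segment in SEGMENTS]
    if include_length:
        # Name length as one extra feature, appended after the segment counts.
        vector.append(len(name))
    return vector


# Made-up names and labels (0 = pre-evolution, 1 = post-evolution),
# for illustration only; these are not data from the study.
toy_data = [("mipi", 0), ("gorudozan", 1)]
X = [name_to_features(name, include_length=True) for name, _ in toy_data]
y = [label for _, label in toy_data]
```

Feature vectors and labels of this shape could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, although the abstract does not say which implementation was used.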