Deception detection is a task with many applications in both face-to-face and computer-mediated communication. Our focus is on automatic deception detection in text across cultures. We view culture through the prism of the individualism/collectivism dimension, and we use country as a proxy for culture. Starting from recent findings in social psychology, we explore whether differences in the usage of specific linguistic features of deception across cultures can be confirmed and attributed to norms with respect to the individualism/collectivism divide. We also investigate whether a universal feature set for cross-cultural text deception detection tasks exists. We evaluate the predictive power of different feature sets and approaches. We create culture/language-aware classifiers by experimenting with a wide range of n-gram features based on phonology, morphology and syntax, other linguistic cues such as word and phoneme counts and pronoun use, and token embeddings. We conducted experiments on 11 datasets in five languages (English, Dutch, Russian, Spanish and Romanian) from six countries (US, Belgium, India, Russia, Mexico and Romania), and we applied two classification methods: logistic regression and fine-tuned BERT models. The results show that the task is fairly complex and demanding. There are indications that some linguistic cues of deception have cultural origins and are consistent across diverse domains and dataset settings for the same language. This is particularly evident in the use of pronouns and the expression of sentiment in deceptive language. The results of this work show that automatic deception detection across cultures and languages cannot be handled in a unified manner, and that such approaches should be augmented with knowledge about cultural differences and the domains of interest.
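Pronoun use is one of the cues singled out above. As a rough illustration of what a pronoun-use feature might look like, here is a minimal sketch that computes per-group pronoun rates for an English text. The pronoun groups, the `pronoun_rates` function and the simple tokenizer are hypothetical assumptions for illustration only; they are not the feature definitions used in this work, and a cross-cultural setup would need a separate lexicon per language. Rates like these could then be fed to a classifier such as logistic regression.

```python
import re
from collections import Counter

# Hypothetical English pronoun groups (an assumption, not the paper's lexicon).
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
    "third": {"he", "him", "his", "she", "her", "hers",
              "they", "them", "their", "theirs"},
}


def pronoun_rates(text: str) -> dict[str, float]:
    """Return each pronoun group's count divided by the number of word tokens."""
    # Naive tokenizer: lowercase alphabetic runs only (so "I'm" -> "i", "m").
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for group, words in PRONOUN_GROUPS.items():
            if tok in words:
                counts[group] += 1
    # Empty input yields all-zero rates instead of a division by zero.
    return {g: (counts[g] / total if total else 0.0) for g in PRONOUN_GROUPS}


if __name__ == "__main__":
    print(pronoun_rates("We went home and I slept."))
```

For the sentence "We went home and I slept." there are six tokens, so both first-person groups get a rate of 1/6 and the other groups get 0.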