We present a method for extracting a multilingual, sentiment-annotated dialog data set from Fallout New Vegas. The game developers have preannotated every line of dialog in the game with one of 8 sentiments: \textit{anger, disgust, fear, happy, neutral, pained, sad} and \textit{surprised}. The game is available in English, Spanish, German, French and Italian. We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLM-RoBERTa and language-specific BERT models. In our experiments, multilingual BERT outperformed XLM-RoBERTa for most of the languages, and the language-specific models were slightly better than multilingual BERT for most of the languages. The best overall accuracy was 54\%, achieved by multilingual BERT on the Spanish data. The extracted data set presents a challenging task for sentiment analysis. We have released the data, including the training and testing splits, openly on Zenodo. The data set has been shuffled for copyright reasons.