Controversy is widespread online. Previous studies mainly define controversy through vague assumptions about its relation to sentiment-related phenomena such as hate speech and offensive language. This paper introduces the first question-answering dataset that defines content controversy by user perception, i.e., votes from a large number of users. It contains nearly 10K questions, each paired with a best answer and a most controversial answer. Experimental results show that controversy detection in question answering is both essential and challenging, and that controversy has no strong correlation with sentiment tasks.