Online forums that allow participatory engagement between users have transformed public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, take as input only a single comment or a pair of comments, depending on whether the task concerns properties of individual comments or of the replies between pairs of comments. But in online conversations, comments and replies may draw on external context beyond the immediately relevant information that is input to the model. Therefore, awareness of a conversation's surrounding context should improve the model's performance on the inference task at hand. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walks to incorporate the wider context of a conversation in a principled manner. Specifically, a graph walk starts from a given comment and samples "nearby" comments in the same or parallel conversation threads, which yields additional embeddings that are aggregated with the initial comment's embedding. We then use these enriched embeddings for downstream NLP prediction tasks that are important for online conversations. We evaluate GraphNLI on two such tasks, polarity prediction and misogynistic hate speech detection, and find that our model consistently outperforms all relevant baselines on both. Specifically, GraphNLI with a biased root-seeking random walk achieves macro-F1 scores 3 and 6 percentage points higher than the best-performing BERT-based baselines for polarity prediction and hate speech detection, respectively.
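To make the graph-walk idea concrete, the following is a minimal sketch of a biased root-seeking walk over a reply tree, followed by aggregation of the sampled comments' embeddings. It is an illustration under stated assumptions, not the GraphNLI implementation: the function names `root_seeking_walk` and `aggregate`, the upward-move probability `p_up`, the walk length `steps`, and the distance-decayed weighted average are all hypothetical choices, since the abstract does not specify them. Embeddings are plain lists of floats here; in practice they would come from a sentence encoder.

```python
import random


def root_seeking_walk(start, parent, children, steps=4, p_up=0.75, rng=None):
    """Sample comments near `start`, preferring moves toward the root.

    parent:   dict mapping comment id -> parent id (None for the root)
    children: dict mapping comment id -> list of reply ids
    p_up:     assumed probability of stepping to the parent (hypothetical)
    """
    rng = rng or random.Random()
    path = [start]
    node = start
    for _ in range(steps):
        up = parent.get(node)
        downs = children.get(node, [])
        if up is not None and (not downs or rng.random() < p_up):
            node = up                   # biased move toward the root
        elif downs:
            node = rng.choice(downs)    # occasional move into a reply
        else:
            break                       # isolated comment: nowhere to go
        if node not in path:
            path.append(node)           # keep each sampled comment once
    return path


def aggregate(path, embed, decay=0.5):
    """Weighted average of embeddings along the walk.

    Comments sampled later in the walk get geometrically smaller weight
    (an assumed scheme; other aggregations are equally plausible).
    """
    weights = [decay ** i for i in range(len(path))]
    total = sum(weights)
    dim = len(embed[path[0]])
    return [
        sum(w * embed[n][d] for w, n in zip(weights, path)) / total
        for d in range(dim)
    ]


# Toy reply tree: root <- c1 <- c3, root <- c2
parent = {"root": None, "c1": "root", "c2": "root", "c3": "c1"}
children = {"root": ["c1", "c2"], "c1": ["c3"]}
embed = {
    "root": [1.0, 0.0],
    "c1": [0.0, 1.0],
    "c2": [0.5, 0.5],
    "c3": [1.0, 1.0],
}

walk = root_seeking_walk("c3", parent, children, steps=2, p_up=1.0)
enriched = aggregate(walk, embed)
```

With `p_up=1.0` the walk is deterministic and climbs straight from the starting comment toward the root; lower values let it occasionally descend into sibling threads, which is one way to reach comments in "parallel" threads. The enriched vector would then be passed to a downstream classifier in place of the single-comment embedding.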