Detecting Potentially Harmful and Protective Suicide-related Content on Twitter: A Machine Learning Approach

Research shows that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic large-scale investigations are lacking in general, and for social media data in particular. We apply machine learning methods to automatically label large quantities of Twitter data. We developed a novel annotation scheme that classifies suicide-related tweets into different message types and problem- vs. solution-focused perspectives. We then trained a benchmark of machine learning models including a majority classifier, an approach based on word frequency (TF-IDF with a linear SVM), and two state-of-the-art deep learning models (BERT, XLNet). The two deep learning models achieved the best performance in two classification tasks. First, we classified six main content categories: personal stories about either suicidal ideation and attempts or coping, calls for action intended to spread either problem awareness or prevention-related information, reports of suicide cases, and other suicide-related and off-topic tweets. The deep learning models reached accuracy scores above 73% on average across the six categories, and F1-scores between 69% and 85% for all but the suicidal ideation and attempts category (55%). Second, in separating postings referring to actual suicide from off-topic tweets, they correctly labelled around 88% of tweets, with BERT achieving F1-scores of 93% and 74% for the two categories. These classification performances are comparable to the state of the art on similar tasks. By making data labeling more efficient, this work enables future large-scale investigations of the harmful and protective effects of various kinds of social media content on suicide rates and on help-seeking behavior.
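To make the reported metrics concrete, the following is a minimal sketch of how a majority-classifier baseline can be scored with accuracy and per-class F1 on the binary task (tweets about actual suicide vs. off-topic tweets). It is not the authors' code: the label strings `"suicide"` and `"off_topic"`, the function names, and the eight-item toy label lists are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_label(train_labels):
    """Return the most frequent label seen in training (the majority baseline)."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Compute the F1-score separately for every label in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical toy labels (not data from the paper): 6 on-topic, 2 off-topic.
train_labels = ["suicide"] * 6 + ["off_topic"] * 2
test_labels = ["suicide"] * 6 + ["off_topic"] * 2

# The majority classifier predicts the same label for every test item.
baseline = majority_label(train_labels)
predictions = [baseline] * len(test_labels)

baseline_accuracy = accuracy(test_labels, predictions)
baseline_f1 = per_class_f1(test_labels, predictions)
print(baseline, baseline_accuracy, baseline_f1)
```

On imbalanced toy labels like these, the baseline can reach a fairly high accuracy while its F1-score for the minority class is zero, which is one reason per-category F1-scores are reported alongside accuracy.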

