Suicide is a major public health crisis. With more than 20,000,000 suicide attempts each year, early detection of suicidal intent has the potential to save hundreds of thousands of lives. Traditional mental health screening methods are time-consuming, costly, and often inaccessible to disadvantaged populations; online detection of suicidal intent using machine learning offers a viable alternative. Here we present Robin, the largest non-keyword-generated suicidal corpus to date, consisting of over 1.1 million online forum postings. In addition to its unprecedented size, Robin is specially constructed to include various categories of suicidal text, such as suicide bereavement and flippant references, which better enables models trained on Robin to learn the subtle nuances of text expressing suicidal ideation. In our experiments, models trained on Robin achieve state-of-the-art performance for the classification of suicidal text, both with traditional methods like logistic regression (F1=0.85) and with large-scale pre-trained language models like BERT (F1=0.92). Finally, we release the Robin dataset publicly as a machine learning resource with the potential to drive the next generation of suicidal sentiment research.
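For readers less familiar with the metric, the F1 scores quoted above are the harmonic mean of precision and recall on the positive class. The following is a minimal sketch of that calculation, assuming binary labels (1 = suicidal text, 0 = other); the function name `f1_score_binary` and the toy labels are illustrative assumptions and are not taken from the Robin experiments.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return F1 for the positive class, or 0.0 when it is undefined."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean
    # of precision TP/(TP+FP) and recall TP/(TP+FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it remains informative when suicidal posts are a small minority of the data, a setting in which plain accuracy can look deceptively high.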