An emerging trend on social media platforms is their use as safe spaces for peer support. Particularly in healthcare, where many medical conditions carry harsh stigmas, social media has become a stigma-free way to engage in dialogue about symptoms, treatments, and personal experiences. Many existing works have employed NLP algorithms to facilitate quantitative analysis of health trends. Notably absent from existing works are keyphrase extraction (KE) models for social health posts, a task crucial to discovering emerging public health trends. This paper presents a novel, theme-driven KE dataset, SuboxoPhrase, and a qualitative annotation scheme with the overarching goal of extracting targeted, clinically relevant keyphrases. To the best of our knowledge, this is the first study to design a KE schema for social media healthcare texts. To demonstrate the value of this approach, this study analyzes Reddit posts regarding medications for opioid use disorder, a paramount health concern worldwide. Additionally, we benchmark ten off-the-shelf KE models on our new dataset, demonstrating the unique extraction challenges in modeling user-generated health texts. The proposed theme-driven KE approach lays the foundation for future work on efficient, large-scale analysis of social health texts, allowing researchers to surface useful public health trends, patterns, and knowledge gaps.