Since COVID-19 was classified as a global pandemic, there have been many attempts to treat and contain the virus. Although no specific antiviral treatment is recommended for COVID-19, several drugs may help relieve its symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter to identify discourse around drug mentions. The task seems straightforward, but the informal language used on Twitter complicates it, and we demonstrate the need for machine learning alongside traditional automated methods. By applying these complementary methods, we recover almost 15% additional data, showing that misspelling handling is a necessary pre-processing step when working with social media data.
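To illustrate why exact matching under-counts drug mentions in informal text, here is a minimal sketch. It is not the method used in this work. It assumes a small hypothetical drug lexicon (`DRUG_LEXICON`) and combines exact dictionary lookup with a Levenshtein edit-distance fallback. The function names and the length-relative threshold (`max_ratio = 0.2`) are likewise illustrative assumptions.

```python
import re

# Hypothetical lexicon for illustration only; not the drug list used in the study.
DRUG_LEXICON = {"hydroxychloroquine", "remdesivir", "ivermectin", "dexamethasone"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def find_drug_mentions(tweet: str, lexicon=DRUG_LEXICON, max_ratio: float = 0.2):
    """Return (token, matched_drug, is_exact) for each token that matches a drug name."""
    mentions = []
    for token in re.findall(r"[a-z]+", tweet.lower()):
        if token in lexicon:
            mentions.append((token, token, True))
            continue
        # Fuzzy fallback (assumed heuristic): take the closest drug name and accept it
        # only if the edit distance is small relative to that name's length.
        best = min(sorted(lexicon), key=lambda d: levenshtein(token, d))
        if levenshtein(token, best) <= max(1, int(len(best) * max_ratio)):
            mentions.append((token, best, False))
    return mentions


if __name__ == "__main__":
    tweet = "My doc says hydroxycloroquine and remdesivir might help"
    for token, drug, exact in find_drug_mentions(tweet):
        print(token, "->", drug, "(exact)" if exact else "(fuzzy)")
```

In this toy tweet, exact lookup alone finds only `remdesivir`. The edit-distance fallback also recovers the misspelled `hydroxycloroquine`. A fixed-ratio threshold is a simple heuristic. A learned model, as the abstract argues, may catch misspellings that edit distance alone handles poorly.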