Machine learning and big data analytics are used in many ways in the fight against the COVID-19 pandemic, including prediction, risk management, diagnostics, and prevention. This study focuses on predicting COVID-19 patient shielding, that is, identifying and protecting patients who are clinically extremely vulnerable to coronavirus, and on the techniques used for multi-label classification of medical text. Using information published by the United Kingdom NHS and the World Health Organisation, we present a novel approach that frames the prediction of COVID-19 patient shielding as a multi-label classification problem. We use publicly available, de-identified ICU medical text data for our experiments, with labels derived from the published COVID-19 patient shielding data. We present an extensive comparison of 12 multi-label classifiers, ranging from simple binary relevance to neural networks and the most recent transformers. To the best of our knowledge, this is the first comprehensive study to consider such a range of multi-label classifiers for medical text. We highlight the benefits of the various approaches and argue that, for the task at hand, both predictive accuracy and processing time are essential.