Here we propose an approach to analyzing text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to study six transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that while a BERT model trained on search-engine results generally performed well, it misclassified relevant abstracts that did not contain task-specific terms. We used this insight to create a more effective unsupervised ensemble.