This paper summarizes the CLaC submission to SMM4H 2022 Task 10, which concerns the recognition of disease mentions in Spanish tweets. Before classifying each token, we encode it with a transformer encoder using features from Multilingual RoBERTa Large, a UMLS gazetteer, and a DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, against a competition mean of 0.675, standard deviation of 0.245, and median of 0.761.
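To make the reported metric concrete, the following is a minimal sketch of a strict F1 computation, assuming that "strict" means a predicted mention counts as correct only when its character offsets match a gold mention exactly. The `Span` layout, the function name `strict_f1`, and the toy data are hypothetical illustrations; this is not the official task scorer.

```python
from typing import Iterable, Set, Tuple

# Hypothetical mention layout: (tweet_id, start_offset, end_offset).
Span = Tuple[str, int, int]


def strict_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Return F1 where only exact span matches count as true positives."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    # Exact matches: same tweet, same start, same end.
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): the second prediction has the right start
# but the wrong end offset, so under strict matching it is both a false
# positive and a missed gold mention.
gold = [("t1", 0, 8), ("t1", 20, 27), ("t2", 5, 12)]
pred = [("t1", 0, 8), ("t1", 20, 25), ("t2", 5, 12)]
score = strict_f1(gold, pred)
```

Under this assumption, a partially overlapping prediction earns no credit, which is why strict scores are typically lower than relaxed, overlap-based ones.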