Medical articles provide current state-of-the-art treatments and diagnostics to many medical practitioners and professionals. Existing public databases such as MEDLINE contain over 27 million articles, making it difficult to extract relevant content without efficient search engines. Information retrieval tools are therefore crucial for navigating these collections and providing meaningful recommendations for articles and treatments. Classifying these articles into broader medical topics can improve the retrieval of related articles. The set of medical labels considered for the MESINESP task is on the order of several thousand labels (DeCS codes), which places it in the domain of extreme multi-label classification. The heterogeneous and highly hierarchical structure of medical topics makes manually classifying articles extremely laborious and costly, so it is crucial to automate the classification process. Typical machine learning algorithms become computationally demanding with such a large number of labels, and achieving good recall on such datasets remains an open problem. This work presents Priberam's participation in the BioASQ MESINESP task. We address the large multi-label classification problem using four different models: a Support Vector Machine (SVM), a customised search engine (Priberam Search), a BERT-based classifier, and an SVM-rank ensemble of the three previous models. Results show that all three individual models perform well and that their ensemble achieves the best performance, placing Priberam 6th in the challenge and making it the 2nd-best team.
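To make the ensemble step more concrete, the sketch below shows one way a pairwise ranking model in the spirit of SVM-rank could combine the per-label scores of the three individual models. It is a minimal from-scratch illustration, not Priberam's implementation: the feature layout (one SVM score, one search score and one BERT score per candidate label), the subgradient training loop, the hyperparameters and the label identifiers such as `D001` are all hypothetical assumptions. For each article, every relevant candidate label should score higher than every irrelevant one, so the model learns a linear weight vector from the pairwise differences using the regularised hinge loss, then returns the top-k labels.

```python
import random


def pairwise_differences(candidates):
    """candidates: list of (features, is_relevant) for ONE article.
    Returns x_pos - x_neg for every (relevant, irrelevant) label pair."""
    pos = [f for f, rel in candidates if rel]
    neg = [f for f, rel in candidates if not rel]
    return [[p - n for p, n in zip(fp, fn)] for fp in pos for fn in neg]


def train_rank_svm(articles, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Linear pairwise ranker (RankSVM-style) trained by stochastic
    subgradient descent on lam/2 * ||w||^2 + hinge(1 - w . d)."""
    rng = random.Random(seed)
    diffs = [d for cands in articles for d in pairwise_differences(cands)]
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(dim):
                # Regulariser gradient always applies; hinge gradient only
                # when the pair is not separated by a margin of at least 1.
                grad = lam * w[i] - (d[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def rank_labels(w, candidates, k):
    """candidates: dict mapping a (hypothetical) label id to its
    [svm_score, search_score, bert_score] features. Returns top-k ids."""
    scored = [(sum(wi * xi for wi, xi in zip(w, feats)), label)
              for label, feats in candidates.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]


# Toy training data: two articles, features = [svm, search, bert] scores.
articles = [
    [([0.9, 0.7, 0.8], True), ([0.8, 0.9, 0.6], True),
     ([0.2, 0.3, 0.1], False), ([0.4, 0.1, 0.3], False)],
    [([0.6, 0.8, 0.9], True), ([0.3, 0.2, 0.4], False),
     ([0.1, 0.5, 0.2], False)],
]
w = train_rank_svm(articles)
top = rank_labels(w, {"D001": [0.9, 0.8, 0.95],
                      "D002": [0.1, 0.2, 0.05],
                      "D003": [0.7, 0.6, 0.8]}, k=2)
print(top)
```

A pairwise formulation is a natural fit here because the three models produce scores on different scales; the ranker only needs to learn how to order labels within an article, not to calibrate an absolute probability for each DeCS code.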