Combination drug therapies are treatment regimens that involve two or more drugs and are most commonly administered to patients with cancer, HIV, malaria, or tuberculosis. PubMed currently contains over 350K articles indexed with the "combination drug therapy" MeSH heading, with at least 10K such articles published per year over the past two decades. Extracting combination therapies from the scientific literature is inherently an $n$-ary relation extraction problem. In the general $n$-ary setting, $n$ is fixed (e.g., drug-gene-mutation relations with $n=3$). Combination therapy extraction is a special setting in which $n \geq 2$ is dynamic and varies from instance to instance. Recently, Tiktinsky et al. (NAACL 2022) introduced CombDrugExt, a first-of-its-kind dataset for extracting such therapies from the literature. Here, we use a sequence-to-sequence end-to-end extraction method to achieve an F1-score of $66.7\%$ on the CombDrugExt test set for positive (i.e., effective) combinations. This is an absolute F1-score improvement of $\approx 5\%$ even over the prior best relation classification score, which was obtained with the drug entities already spotted (hence, not end-to-end). Thus, our work introduces the first end-to-end model for this task, and this model is state-of-the-art: it already outperforms the best prior non-end-to-end model. Our model extracts all drug entities and relations in a single pass, which makes it well suited to dynamic $n$-ary extraction scenarios.
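The abstract does not specify the target sequence format used by the sequence-to-sequence model. To illustrate why such a model fits the dynamic $n \geq 2$ setting, the sketch below assumes a hypothetical linearization. In it, relations are separated by `|`, the drugs within a relation by `;`, and each relation ends with a label token such as `[POS]` or `[NEG]`. It parses that string into variable-arity relations and scores positive combinations with a plain exact-match F1. The format, the function names `parse_relations` and `exact_match_f1`, and the toy drug strings are all illustrative assumptions. They are not the authors' method, and the scoring is not necessarily the official CombDrugExt scorer.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical relation type: an unordered set of drugs (any arity n >= 2) plus a label.
Relation = Tuple[FrozenSet[str], str]


def parse_relations(generated: str) -> List[Relation]:
    """Parse a hypothetical linearized seq2seq output into variable-arity relations."""
    relations: List[Relation] = []
    for chunk in generated.split("|"):
        chunk = chunk.strip()
        if "[" not in chunk or not chunk.endswith("]"):
            continue  # skip malformed fragments rather than failing
        body, label = chunk.rsplit("[", 1)
        label = label.rstrip("]").strip()
        # Drug order is irrelevant for a combination, so store drugs as a frozenset.
        drugs = frozenset(d.strip().lower() for d in body.split(";") if d.strip())
        if len(drugs) >= 2:  # a combination needs at least two distinct drugs
            relations.append((drugs, label))
    return relations


def exact_match_f1(pred: List[Relation], gold: List[Relation], label: str = "POS") -> float:
    """Exact-match F1 over relations carrying `label`; a hit requires the same drug set."""
    p: Set[FrozenSet[str]] = {drugs for drugs, lab in pred if lab == label}
    g: Set[FrozenSet[str]] = {drugs for drugs, lab in gold if lab == label}
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up strings, not CombDrugExt data).
generated = "cisplatin ; gemcitabine [POS] | paclitaxel ; carboplatin ; bevacizumab [POS] | aspirin ; warfarin [NEG]"
gold_text = "gemcitabine ; cisplatin [POS] | paclitaxel ; carboplatin [POS]"
pred = parse_relations(generated)
gold = parse_relations(gold_text)
print(exact_match_f1(pred, gold))  # prints 0.5: one of two predicted POS relations matches exactly
```

Representing each relation as a set keeps arity unbounded and ignores drug order. Because the match is exact, a prediction with one extra drug, such as the three-drug relation above, counts as a miss. A partial-match metric would score it differently.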