We propose a knowledge-based approach for extracting Cause-Effect (CE) relations from biomedical text. The approach combines an unsupervised machine learning technique for discovering causal triggers with a set of high-precision linguistic rules for identifying the cause and effect arguments of these triggers. We evaluate it on a corpus of 58,761 leukaemia-related PubMed abstracts comprising 568,528 sentences. From this corpus we extracted 152,655 CE triplets, each consisting of a cause phrase, an effect phrase and a causal trigger. This is almost twice as many extractions as the existing knowledge base SemMedDB (Kilicoglu et al., 2012) contains. Moreover, the proposed approach outperformed the existing technique SemRep (Rindflesch and Fiszman, 2003) on a dataset of 500 sentences.
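To make the notion of a CE triplet concrete, the following is a minimal sketch, assuming a fixed, hand-picked list of causal triggers (`CAUSAL_TRIGGERS` is hypothetical; in the approach described above the triggers are discovered by an unsupervised technique, which is not reproduced here). It also swaps in a deliberately naive argument rule in place of the authors' linguistic rules: the cause is the text before the earliest trigger match and the effect is the text after it. The names `CETriplet`, `extract_triplet` and `extract_all` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CETriplet:
    """One Cause-Effect triplet: (cause phrase, causal trigger, effect phrase)."""
    cause: str
    trigger: str
    effect: str


# Hypothetical trigger list, hard-coded for illustration only.
CAUSAL_TRIGGERS = ["leads to", "results in", "induces", "causes"]


def extract_triplet(sentence: str,
                    triggers: Iterable[str] = CAUSAL_TRIGGERS) -> Optional[CETriplet]:
    """Naive stand-in rule: split the sentence at the earliest trigger match."""
    best = None
    for trig in triggers:
        # Whole-word, case-insensitive match of the trigger phrase.
        m = re.search(r"\b" + re.escape(trig) + r"\b", sentence, flags=re.IGNORECASE)
        if m and (best is None or m.start() < best.start()):
            best = m
    if best is None:
        return None  # no causal trigger found in this sentence
    cause = sentence[:best.start()].strip(" ,;")
    effect = sentence[best.end():].strip(" ,;.")
    if not cause or not effect:
        return None  # a trigger without both arguments yields no triplet
    return CETriplet(cause=cause, trigger=best.group(0).lower(), effect=effect)


def extract_all(sentences: Iterable[str]) -> List[CETriplet]:
    """Apply the rule to every sentence and keep only successful extractions."""
    return [t for s in sentences if (t := extract_triplet(s)) is not None]
```

For example, `extract_triplet("Imatinib induces apoptosis in leukaemic cells.")` returns `CETriplet(cause='Imatinib', trigger='induces', effect='apoptosis in leukaemic cells')`. Splitting on the trigger alone is imprecise for longer sentences; a high-precision extractor would likely need syntactic information (for instance, a dependency parse) to delimit the cause and effect phrases.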