In the expansion of biomedical dataset, the same category may be labeled with different terms, thus being tedious and onerous to curate these terms. Therefore, automatically mapping synonymous terms onto the ontologies is desirable, which we name as biomedical synonym prediction task. Unlike biomedical concept normalization (BCN), no clues from context can be used to enhance synonym prediction, making it essential to extract graph features from ontology. We introduce an expert-curated dataset OBO-syn encompassing 70 different types of concepts and 2 million curated concept-term pairs for evaluating synonym prediction methods. We find BCN methods perform weakly on this task for not making full use of graph information. Therefore, we propose GraphPrompt, a prompt-based learning approach that creates prompt templates according to the graphs. GraphPrompt obtained 37.2\% and 28.5\% improvement on zero-shot and few-shot settings respectively, indicating the effectiveness of these graph-based prompt templates. We envision that our method GraphPrompt and OBO-syn dataset can be broadly applied to graph-based NLP tasks, and serve as the basis for analyzing diverse and accumulating biomedical data. All the data and codes are avalible at: https://github.com/HanwenXuTHU/GraphPrompt
翻译:暂无翻译