Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still play a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs, such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables the development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
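To make the idea of a rule-based clinical pipeline with context analysis concrete, the following is a minimal sketch in plain Python. It does not use medspaCy or spaCy and does not reproduce their APIs: the names `Entity`, `Doc`, `target_matcher`, `negation_context`, and `run_pipeline`, as well as the two small rule tables, are hypothetical and chosen only for illustration. The sketch assumes that sentences end with a period and that a negation trigger appearing before a concept in the same sentence negates it, which is a strong simplification of context algorithms used in practice.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Entity:
    text: str
    label: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    is_negated: bool = False


@dataclass
class Doc:
    text: str
    ents: List[Entity] = field(default_factory=list)


# Hypothetical rule tables, for illustration only.
TARGET_RULES = {"pneumonia": "PROBLEM", "fever": "PROBLEM"}
NEGATION_TRIGGERS = ["no evidence of", "denies", "no"]


def target_matcher(doc: Doc) -> Doc:
    """Rule-based concept extraction: find each target term as a whole word."""
    for term, label in TARGET_RULES.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, doc.text, re.IGNORECASE):
            doc.ents.append(Entity(m.group(0), label, m.start(), m.end()))
    doc.ents.sort(key=lambda ent: ent.start)
    return doc


def negation_context(doc: Doc) -> Doc:
    """Mark an entity as negated if a trigger precedes it in the same sentence."""
    for ent in doc.ents:
        # Assumption: a period marks the end of the previous sentence.
        sentence_start = doc.text.rfind(".", 0, ent.start) + 1
        preceding = doc.text[sentence_start:ent.start].lower()
        ent.is_negated = any(
            re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
            for trigger in NEGATION_TRIGGERS
        )
    return doc


def run_pipeline(text: str, components: List[Callable[[Doc], Doc]]) -> Doc:
    """Apply each component in order, passing the same document along."""
    doc = Doc(text)
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline(
    "Patient reports fever. No evidence of pneumonia.",
    [target_matcher, negation_context],
)
results = [(ent.text, ent.label, ent.is_negated) for ent in doc.ents]
print(results)
# [('fever', 'PROBLEM', False), ('pneumonia', 'PROBLEM', True)]
```

Each component takes a document and returns it with added annotations, so components can be reordered, replaced, or combined with a statistical model without changing the others. This mirrors, in a much reduced form, the modular pipeline design described above.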