The growth of available data, combined with its largely unstructured nature, has driven increased interest in natural language processing (NLP) techniques for extracting value from these data assets, since unstructured text is not suitable for direct statistical analysis. This work presents a systematic literature review of state-of-the-art advances in applying transformer-based methods to electronic medical records (EMRs) across different NLP tasks. To the best of our knowledge, this work is unique in providing a comprehensive review of research on transformer-based NLP methods applied to the EMR field. In the initial query, 99 articles were selected from three public databases; these were then filtered down to 65 articles for detailed analysis. The papers were analyzed with respect to the business problem, NLP task, models and techniques, availability of datasets, reproducibility of modeling, language, and exchange format. The review also identifies limitations of current research and offers recommendations for further research.