Recent years have seen particular interest in using electronic medical records (EMRs) for secondary purposes to enhance the quality and safety of healthcare delivery. EMRs tend to contain large amounts of valuable clinical notes. Embedding learning is a way of converting these notes into a representation that makes them comparable. Transformer-based representation models have recently made a great leap forward. These models are pre-trained on large online datasets to understand natural language text effectively. The quality of a learned embedding depends on how clinical notes are presented as input to the representation models. A clinical note has several sections that carry different amounts of useful information, and healthcare providers often use different expressions for the same concept. Existing methods feed clinical notes to representation models either directly or after initial preprocessing. In contrast, to learn a good embedding, we first identified the most essential sections of clinical notes. We then mapped the concepts extracted from the selected sections to their standard names in the Unified Medical Language System (UMLS) and used the standard phrases corresponding to the unique concepts as input to clinical models. We performed experiments to measure the usefulness of the learned embedding vectors for hospital mortality prediction on a subset of the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset. The experiments showed that clinical transformer-based representation models performed better when their input was built from the standard names of the extracted unique concepts than with other input formats. The best-performing models were BioBERT, PubMedBERT, and UmlsBERT, in that order.
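The input-construction idea above (keep a selected section, map its concept mentions to standard UMLS names, and pass only the unique names to the model) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the section heading `History of Present Illness`, the regex-based section splitter, the greedy phrase matcher, and `TOY_LEXICON` with its placeholder concept IDs are all hypothetical stand-ins for a real UMLS concept-extraction tool.

```python
import re

# Hypothetical toy lexicon standing in for a real UMLS-based concept linker.
# Keys are lower-cased surface phrases; values are (concept ID, standard name).
# The IDs are placeholders, not real UMLS CUIs.
TOY_LEXICON = {
    "htn": ("CUI_HTN", "Hypertensive disease"),
    "high blood pressure": ("CUI_HTN", "Hypertensive disease"),
    "heart attack": ("CUI_MI", "Myocardial infarction"),
    "mi": ("CUI_MI", "Myocardial infarction"),
    "chest pain": ("CUI_CP", "Chest pain"),
}

# A line such as "History of Present Illness:" (optionally followed by text)
# is treated as a section heading. Simplification: any "Word words:" line
# counts, so a line like "BP: 120/80" would also end the current section.
HEADING_RE = re.compile(r"^\s*([A-Za-z][A-Za-z /]*):(.*)$")


def extract_section(note: str, heading: str) -> str:
    """Return the text of one section, up to the next heading line."""
    collected = []
    inside = False
    for line in note.splitlines():
        match = HEADING_RE.match(line)
        if match:
            inside = match.group(1).strip().lower() == heading.lower()
            if inside and match.group(2).strip():
                collected.append(match.group(2).strip())
            continue
        if inside and line.strip():
            collected.append(line.strip())
    return " ".join(collected)


def map_to_standard_names(text: str, lexicon: dict, max_len: int = 3) -> list:
    """Greedy longest-match phrase lookup; keep each concept ID once, in order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    seen, names = set(), []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                concept_id, name = lexicon[phrase]
                if concept_id not in seen:
                    seen.add(concept_id)
                    names.append(name)
                i += n
                break
        else:  # no phrase starting at token i matched
            i += 1
    return names


def build_model_input(note: str, heading: str, lexicon: dict) -> str:
    """Section text -> unique standard names -> one string for the encoder."""
    return ", ".join(map_to_standard_names(extract_section(note, heading), lexicon))


note = """Admission Date: 2100-01-01
History of Present Illness:
Pt with HTN and prior heart attack, now
presenting with chest pain. Hx of MI.
Past Medical History:
Asthma
"""
print(build_model_input(note, "History of Present Illness", TOY_LEXICON))
# prints: Hypertensive disease, Myocardial infarction, Chest pain
```

Deduplicating by concept ID rather than by surface string is what collapses "heart attack" and "MI" into a single standard name, which reflects the abstract's point that providers use different expressions for the same concept. The resulting string would then be passed to a clinical encoder such as BioBERT to obtain the embedding.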