Clinical notes contain valuable patient information and are written by different health care providers with varying levels of scientific expertise and different writing styles. When dealing with extensive electronic medical records, it can help clinicians and researchers to know which information is essential. Recognizing entities and mapping them to standard terminologies is crucial for reducing ambiguity when processing clinical notes. Although named entity recognition and entity linking are critical steps in clinical natural language processing, they can also produce repetitive and low-value concepts. On the other hand, not all parts of a clinical text carry the same importance or content for predicting the patient's condition. It is therefore necessary both to identify the section in which each piece of content is recorded and to identify key concepts in order to extract meaning from clinical texts. In this study, these challenges are addressed using clinical natural language processing techniques. In addition, to identify key concepts, a set of popular unsupervised key phrase extraction methods has been tested and evaluated. Because most clinical concepts are multi-word expressions whose accurate identification requires the user to specify an n-gram range, we propose a shortcut method, based on TF-IDF, that preserves the structure of these expressions. To evaluate the pre-processing method and the concept selection, we designed two types of downstream tasks (multi-class and binary classification) using transformer-based models. The results show the superiority of the proposed method in combination with the SciBERT model and offer insight into the efficacy of general key phrase extraction methods for clinical notes.
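The abstract does not spell out how the TF-IDF shortcut preserves multi-word expressions. The sketch below illustrates one plausible reading under stated assumptions; it is not the authors' implementation. It assumes the multi-word concepts have already been recognized (for example by the entity recognition and linking step) and merges each occurrence into a single underscore-joined token before TF-IDF scoring. Whole expressions are then ranked without choosing an n-gram range. The function names, the stopword list, and the sample notes are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "has", "is", "no", "of", "the", "with"}


def merge_concepts(text, concepts):
    """Lowercase `text` and join every known multi-word concept into one
    underscore-separated token, longest concepts first, so that
    'chronic kidney disease' stays whole instead of yielding 'kidney disease'."""
    text = text.lower()
    for concept in sorted(concepts, key=len, reverse=True):
        phrase = concept.lower()
        merged = phrase.replace(" ", "_")
        # \b restricts matches to whole words; '_' is a word character, so an
        # already merged token cannot be split again by a shorter concept.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, lambda _m, m=merged: m, text)
    return text


def tokenize(text):
    """Keep runs of lowercase letters, digits, and underscores; drop stopwords."""
    return [tok for tok in re.findall(r"[a-z0-9_]+", text) if tok not in STOPWORDS]


def tfidf_keyphrases(docs, concepts, top_k=3):
    """Return the top_k TF-IDF-ranked terms per document, with merged
    concepts restored to their multi-word form."""
    tokenized = [tokenize(merge_concepts(doc, concepts)) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Relative term frequency times smoothed IDF (the IDF form scikit-learn
        # uses with smooth_idf=True).
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Descending score; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append([term.replace("_", " ") for term in ranked[:top_k]])
    return results


if __name__ == "__main__":
    notes = [
        "Patient has chronic kidney disease and type 2 diabetes. "
        "Chronic kidney disease is stable.",
        "Patient reports chest pain. No history of type 2 diabetes.",
    ]
    concepts = ["chronic kidney disease", "type 2 diabetes", "chest pain", "kidney disease"]
    print(tfidf_keyphrases(notes, concepts))
    # [['chronic kidney disease', 'stable', 'patient'], ['chest pain', 'history', 'reports']]
```

Merging longest concepts first is the design choice that keeps nested expressions intact: "kidney disease" never appears as a separate candidate in the first note because it has already been absorbed into "chronic kidney disease".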