We present a Three-level Hierarchical Transformer Network (3-level-HTN) for modeling long-term dependencies across clinical notes for patient-level prediction. The network uses three levels of Transformer-based encoders to learn progressively from words to sentences, from sentences to notes, and finally from notes to patients. The first level, from words to sentences, directly applies a pre-trained BERT model as a fully trainable component. The second and third levels each implement a stack of Transformer-based encoders, and the final patient representation is fed into a classification layer for clinical predictions. Compared to conventional BERT models, our model increases the maximum input length from 512 tokens to much longer sequences that are appropriate for modeling large numbers of clinical notes. We empirically examine different hyper-parameters to identify an optimal trade-off given computational resource limits. Our experimental results on the MIMIC-III dataset for different prediction tasks demonstrate that the proposed Hierarchical Transformer Network outperforms previous state-of-the-art models, including but not limited to BigBird.
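To make the three-level hierarchy described above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that a patient's record is arranged as a fixed grid of notes, sentences, and tokens, and that each level passes one vector per unit up to the next level. The names `pack_patient` and `attention_pairs` and the limits `max_notes`, `max_sents`, and `max_tokens` are hypothetical and chosen only for illustration. The second function compares the number of pairwise self-attention interactions per layer for a flat sequence against the hierarchical arrangement, which suggests why inputs far longer than 512 tokens may become tractable.

```python
from typing import List, Tuple

PAD = "[PAD]"  # hypothetical padding token used for illustration


def pack_patient(
    notes: List[List[List[str]]],
    max_notes: int,
    max_sents: int,
    max_tokens: int,
) -> List[List[List[str]]]:
    """Truncate and pad a patient's notes to a fixed [notes][sentences][tokens] grid."""
    grid = []
    for note in notes[:max_notes]:
        # Truncate each sentence to max_tokens, then pad it to exactly max_tokens.
        sents = [
            (sent[:max_tokens] + [PAD] * max_tokens)[:max_tokens]
            for sent in note[:max_sents]
        ]
        # Pad the note with empty sentences up to max_sents.
        sents += [[PAD] * max_tokens for _ in range(max_sents - len(sents))]
        grid.append(sents)
    # Pad the patient with empty notes up to max_notes.
    while len(grid) < max_notes:
        grid.append([[PAD] * max_tokens for _ in range(max_sents)])
    return grid


def attention_pairs(max_notes: int, max_sents: int, max_tokens: int) -> Tuple[int, int, int]:
    """Return (total tokens, flat attention pairs, hierarchical attention pairs) per layer.

    Assumes full self-attention within each unit and one vector passed upward per unit.
    """
    total_tokens = max_notes * max_sents * max_tokens
    flat = total_tokens ** 2
    hierarchical = (
        max_notes * max_sents * max_tokens ** 2  # level 1: words within each sentence
        + max_notes * max_sents ** 2             # level 2: sentences within each note
        + max_notes ** 2                         # level 3: notes within the patient
    )
    return total_tokens, flat, hierarchical


if __name__ == "__main__":
    # Hypothetical limits; the abstract does not state the values actually used.
    total, flat, hier = attention_pairs(max_notes=32, max_sents=32, max_tokens=64)
    print(f"tokens per patient: {total}")
    print(f"flat attention pairs: {flat}")
    print(f"hierarchical attention pairs: {hier}")
```

Under these assumptions the input capacity grows as the product of the three limits, while the attention cost grows only as the sum of the per-level terms. Choosing the three limits is one form of the hyper-parameter trade-off under resource constraints that the abstract mentions.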