We convert the Chinese medical text attribute extraction task into a sequence tagging or machine reading comprehension task. On top of BERT pre-trained models, we tried not only the widely used LSTM-CRF sequence tagging model but also other sequence models, such as CNN, UCNN, WaveNet, and Self-Attention, which reach performance similar to that of LSTM-CRF. This offers a new perspective on traditional sequence tagging models. Since different sequence tagging models emphasize different aspects of the input, ensembling them adds diversity to the final system. As a result, our system achieves good performance on the Chinese medical text attribute extraction task (subtask 2 of CCKS 2019 task 1).
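The ensembling step mentioned above can be sketched as a per-token majority vote over the tag sequences produced by the different models. This is a minimal illustration, not the paper's actual ensembling procedure (which is not specified here); the BIO tag names and the tie-breaking rule are assumptions for the example.

```python
from collections import Counter

def ensemble_tags(predictions):
    """Majority-vote ensemble over per-token tag sequences.

    predictions: list of tag sequences (one per model), all the same length.
    Returns the tag chosen by the most models at each token position.
    Ties are broken in favor of the first model's tag -- a hypothetical
    rule chosen only to make the sketch deterministic.
    """
    ensembled = []
    for position_tags in zip(*predictions):
        counts = Counter(position_tags)
        top_tag, top_count = counts.most_common(1)[0]
        # On a tie, fall back to the first model's prediction.
        if counts[position_tags[0]] == top_count:
            top_tag = position_tags[0]
        ensembled.append(top_tag)
    return ensembled

# Hypothetical outputs of three taggers on the same three-token input,
# in a BIO scheme with a made-up attribute label "Attr".
model_outputs = [
    ["B-Attr", "I-Attr", "O"],
    ["B-Attr", "O",      "O"],
    ["O",      "I-Attr", "O"],
]
print(ensemble_tags(model_outputs))  # ['B-Attr', 'I-Attr', 'O']
```

Because the individual models emphasize different aspects of the input, their errors tend to be uncorrelated, which is what makes a simple vote like this effective.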