In this paper, we introduce our approaches using Transformer-based models for the different problems of the COLIEE 2021 automatic legal text processing competition. Automated processing of legal documents is a challenging task because of the characteristics of legal text and the limited amount of available data. Through detailed experiments, we found that Transformer-based pretrained language models can perform well on automated legal text processing problems when combined with appropriate approaches. We describe in detail the processing steps for each task, including problem formulation, data processing and augmentation, pretraining, and finetuning. In addition, we introduce to the community two pretrained models that take advantage of parallel translations in the legal domain, NFSP and NMSP; of these, NFSP achieves the state-of-the-art result in Task 5 of the competition. Although the paper focuses on technical reporting, the novelty of its approaches can also serve as a useful reference for automated legal document processing using Transformer-based models.