Natural Language Processing in the legal domain has benefited hugely from the emergence of Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text. There exist PLMs trained over European and US legal text, most notably LegalBERT. However, with the rapidly increasing volume of NLP applications on Indian legal documents, and the distinguishing characteristics of Indian legal text, it has become necessary to pre-train LMs over Indian legal text as well. In this work, we introduce transformer-based PLMs pre-trained over a large corpus of Indian legal documents. We also apply these PLMs to several benchmark legal NLP tasks on Indian legal documents, namely, Legal Statute Identification from facts, semantic segmentation of court judgements, and Court Judgement Prediction. Our experiments demonstrate the utility of the India-specific PLMs developed in this work.