In the Indian court system, the backlog of pending cases is a long-standing problem: more than 4 crore (40 million) cases are outstanding. Manually summarising hundreds of documents is time-consuming and tedious for legal stakeholders. As machine learning has progressed, many state-of-the-art text summarisation models have emerged. However, domain-independent models perform poorly on legal texts, and fine-tuning them for the Indian legal system is difficult because publicly available datasets are lacking. To improve the performance of domain-independent models, the authors propose a methodology for normalising legal texts in the Indian context. They experiment with two state-of-the-art domain-independent models, BART and PEGASUS, for legal text summarisation. Both models are evaluated on extractive and abstractive summarisation to assess the effectiveness of the text normalisation approach. The summarised texts are evaluated by domain experts on multiple parameters and with ROUGE metrics. The results show that the proposed text normalisation approach is effective for summarising legal texts with domain-independent models.