Seeking legal advice is often expensive. Recent advances in machine learning for solving complex problems can be leveraged to make legal services more accessible to the public. However, real-life applications encounter significant challenges. State-of-the-art language models keep growing in size, making parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of data, which are common in the legal domain, where data labelling costs are high. To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast unsupervised legal data from public legal forums to perform legal pre-training. This method matches or exceeds the few-shot performance of existing models such as LEGAL-BERT on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods of tuning language models in the legal domain.
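To give a sense of scale for the "approximately 0.1%" figure, the following minimal sketch computes the share of parameters that would be updated if a large encoder were kept frozen and only a small set of added parameters were trained. All sizes here are hypothetical and chosen purely for illustration (a frozen encoder of about 110M parameters and 100 trainable prompt vectors of width 768); they are not taken from the paper, and the abstract does not specify this particular setup.

```python
def trainable_fraction(frozen_params: int, trainable_params: int) -> float:
    """Return the share of all parameters that receive gradient updates."""
    total = frozen_params + trainable_params
    return trainable_params / total


# Hypothetical sizes for illustration only (assumptions, not from the paper).
FROZEN = 110_000_000   # parameters of the frozen backbone
PROMPT_TOKENS = 100    # number of trainable prompt vectors
HIDDEN_SIZE = 768      # width of each prompt vector

trainable = PROMPT_TOKENS * HIDDEN_SIZE
fraction = trainable_fraction(FROZEN, trainable)

# Prints the tuned share as a percentage; under these assumed sizes
# it falls just below 0.1%.
print(f"{fraction:.4%} of parameters are tuned")
```

Under these assumed sizes the tuned share is roughly 0.07%, the same order of magnitude as the figure reported above.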