Seeking legal advice is often expensive. Recent advances in machine learning for solving complex problems can be leveraged to make legal services more accessible to the public. However, real-life applications face significant challenges. State-of-the-art language models keep growing larger, which makes parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of labeled data, a common situation in the legal domain, where data labelling costs are high. To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast amounts of unlabeled legal data from public legal forums to perform legal pre-training. This method matches or exceeds the few-shot performance of existing models such as LEGAL-BERT on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to that of existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods for tuning language models toward the legal domain.