Large language models capable of zero-shot and few-shot prompting have given rise to the new research area of prompt engineering. Recent advances have shown that, for example, Chain-of-Thought (CoT) prompts can significantly improve performance on arithmetic and common-sense tasks. We explore how such approaches fare on legal reasoning tasks, using the COLIEE entailment task, which is based on the Japanese bar exam, to test zero-shot/few-shot and fine-tuning approaches. Our findings show that while CoT prompting and fine-tuning with explanations yield improvements, the best results are produced by prompts derived from specific legal reasoning techniques, such as IRAC (Issue, Rule, Application, Conclusion). Based on our experiments, we improve on the 2021 best result, raising accuracy from 0.7037 to 0.8148, and beat the 2022 best system's accuracy of 0.6789 with an accuracy of 0.7431.
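To make the IRAC-derived prompting concrete, the following is a minimal sketch of what such a prompt template might look like for the COLIEE entailment task. The template wording and the build_irac_prompt helper are illustrative assumptions, not the exact prompts used in our experiments.

```python
# Minimal sketch of an IRAC-structured prompt for the COLIEE entailment task.
# The template wording and the build_irac_prompt helper are illustrative
# assumptions, not the exact prompts or API used in the paper.

IRAC_TEMPLATE = """You are answering a Japanese bar exam question.

Statutes:
{articles}

Statement to verify:
{statement}

Reason step by step using IRAC:
Issue: identify the legal question raised by the statement.
Rule: state the relevant rule from the statutes above.
Application: apply the rule to the facts in the statement.
Conclusion: answer Y if the statutes entail the statement, otherwise N.
"""

def build_irac_prompt(articles: str, statement: str) -> str:
    """Fill the IRAC template with the retrieved civil-code articles
    and the exam statement to be checked for entailment."""
    return IRAC_TEMPLATE.format(articles=articles, statement=statement)

if __name__ == "__main__":
    prompt = build_irac_prompt(
        articles="Article 5 (1) A minor must obtain the consent of ...",
        statement="A minor may rescind a contract made without consent.",
    )
    print(prompt)  # send to an LLM completion endpoint of your choice
```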