Large language models capable of zero-shot or few-shot prompting have given rise to the new research area of prompt engineering. Recent advances have shown that, for example, Chain-of-Thought (CoT) prompting can significantly improve performance on arithmetic and commonsense reasoning tasks. We explore how such approaches fare on legal reasoning tasks and take the COLIEE entailment task, based on the Japanese bar exam, as a testbed for zero-shot, few-shot, and fine-tuning approaches. Our findings show that while CoT prompting and fine-tuning with explanations yield improvements, the best results are produced by prompts derived from specific legal reasoning techniques such as IRAC (Issue, Rule, Application, Conclusion). Based on our experiments, we improve the 2021 best result from 0.7037 to 0.8148 accuracy and beat the 2022 best system of 0.6789 accuracy with an accuracy of 0.7431.