In this paper, we explore the problem of automatic statute prediction, where a subset of relevant statutes is to be predicted for a given case description. Here, the term "statute" refers to a section, a sub-section, or an article of a specific Act. Addressing this problem would be useful in several applications, such as an AI assistant for lawyers and legal question answering systems. For better user acceptance of such Legal AI systems, we believe the predictions should also be accompanied by human-understandable explanations. We propose two techniques for statute prediction with explanations: (i) AoS (Attention-over-Sentences), which uses attention over the sentences in a case description to predict the statutes relevant to it, and (ii) LLMPrompt, which prompts an LLM to predict as well as explain the relevance of a given statute. AoS uses smaller language models, specifically sentence transformers, and is trained in a supervised manner, whereas LLMPrompt uses larger language models in a zero-shot manner and explores both standard and Chain-of-Thought (CoT) prompting. Both models produce explanations for their predictions in human-understandable form. We compare the statute prediction performance of the two proposed techniques with each other as well as with a set of competitive baselines, across two popular datasets. We also evaluate the quality of the generated explanations, both automatically in a counter-factual manner and through human evaluation.
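To make the AoS idea concrete, the following is a minimal sketch (not the paper's actual architecture or training setup) of how attention over the sentences of a case description could yield both a statute relevance score and a sentence-level explanation. The encoder choice, the cosine-similarity attention, and the helper function are illustrative assumptions only.

```python
# Minimal sketch: attention over case-description sentences for one statute.
# Assumes the sentence-transformers package is installed; the model name and
# the scoring scheme below are hypothetical, not the paper's implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def statute_relevance(case_sentences, statute_text):
    """Return a relevance score and per-sentence attention weights."""
    sent_emb = encoder.encode(case_sentences)        # (num_sentences, dim)
    statute_emb = encoder.encode([statute_text])[0]  # (dim,)
    # Unnormalised attention logits: cosine similarity of each sentence to the statute.
    sims = sent_emb @ statute_emb / (
        np.linalg.norm(sent_emb, axis=1) * np.linalg.norm(statute_emb) + 1e-8
    )
    attn = np.exp(sims) / np.exp(sims).sum()         # softmax over sentences
    score = float((attn * sims).sum())               # attention-weighted relevance
    return score, attn                               # attn highlights explanatory sentences

case = [
    "The accused entered the house at night.",
    "He was found carrying a stolen necklace.",
    "The complaint was filed the next morning.",
]
score, attn = statute_relevance(case, "Theft: dishonestly taking movable property.")
```

Here the attention weights over sentences play the role of the human-understandable explanation: the sentences receiving the highest weight are the ones offered as justification for predicting the statute.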