Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. While we achieve results with GPT-3 that exceed the previous best published results, we also identify several types of clear errors it makes. In investigating why these errors occur, we discover that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, GPT-3 performs poorly at answering straightforward questions about simple synthetic statutes. By posing the same questions with the synthetic statutes rewritten in sentence form, we find that some of GPT-3's poor performance stems from difficulty in parsing the typical structure of statutes, which contain subsections and paragraphs.
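To make the prompting setup concrete, the following is a minimal sketch of a chain-of-thought query to text-davinci-003, assuming the legacy OpenAI Completions API (pre-1.0 Python SDK); the statute, facts, and question shown are invented placeholders for illustration, not cases drawn from SARA.

```python
import openai  # legacy SDK (<1.0) exposing the Completions endpoint

# Hypothetical SARA-style case: a statute, facts, and a yes/no question.
statute = (
    "Section 1. (a) Every person shall pay a tax of 10% of their income.\n"
    "(b) Subsection (a) does not apply to persons with income under $1,000."
)
facts = "Alice earned $500 in 2017."
question = "Does Alice have to pay tax under section 1?"

# Chain-of-thought prompting: ask the model to reason step by step
# before committing to a final answer.
prompt = (
    f"{statute}\n\n"
    f"Facts: {facts}\n"
    f"Question: {question}\n"
    "Let's think step by step."
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=0,  # deterministic decoding, as is typical for evaluation
)
print(response["choices"][0]["text"])
```

A zero-shot variant would drop the "Let's think step by step" cue, and a dynamic few-shot variant would prepend worked examples selected per query; the same prompt skeleton accommodates all three approaches.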