Natural language inference has trended toward studying contexts beyond the sentence level. An important application area is law: how past cases apply to new situations is often not stated outright and must be inferred. This paper introduces LawngNLI, constructed from U.S. legal opinions, with automatic labels whose accuracy is high under human validation. Premises are long and multigranular. Experiments show two use cases. First, LawngNLI can serve as a benchmark for in-domain generalization from short to long contexts. It has remained unclear whether large-scale long-premise NLI datasets actually need to be constructed: near-top performance on long premises might be achievable by fine-tuning on short premises alone. Without multigranularity, benchmarks cannot distinguish a lack of fine-tuning on long premises from domain shift between short and long datasets. In contrast, our long and short premises share the same examples and domain. Models fine-tuned on several past NLI datasets and/or our short premises fall short of top performance on our long premises. So for at least certain domains (such as ours), large-scale long-premise datasets are needed. Second, LawngNLI can serve as a benchmark for implication-based retrieval. Queries are entailed or contradicted by target documents, allowing users to move between arguments and evidence. Leading retrieval models perform reasonably well zero-shot on a retrieval task derived from LawngNLI. We compare systems for re-ranking, including lexical overlap and cross-encoders fine-tuned on a modified LawngNLI or on past NLI datasets. LawngNLI can train and test systems for implication-based case retrieval and argumentation.
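One of the re-ranking baselines compared above is lexical overlap. A minimal sketch of what such a re-ranker could look like is below; the tokenization and scoring here are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative lexical-overlap re-ranker: score each candidate document by
# the fraction of (unique) query tokens it contains, then sort descending.
# This is a hypothetical sketch, not the system evaluated in the paper.

def tokenize(text: str) -> set[str]:
    # Naive whitespace tokenization; a real system would normalize further.
    return set(text.lower().split())

def overlap_score(query: str, doc: str) -> float:
    q, d = tokenize(query), tokenize(doc)
    if not q:
        return 0.0
    return len(q & d) / len(q)

def rerank(query: str, candidates: list[str]) -> list[str]:
    # Reorder retrieved candidates by descending lexical overlap with the query.
    return sorted(candidates, key=lambda doc: overlap_score(query, doc), reverse=True)
```

For example, given the query "the court held the contract void", a candidate containing most of those tokens would be ranked above an unrelated document. Cross-encoder re-rankers replace `overlap_score` with a learned relevance model over the (query, document) pair.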