Given the facts of a case, Legal Judgment Prediction (LJP) involves a series of sub-tasks such as predicting the violated law articles, charges, and term of penalty. We propose leveraging a unified text-to-text Transformer for LJP, where the dependencies among sub-tasks can be naturally established within the auto-regressive decoder. Compared with previous works, this approach has three advantages: (1) it fits the pretraining pattern of masked language models and can thereby benefit from the semantic prompts of each sub-task rather than treating them as atomic labels; (2) it uses a single unified architecture, enabling full parameter sharing across all sub-tasks; and (3) it can incorporate both classification and generative sub-tasks. We show that this unified transformer, albeit pretrained on general-domain text, outperforms pretrained models tailored specifically to the legal domain. Through an extensive set of experiments, we find that the best order in which to capture dependencies differs from human intuition, and that the order most logical to humans can be sub-optimal for the model. We further include two auxiliary tasks, court view generation and article content prediction, and show that they not only improve prediction accuracy but also provide interpretable explanations for model outputs, even when an error is made. With the best configuration, our model outperforms both the previous SOTA and a single-task version of the unified transformer by a large margin.
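To make the core idea concrete, below is a minimal sketch of how the LJP sub-tasks could be serialized into one text-to-text target so that the auto-regressive decoder conditions later predictions on earlier ones. The model checkpoint, prompt wording, and field order are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch (not the authors' code): LJP as a single text-to-text task
# with a T5-style encoder-decoder. The sub-task order in the target string
# determines which predictions later sub-tasks can condition on.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical case fact and target; article number, charge, and penalty
# are concatenated into one sequence with semantic prompts for each sub-task.
fact = "The defendant broke into the victim's home at night and stole jewelry."
target = "law article: 264 ; charge: theft ; penalty: 18 months"

inputs = tokenizer("predict judgment: " + fact, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Training: a single cross-entropy loss over the concatenated sub-task outputs,
# so all sub-tasks share the full set of model parameters.
loss = model(**inputs, labels=labels).loss

# Inference: the decoder generates all sub-task predictions in one pass;
# e.g. the charge is generated conditioned on the already-decoded article.
generated = model.generate(**inputs, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```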