This paper develops predictive models that classify Brazilian legal proceedings into one of three statuses: (i) archived, (ii) active, and (iii) suspended. Solving this problem is intended to help public and private institutions manage large portfolios of legal proceedings, yielding gains in scale and efficiency. In this paper, a legal proceeding is a sequence of short texts called "motions." We combined several natural language processing (NLP) and machine learning techniques to solve the problem. Although NLP for Portuguese can be challenging due to a lack of resources, our approaches performed well in the classification task, achieving a maximum accuracy of .93 and top average F1 scores of .89 (macro) and .93 (weighted). Furthermore, we were able to extract and interpret the patterns learned by one of our models and to quantify how those patterns relate to the classification task. This interpretability step is important in legal applications of machine learning and offers insight into how black-box models make decisions.
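The gap between the macro and weighted F1 scores reported above comes from how per-class scores are averaged: macro averaging weights the three classes equally, while weighted averaging weights each class by its number of true instances. The following is a minimal sketch of that arithmetic in plain Python. The toy labels and the helper names `f1_per_class` and `summarize` are illustrative assumptions, not the paper's data or code.

```python
from collections import Counter

# The three status classes named in the abstract.
LABELS = ["archived", "active", "suspended"]


def f1_per_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(y_true, y_pred, labels=LABELS):
    """Return accuracy, macro F1, and support-weighted F1."""
    support = Counter(y_true)  # number of true instances per class
    f1 = {c: f1_per_class(y_true, y_pred, c) for c in labels}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(f1.values()) / len(labels)  # every class counts equally
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)
    return {"accuracy": accuracy, "macro_f1": macro, "weighted_f1": weighted}


# Hypothetical, imbalanced toy data (not from the paper).
y_true = ["archived"] * 6 + ["active"] * 3 + ["suspended"] * 1
y_pred = ["archived"] * 6 + ["active"] * 2 + ["archived"] + ["active"]

scores = summarize(y_true, y_pred)
print(scores)
```

On imbalanced data such as this toy example, a rare class that is predicted poorly pulls the macro average down much more than the weighted average, which is one plausible reading of a macro score sitting below the weighted one.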