In a legal system, judgment consistency is regarded as one of the most important manifestations of fairness. However, because the factual elements that affect sentencing in real-world scenarios are complex, little work has quantitatively measured judgment consistency on real-world data. In this paper, we propose an evaluation metric for judgment inconsistency, the Legal Inconsistency Coefficient (LInCo), which evaluates inconsistency between groups of data divided by a specific feature (e.g., gender, region, race). We simulate judges from different groups with legal judgment prediction (LJP) models, and measure judicial inconsistency by the disagreement among the judgment results of LJP models trained on different groups. Experimental results on synthetic data verify the effectiveness of LInCo. We further employ LInCo to explore inconsistency in real cases and make the following observations: (1) both regional and gender inconsistency exist in the legal system, but gender inconsistency is much lower than regional inconsistency; (2) the level of regional inconsistency varies little across time periods; (3) in general, judicial inconsistency is negatively correlated with the severity of the criminal charges. In addition, we use LInCo to evaluate several debiasing methods, such as adversarial learning, and find that these mechanisms can effectively help LJP models mitigate the effect of data bias.
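The idea of simulating judges with group-specific models and scoring their disagreement can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the LInCo formula, so `disagreement` is a plain mean absolute gap between two models' predictions rather than the paper's coefficient, a least-squares regressor stands in for an LJP model, and `make_group` generates synthetic cases with a hypothetical group-specific shift in sentence length.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with a bias term; a stand-in for an LJP model."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    """Predicted sentence length for each case in X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def disagreement(w_a, w_b, X_eval):
    """Mean absolute gap between two group models on the same cases.
    Illustrative only; this is NOT the paper's LInCo definition."""
    return float(np.mean(np.abs(predict(w_a, X_eval) - predict(w_b, X_eval))))

def make_group(rng, n, shift):
    """Synthetic cases: a shared linear sentencing rule plus a
    group-specific shift (hypothetical) and small noise."""
    X = rng.normal(size=(n, 3))
    y = X @ np.array([2.0, 1.0, 0.5]) + 12.0 + shift
    y = y + rng.normal(scale=0.1, size=n)
    return X, y

rng = np.random.default_rng(0)
X_a, y_a = make_group(rng, 500, shift=0.0)  # group A
X_b, y_b = make_group(rng, 500, shift=0.0)  # group B, same rule as A
X_c, y_c = make_group(rng, 500, shift=3.0)  # group C, harsher by 3 units
X_eval = rng.normal(size=(200, 3))          # shared evaluation cases

w_a = fit_linear(X_a, y_a)
w_b = fit_linear(X_b, y_b)
w_c = fit_linear(X_c, y_c)

score_same = disagreement(w_a, w_b, X_eval)
score_shifted = disagreement(w_a, w_c, X_eval)
print(f"A vs B: {score_same:.3f}  A vs C: {score_shifted:.3f}")
```

In this toy setup the score between two groups that follow the same rule stays near zero, while the score between a group and a harsher one is close to the injected shift. Scoring all models on the same evaluation cases is the key design choice: it separates differences in how groups judge from differences in which cases each group happened to see.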