Natural language processing techniques have helped domain experts solve legal problems. The digital availability of court documents opens up possibilities for researchers, who can use them as a source for building datasets, the disclosure of which is in line with good reproducibility practices in computational research. Large, digitized court systems, such as the Brazilian one, lend themselves to this kind of exploration. However, personal data protection laws restrict data exposure and set out principles that researchers should keep in mind. Special caution is needed in cases involving human rights violations, such as gender discrimination, which we discuss as an example of interest. We present legal and ethical considerations on the issue, as well as guidelines for researchers who handle this kind of data and must decide whether to disclose it.