Artificial intelligence is increasingly used across many domains, and the legal system is no exception. However, very few well-annotated datasets of legal documents from the Supreme Court of the United States (SCOTUS) are publicly available. Although Supreme Court rulings are themselves in the public domain, doing meaningful work with them is far more laborious than it should be, because the data must be gathered and processed manually from scratch each time. Our goal is therefore to create a high-quality dataset of SCOTUS cases that can be readily used in natural language processing (NLP) research and other data-driven applications. In addition, recent advances in NLP provide the tools to build predictive models that can reveal patterns influencing court decisions. Trained on previous court cases with advanced NLP algorithms, these models can predict and classify a court's judgment given the facts of the case, in textual form, from the plaintiff and the defendant; in other words, the model emulates a human jury by generating a final verdict.