Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Legal practitioners often face a vast number of documents. Lawyers, for instance, search for precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, the retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which can lead to information overload. This also makes statistical analysis of the documents challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results on four IE tasks over Korean precedents show that our IE system can achieve competitive scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task, and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis of two case categories (drunk driving and fraud) with 35k precedents reveals that the structured information produced by our IE system faithfully reflects the macroscopic features of the Korean legal system.
