With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications that aid knowledge workers processing unwieldy document collections. One such setting is the Civil Rights Litigation Clearinghouse (CRLC) (https://clearinghouse.net), which posts information about large-scale civil rights lawsuits, serving lawyers, scholars, and the general public. Today, summarization in the CRLC requires extensive training of lawyers and law students, who spend hours per case understanding multiple relevant documents in order to produce high-quality summaries of key events and outcomes. Motivated by this ongoing real-world summarization effort, we introduce Multi-LexSum, a collection of 9,280 expert-authored summaries drawn from ongoing CRLC writing. Multi-LexSum presents a challenging multi-document summarization task because of the length of the source documents, which often exceed two hundred pages per case. Furthermore, Multi-LexSum is distinct from other datasets in providing multiple target summaries, each at a different granularity, ranging from one-sentence "extreme" summaries to multi-paragraph narrations of over five hundred words. We present extensive analysis demonstrating that, although the summaries in the training data are of high quality and adhere to strict content and style guidelines, state-of-the-art summarization models perform poorly on this task. We release Multi-LexSum at https://multilexsum.github.io to support further research in summarization methods and to facilitate development of applications that assist in the CRLC's mission.