简化您的法律: 使用信息理论来复制法律文件 (Simplify Your Law: Using Information Theory to Deduplicate Legal Documents) - 专知论文

会员服务 ·

0

INFORMS · 信息理论 · Extensibility · 可辨认的 · 极小点 ·

2021 年 10 月 2 日

Simplify Your Law: Using Information Theory to Deduplicate Legal Documents

翻译：简化您的法律: 使用信息理论来复制法律文件

Corinna Coupette,Jyotsna Singh,Holger Spamann

from arxiv, 8 pages, 3 figures; to appear in ICDMW 2021

Textual redundancy is one of the main challenges to ensuring that legal texts remain comprehensible and maintainable. Drawing inspiration from the refactoring literature in software engineering, which has developed methods to expose and eliminate duplicated code, we introduce the duplicated phrase detection problem for legal texts and propose the Dupex algorithm to solve it. Leveraging the Minimum Description Length principle from information theory, Dupex identifies a set of duplicated phrases, called patterns, that together best compress a given input text. Through an extensive set of experiments on the Titles of the United States Code, we confirm that our algorithm works well in practice: Dupex will help you simplify your law.

翻译：文字冗余是确保法律文本保持可理解和可维持的主要挑战之一。从软件工程中重新构思的文献中得到的启发,这些文献已经开发出揭露和消除重复代码的方法,我们为法律文本引入了重复的短语探测问题,并提出了Dupex算法来解决它。Dupex从信息理论中利用最低描述长度原则,从信息理论中找出了一套重复的短语,称为模式,它们一起将一个输入文本压缩得最佳。我们通过对《美国法典》标题进行一系列广泛的实验,确认我们的算法在实践中运作良好:Dupex将帮助您简化法律。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

专知会员服务

12+阅读 · 2020年2月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

已删除

将门创投

3+阅读 · 2019年6月12日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Strategies for the Iterated Prisoner's Dilemma

Strategies for the Iterated Prisoner's Dilemma

Arxiv

0+阅读 · 2021年11月29日

Modular Information Flow Through Ownership

Modular Information Flow Through Ownership

Arxiv

0+阅读 · 2021年11月26日

An Optimal Algorithm for Finding Champions in Tournament Graphs

Arxiv

0+阅读 · 2021年11月26日

OTB-morph: One-Time Biometrics via Morphing applied to Face Templates

Arxiv

0+阅读 · 2021年11月25日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Arxiv

3+阅读 · 2019年2月1日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors

A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors

Arxiv

4+阅读 · 2018年9月13日

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Arxiv

3+阅读 · 2018年2月20日

VIP会员

文章信息

相关主题

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

专知会员服务

12+阅读 · 2020年2月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

AI智能体时代中的记忆：形式、功能与动态综述

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

多模态大语言模型下游调优中“保持自我”的重要性

相关资讯

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

已删除

将门创投

3+阅读 · 2019年6月12日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Strategies for the Iterated Prisoner's Dilemma

Strategies for the Iterated Prisoner's Dilemma

Arxiv

0+阅读 · 2021年11月29日

Modular Information Flow Through Ownership

Modular Information Flow Through Ownership

Arxiv

0+阅读 · 2021年11月26日

An Optimal Algorithm for Finding Champions in Tournament Graphs

Arxiv

0+阅读 · 2021年11月26日

OTB-morph: One-Time Biometrics via Morphing applied to Face Templates

Arxiv

0+阅读 · 2021年11月25日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Arxiv

3+阅读 · 2019年2月1日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors

A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors

Arxiv

4+阅读 · 2018年9月13日

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Arxiv

3+阅读 · 2018年2月20日

微信扫码咨询专知VIP会员