通过Doc2Doc信息检索系统遵守法规:欧盟/联合王国立法中文本相似性有限制的案例研究 (Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations) - 专知论文

会员服务 ·

0

INFORMS · IR · 相似度 · CASE · 信息检索 ·

2021 年 1 月 26 日

Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

翻译：通过Doc2Doc信息检索系统遵守法规:欧盟/联合王国立法中文本相似性有限制的案例研究

Ilias Chalkidis,Manos Fergadiotis,Nikolaos Manginas,Eva Katakalou,Prodromos Malakasiotis

from arxiv, Accepted for publication by EACL 2021, 13 pages including references and appendices

Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies. However, keeping track of the constantly changing legislation is difficult, thus organizations are increasingly adopting Regulatory Technology (RegTech) to facilitate the process. To this end, we introduce regulatory information retrieval (REG-IR), an application of document-to-document information retrieval (DOC2DOC IR), where the query is an entire document making the task more challenging than traditional IR where the queries are short. Furthermore, we compile and release two datasets based on the relationships between EU directives and UK legislation. We experiment on these datasets using a typical two-step pipeline approach comprising a pre-fetcher and a neural re-ranker. Experimenting with various pre-fetchers from BM25 to k nearest neighbors over representations from several BERT models, we show that fine-tuning a BERT model on an in-domain classification task produces the best representations for IR. We also show that neural re-rankers under-perform due to contradicting supervision, i.e., similar query-document pairs with opposite labels. Thus, they are biased towards the pre-fetcher's score. Interestingly, applying a date filter further improves the performance, showcasing the importance of the time dimension.

翻译：公司历史上的重大丑闻敦促遵守规章,因为各组织需要确保其管制(程序)符合有关的法律、规章和政策;然而,跟踪不断变化的立法是困难的,因此各组织越来越多地采用监管技术(Regtech)来推动这一过程;为此,我们采用监管信息检索(REG-IR),即文件到文件信息检索的应用(DOC2DOC IR),查询是整个文件,使得任务比传统的内部档案(IR)更具有挑战性;此外,我们根据欧盟指令与联合王国立法之间的关系汇编和发布两个数据集。我们用典型的双步管道方法对这些数据集进行实验,其中包括预选和神经重新排位。我们用各种预扩展器进行实验,从BM25到最近的邻居进行文件检索(DOC2DOC IR),在几个BERT模型的演示中,我们展示了在内部分类任务方面对BERT模型的微调,为IRA提供了最佳的表述。我们还显示,根据欧盟指令与联合王国立法之间的关系,我们用典型的神经收缩器重新排列了两个阶段的数据集,以反级的评分级日期。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

专知会员服务

34+阅读 · 2020年8月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

专知会员服务

7+阅读 · 2019年11月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

281+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

专知会员服务

13+阅读 · 2019年8月27日

已删除

将门创投

9+阅读 · 2019年11月15日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

专知

14+阅读 · 2018年6月24日

【论文推荐】最新六篇知识图谱相关论文—全局关系嵌入、时序关系提取、对抗学习、远距离关系、时序知识图谱

【论文推荐】最新六篇知识图谱相关论文—全局关系嵌入、时序关系提取、对抗学习、远距离关系、时序知识图谱

专知

23+阅读 · 2018年4月24日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Arxiv

0+阅读 · 2021年3月22日

Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval

Arxiv

0+阅读 · 2021年3月19日

Dynamic Model for Query-Document Expansion towards Improving Retrieval Relevance

Arxiv

0+阅读 · 2021年3月18日

TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation

Arxiv

1+阅读 · 2021年3月18日

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

Arxiv

21+阅读 · 2020年12月29日

Towards Making the Most of BERT in Neural Machine Translation

Arxiv

5+阅读 · 2020年3月26日

An Analysis of Object Embeddings for Image Retrieval

An Analysis of Object Embeddings for Image Retrieval

Arxiv

4+阅读 · 2019年5月28日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

$ρ$-hot Lexicon Embedding-based Two-level LSTM for Sentiment Analysis

Arxiv

6+阅读 · 2018年3月21日

Content based video retrieval

Arxiv

3+阅读 · 2012年11月20日

VIP会员

文章信息

相关主题

相关VIP内容

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

专知会员服务

34+阅读 · 2020年8月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

专知会员服务

7+阅读 · 2019年11月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

281+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

专知会员服务

13+阅读 · 2019年8月27日

热门VIP内容

开通专知VIP会员享更多权益服务

赋能真实世界：基于大语言模型的产业智能体技术、实践与评测综述

军事行动中人工智能系统目标交战的附带损伤评估模型 | 最新文献

【普林斯顿博士论文】面向人本机器人学的安全与学习博弈论融合

美陆军协会（AUSA）2025 年会公布的美国十大武器与防务产品创新

相关资讯

已删除

将门创投

9+阅读 · 2019年11月15日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

专知

14+阅读 · 2018年6月24日

【论文推荐】最新六篇知识图谱相关论文—全局关系嵌入、时序关系提取、对抗学习、远距离关系、时序知识图谱

【论文推荐】最新六篇知识图谱相关论文—全局关系嵌入、时序关系提取、对抗学习、远距离关系、时序知识图谱

专知

23+阅读 · 2018年4月24日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

相关论文

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Arxiv

0+阅读 · 2021年3月22日

Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval

Arxiv

0+阅读 · 2021年3月19日

Dynamic Model for Query-Document Expansion towards Improving Retrieval Relevance

Arxiv

0+阅读 · 2021年3月18日

TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation

Arxiv

1+阅读 · 2021年3月18日

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

Arxiv

21+阅读 · 2020年12月29日

Towards Making the Most of BERT in Neural Machine Translation

Arxiv

5+阅读 · 2020年3月26日

An Analysis of Object Embeddings for Image Retrieval

An Analysis of Object Embeddings for Image Retrieval

Arxiv

4+阅读 · 2019年5月28日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

$ρ$-hot Lexicon Embedding-based Two-level LSTM for Sentiment Analysis

Arxiv

6+阅读 · 2018年3月21日

Content based video retrieval

Arxiv

3+阅读 · 2012年11月20日

微信扫码咨询专知VIP会员