Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing "claim-like statements" and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset. We provide evaluations on the performance of our solution and compare with baselines and existing state-of-the-art multilingual embedding models, namely LASER and LaBSE. We demonstrate that our performance exceeds LASER and LaBSE in all settings. We release our annotated datasets, codebooks, and trained embedding model to allow for further research.
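To make the knowledge-distillation setup concrete, below is a minimal sketch (not the authors' released code) of the standard teacher–student distillation recipe for multilingual sentence embeddings: a frozen high-quality "teacher" embeds the high-resource side of a parallel pair, and the multilingual "student" is trained so that its embeddings of both sides of the pair match the teacher's. The names `teacher_encode`, `student`, and the parallel sentence lists are hypothetical placeholders, not artifacts from the paper.

```python
import torch
import torch.nn as nn


def distillation_step(student: nn.Module,
                      teacher_encode,        # callable: list[str] -> Tensor (frozen teacher)
                      src_sentences: list,   # high-resource side (e.g., English)
                      tgt_sentences: list,   # low-resource translations (e.g., Tamil)
                      optimizer: torch.optim.Optimizer) -> float:
    """One training step: pull student(src) and student(tgt) toward teacher(src)."""
    with torch.no_grad():
        target = teacher_encode(src_sentences)      # teacher embeddings, no gradients

    student_src = student(src_sentences)            # student embedding of the source
    student_tgt = student(tgt_sentences)            # student embedding of the translation

    # MSE distillation loss on both sides of the parallel pair, so translations of
    # the same claim land near each other and near the teacher's embedding.
    loss = nn.functional.mse_loss(student_src, target) + \
           nn.functional.mse_loss(student_tgt, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, claim matching under this setup would amount to embedding an incoming tipline message with the trained student and retrieving previously fact-checked claims by cosine similarity, serving the existing fact-check when the similarity exceeds a chosen threshold.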