阿拉伯跨语言信息检索系统(CLAIRE) (The Cross-Lingual Arabic Information REtrieval (CLAIRE) System) - 专知论文

会员服务 ·

0

INFORMS · 信息检索 · 可约的 · 词向量表示 · MoDELS ·

2021 年 7 月 29 日

The Cross-Lingual Arabic Information REtrieval (CLAIRE) System

翻译：阿拉伯跨语言信息检索系统(CLAIRE)

Zhizhong Chen,Carsten Eickhoff

Despite advances in neural machine translation, cross-lingual retrieval tasks in which queries and documents live in different natural language spaces remain challenging. Although neural translation models may provide an intuitive approach to tackle the cross-lingual problem, their resource-consuming training and advanced model structures may complicate the overall retrieval pipeline and reduce users engagement. In this paper, we build our end-to-end Cross-Lingual Arabic Information REtrieval (CLAIRE) system based on the cross-lingual word embedding where searchers are assumed to have a passable passive understanding of Arabic and various supporting information in English is provided to aid retrieval experience. The proposed system has three major advantages: (1) The usage of English-Arabic word embedding simplifies the overall pipeline and avoids the potential mistakes caused by machine translation. (2) Our CLAIRE system can incorporate arbitrary word embedding-based neural retrieval models without structural modification. (3) Early empirical results on an Arabic news collection show promising performance.

翻译：尽管在神经机器翻译方面取得了进展,但跨语言检索任务仍然具有挑战性,因为查询和文件在不同自然语言空间中存在,虽然神经翻译模式可以为解决跨语言问题提供直观方法,但其耗资培训和先进的模型结构可能会使整个检索管道复杂化,减少用户参与。在本文件中,我们根据跨语言嵌入词端到端的跨语言阿拉伯语信息检索网(CLAIRE)系统建立我们基于跨语言嵌入词的端到端的跨语言的阿拉伯语信息检索网(CLAIRE)系统,其中搜索者假定对阿拉伯语有可逾越的被动理解,并且提供各种英文辅助信息,以帮助检索经验。拟议的系统有三大优点:(1) 使用英语嵌入词将整个管道简化,避免机器翻译可能造成的错误。(2) 我们的CLAIRE系统可以在不作结构修改的情况下纳入任意的单词嵌入神经检索模型。(3) 阿拉伯语新闻收藏的早期经验结果显示有良好的表现。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【KDD2021】检索交互机的表格数据预测

专知会员服务

16+阅读 · 2021年8月13日

自然语言处理顶会EMNLP2020接受论文列表，754篇论文都在这儿了！

自然语言处理顶会EMNLP2020接受论文列表，754篇论文都在这儿了！

专知会员服务

28+阅读 · 2020年10月26日

现代机器学习技术导论，596页pdf

专知会员服务

168+阅读 · 2020年7月27日

如何写一份有效的机器学习/自然语言处理论文摘要？ Elvis Saravia

如何写一份有效的机器学习/自然语言处理论文摘要？ Elvis Saravia

专知会员服务

38+阅读 · 2020年5月17日

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

专知会员服务

24+阅读 · 2020年4月7日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

PaperWeekly

8+阅读 · 2019年6月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

干货 | 为你解读34篇ACL论文

干货 | 为你解读34篇ACL论文

数据派THU

8+阅读 · 2018年6月7日

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

AI科技评论

6+阅读 · 2018年1月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Improving Arabic Diacritization by Learning to Diacritize and Translate

Arxiv

0+阅读 · 2021年9月29日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

All Word Embeddings from One Embedding

Arxiv

4+阅读 · 2020年5月25日

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Arxiv

8+阅读 · 2020年3月3日

An Analysis of Object Embeddings for Image Retrieval

An Analysis of Object Embeddings for Image Retrieval

Arxiv

4+阅读 · 2019年5月28日

Baselines and test data for cross-lingual inference

Arxiv

3+阅读 · 2018年3月2日

Improving Sentiment Analysis in Arabic Using Word Representation

Arxiv

4+阅读 · 2018年2月28日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

VIP会员

文章信息

相关主题

词向量表示

相关VIP内容

【KDD2021】检索交互机的表格数据预测

专知会员服务

16+阅读 · 2021年8月13日

自然语言处理顶会EMNLP2020接受论文列表，754篇论文都在这儿了！

自然语言处理顶会EMNLP2020接受论文列表，754篇论文都在这儿了！

专知会员服务

28+阅读 · 2020年10月26日

现代机器学习技术导论，596页pdf

专知会员服务

168+阅读 · 2020年7月27日

如何写一份有效的机器学习/自然语言处理论文摘要？ Elvis Saravia

如何写一份有效的机器学习/自然语言处理论文摘要？ Elvis Saravia

专知会员服务

38+阅读 · 2020年5月17日

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

专知会员服务

24+阅读 · 2020年4月7日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】语义提示扩散变换器的像素级精确深度估计

俄乌冲突的地缘政治与军事教训（万字长文）

【博士论文】弥合多模态基础模型与世界模型之间的鸿沟

量子增强计算机视觉：超越经典算法

相关资讯

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

PaperWeekly

8+阅读 · 2019年6月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

干货 | 为你解读34篇ACL论文

干货 | 为你解读34篇ACL论文

数据派THU

8+阅读 · 2018年6月7日

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

AI科技评论

6+阅读 · 2018年1月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

相关论文

Improving Arabic Diacritization by Learning to Diacritize and Translate

Arxiv

0+阅读 · 2021年9月29日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

All Word Embeddings from One Embedding

Arxiv

4+阅读 · 2020年5月25日

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Arxiv

8+阅读 · 2020年3月3日

An Analysis of Object Embeddings for Image Retrieval

An Analysis of Object Embeddings for Image Retrieval

Arxiv

4+阅读 · 2019年5月28日

Baselines and test data for cross-lingual inference

Arxiv

3+阅读 · 2018年3月2日

Improving Sentiment Analysis in Arabic Using Word Representation

Arxiv

4+阅读 · 2018年2月28日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

微信扫码咨询专知VIP会员