With the democratization of e-commerce platforms, an increasingly diverse user base is opting to shop online. To provide a comfortable and reliable shopping experience, it is important to enable users to interact with the platform in the language of their choice. Accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Owing to their internet-scale operations, e-commerce platforms receive millions of search queries every day; however, creating a parallel training set to train an in-domain translation model is cumbersome. This paper proposes an unsupervised domain adaptation approach to translate search queries without using any parallel corpus. We use an open-domain translation model (trained on a public corpus) and adapt it to the query data using only monolingual queries from the two languages. In addition, fine-tuning with a small labeled set further improves the results. For demonstration, we show results for Hindi-to-English query translation and use the mBART-large-50 model as the baseline to improve upon. Experimental results show that, without using any parallel corpus, we obtain an improvement of more than 20 BLEU points over the baseline, while fine-tuning with a small 50k labeled set provides more than 27 BLEU points of improvement over the baseline.
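As context for the baseline, the sketch below shows how an open-domain mBART-50 model can be used for Hindi-to-English translation via the Hugging Face transformers API. It is a minimal illustration, assuming the public `facebook/mbart-large-50-many-to-many-mmt` checkpoint as a stand-in for the paper's open-domain model; the sample query is hypothetical and not from the paper's data.

```python
# Minimal sketch of the open-domain baseline (not the paper's adapted model).
# Assumptions: the public many-to-many mBART-50 checkpoint and an illustrative query.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

tokenizer.src_lang = "hi_IN"        # Hindi source query
query = "सूती साड़ी"                  # illustrative e-commerce query ("cotton saree")
inputs = tokenizer(query, return_tensors="pt")

# Force English as the target language during generation.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],
    max_length=32,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

The paper's contribution is to adapt such a model to the query domain using only monolingual queries, and optionally a small labeled set; that adaptation procedure is not reproduced here.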