贬低偏见的多语种语言嵌入词:三种印度语言案例研究 (Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages) - 专知论文

会员服务 ·

0

state-of-the-art · CASE · 词向量表示 · Performer · 无偏 ·

2021 年 7 月 21 日

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

翻译：贬低偏见的多语种语言嵌入词:三种印度语言案例研究

Srijan Bansal,Vishal Garimella,Ayush Suhane,Animesh Mukherjee

from arxiv, This work is accepted as a long paper in the proceedings of ACM HyperText 2021

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting. We consider different methods to quantify bias and different debiasing approaches for monolingual as well as multilingual settings. We demonstrate the significance of our bias-mitigation approach on downstream NLP applications. Our proposed methods establish the state-of-the-art performance for debiasing multilingual embeddings for three Indian languages - Hindi, Bengali, and Telugu in addition to English. We believe that our work will open up new opportunities in building unbiased downstream NLP applications that are inherently dependent on the quality of the word embeddings used.

翻译：在本文中,我们推广目前最先进的降低单语语言嵌入率的方法,以便在多语种环境中全面推广。我们考虑了用不同方法量化单一语言和多语种环境中的偏见和不同贬入率方法。我们展示了我们对下游NLP应用的减少偏入率方法的重要性。我们提出的方法建立了降低印度三种语言(印地语、孟加拉语和泰鲁古语)多语嵌入率的最先进的表现。我们认为,我们的工作将开辟新的机会,建设无偏见的下游NLP应用,这些应用本身取决于所用语言嵌入的质量。

0

相关内容

state-of-the-art

state-of-the-art

最近几种小样本元学习简明综述，A Concise Review of Recent Few-shot Meta-learning Methods

最近几种小样本元学习简明综述，A Concise Review of Recent Few-shot Meta-learning Methods

专知会员服务

35+阅读 · 2020年5月25日

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

专知会员服务

36+阅读 · 2020年5月20日

【ACL2020】计算语言学在深度学习领域不可阻挡的崛起，The Unstoppable Rise of Computational Linguistics in Deep Learning

【ACL2020】计算语言学在深度学习领域不可阻挡的崛起，The Unstoppable Rise of Computational Linguistics in Deep Learning

专知会员服务

8+阅读 · 2020年5月15日

机器翻译深度学习最新综述

机器翻译深度学习最新综述

专知会员服务

99+阅读 · 2020年2月20日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

207+阅读 · 2019年9月30日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

5+阅读 · 2018年7月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages

Arxiv

0+阅读 · 2021年9月22日

Augmenting semantic lexicons using word embeddings and transfer learning

Arxiv

0+阅读 · 2021年9月18日

An Alignment-Agnostic Model for Chinese Text Error Correction

Arxiv

0+阅读 · 2021年9月18日

Unsupervised Multilingual Word Embeddings

Arxiv

4+阅读 · 2018年9月6日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?

Arxiv

5+阅读 · 2018年6月5日

When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

Arxiv

3+阅读 · 2018年4月18日

Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017

Arxiv

6+阅读 · 2018年3月18日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

Word Translation Without Parallel Data

Arxiv

7+阅读 · 2018年1月30日

VIP会员

文章信息

相关主题

state-of-the-art

词向量表示

相关VIP内容

最近几种小样本元学习简明综述，A Concise Review of Recent Few-shot Meta-learning Methods

最近几种小样本元学习简明综述，A Concise Review of Recent Few-shot Meta-learning Methods

专知会员服务

35+阅读 · 2020年5月25日

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

专知会员服务

36+阅读 · 2020年5月20日

【ACL2020】计算语言学在深度学习领域不可阻挡的崛起，The Unstoppable Rise of Computational Linguistics in Deep Learning

【ACL2020】计算语言学在深度学习领域不可阻挡的崛起，The Unstoppable Rise of Computational Linguistics in Deep Learning

专知会员服务

8+阅读 · 2020年5月15日

机器翻译深度学习最新综述

机器翻译深度学习最新综述

专知会员服务

99+阅读 · 2020年2月20日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

207+阅读 · 2019年9月30日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

5+阅读 · 2018年7月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

相关论文

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages

Arxiv

0+阅读 · 2021年9月22日

Augmenting semantic lexicons using word embeddings and transfer learning

Arxiv

0+阅读 · 2021年9月18日

An Alignment-Agnostic Model for Chinese Text Error Correction

Arxiv

0+阅读 · 2021年9月18日

Unsupervised Multilingual Word Embeddings

Arxiv

4+阅读 · 2018年9月6日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?

Arxiv

5+阅读 · 2018年6月5日

When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

Arxiv

3+阅读 · 2018年4月18日

Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017

Arxiv

6+阅读 · 2018年3月18日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

Word Translation Without Parallel Data

Arxiv

7+阅读 · 2018年1月30日

微信扫码咨询专知VIP会员