分析不同语文的单词包含稳定稳定的波动性 (Analyzing the Surprising Variability in Word Embedding Stability Across Languages) - 专知论文

会员服务 ·

0

词向量表示 · 相关系数 · 近邻 · Processing（编程语言） · 多样性 ·

2021 年 9 月 9 日

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

翻译：分析不同语文的单词包含稳定稳定的波动性

Laura Burdick,Jonathan K. Kummerfeld,Rada Mihalcea

from arxiv, Accepted to EMNLP 2021

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.

翻译：语言嵌入是许多自然语言处理结构的基础, 包括英语和其他语言。为了深入了解语言嵌入, 我们探索语言的稳定性( 例如, 不同嵌入空间中一个词的近邻之间重叠 ) 。我们讨论与稳定相关的语言特性, 探讨与固定、语言性别系统和其他特征的关联性。这对嵌入使用有影响, 尤其是用于研究语言趋势的研究中。

0

相关内容

词向量表示

词向量表示

分散式表示即将语言表示为稠密、低维、连续的向量。研究者最早发现学习得到词嵌入之间存在类比关系。比如apple−apples ≈ car−cars， man−woman ≈ king – queen 等。这些方法都可以直接在大规模无标注语料上进行训练。词嵌入的质量也非常依赖于上下文窗口大小的选择。通常大的上下文窗口学到的词嵌入更反映主题信息，而小的上下文窗口学到的词嵌入更反映词的功能和上下文语义信息。

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【新书】贝叶斯网络进展与新应用，附全书下载

【新书】贝叶斯网络进展与新应用，附全书下载

专知会员服务

122+阅读 · 2019年12月9日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

专知会员服务

10+阅读 · 2019年11月4日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

岗位推荐 | 阿里巴巴达摩院招聘自然语言处理、机器翻译算法专家

岗位推荐 | 阿里巴巴达摩院招聘自然语言处理、机器翻译算法专家

PaperWeekly

4+阅读 · 2018年10月13日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Individual causal effects from observational longitudinal studies with time-varying exposures

Arxiv

0+阅读 · 2021年12月8日

Does Summary Evaluation Survive Translation to Other Languages?

Arxiv

0+阅读 · 2021年12月8日

The Stability and Accuracy Tradeoff Under Dataset Shift: A Causal Graphical Analysis

Arxiv

0+阅读 · 2021年12月4日

Visually Grounded Reasoning across Languages and Cultures

Arxiv

3+阅读 · 2021年10月21日

Unsupervised Multilingual Word Embeddings

Arxiv

3+阅读 · 2018年8月27日

Characterizing Departures from Linearity in Word Translation

Arxiv

3+阅读 · 2018年6月7日

Deep contextualized word representations

Arxiv

10+阅读 · 2018年3月22日

Baselines and test data for cross-lingual inference

Arxiv

3+阅读 · 2018年3月2日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

Arxiv

3+阅读 · 2017年9月26日

VIP会员

文章信息

相关主题

词向量表示

Processing（编程语言）

相关VIP内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【新书】贝叶斯网络进展与新应用，附全书下载

【新书】贝叶斯网络进展与新应用，附全书下载

专知会员服务

122+阅读 · 2019年12月9日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

【多语言模型跨任务嵌入投影】Multilingual model using cross-task embedding projection，CoNLL 2019，附论文和PPT免费下载

专知会员服务

10+阅读 · 2019年11月4日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

仿生机器人技术的军事应用

《反集群作战：基于深度学习的分布式决策方法》89页

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

《多域作战兵棋推演：运用形态学分析与人工智能加强国防人员训练》

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

岗位推荐 | 阿里巴巴达摩院招聘自然语言处理、机器翻译算法专家

岗位推荐 | 阿里巴巴达摩院招聘自然语言处理、机器翻译算法专家

PaperWeekly

4+阅读 · 2018年10月13日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Individual causal effects from observational longitudinal studies with time-varying exposures

Arxiv

0+阅读 · 2021年12月8日

Does Summary Evaluation Survive Translation to Other Languages?

Arxiv

0+阅读 · 2021年12月8日

The Stability and Accuracy Tradeoff Under Dataset Shift: A Causal Graphical Analysis

Arxiv

0+阅读 · 2021年12月4日

Visually Grounded Reasoning across Languages and Cultures

Arxiv

3+阅读 · 2021年10月21日

Unsupervised Multilingual Word Embeddings

Arxiv

3+阅读 · 2018年8月27日

Characterizing Departures from Linearity in Word Translation

Arxiv

3+阅读 · 2018年6月7日

Deep contextualized word representations

Arxiv

10+阅读 · 2018年3月22日

Baselines and test data for cross-lingual inference

Arxiv

3+阅读 · 2018年3月2日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

Arxiv

3+阅读 · 2017年9月26日

微信扫码咨询专知VIP会员