Statistically significant detection of semantic shifts using contextual word embeddings - Zhuanzhi (专知) Papers


Tags: Statistics · Word embeddings · Estimation/Estimators · Performer · False positives

September 24, 2021


Yang Liu, Alan Medlar, Dorota Glowacka

Detecting lexical semantic change in smaller data sets, e.g., in historical linguistics and digital humanities, is challenging due to a lack of statistical power. This issue is exacerbated by non-contextual embedding models that produce one embedding per word and, therefore, mask the variability present in the data. In this article, we propose an approach to estimate semantic shift by combining contextual word embeddings with permutation-based statistical tests. We use the false discovery rate procedure to address the large number of hypothesis tests being conducted simultaneously. We demonstrate the performance of this approach in simulation, where it achieves consistently high precision by suppressing false positives. We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus. We show that by taking sample variation into account, we can improve the robustness of individual semantic shift estimates without degrading overall performance.
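The abstract does not specify the test statistic, so the Python sketch below is only one plausible reading of the pipeline it describes, under loudly stated assumptions. Each word is represented by one contextual embedding per occurrence in each time period (the toy Gaussian vectors stand in for real model outputs). The shift statistic is assumed to be the cosine distance between the two period centroids, which is our choice and not necessarily the authors'. The p-value comes from randomly permuting the period labels, and "the false discovery rate procedure" is assumed to be Benjamini-Hochberg. All function and word names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; raises if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def shift_statistic(early: Sequence[Vector], late: Sequence[Vector]) -> float:
    """ASSUMED statistic: cosine distance between the per-period centroids."""
    return cosine_distance(centroid(early), centroid(late))


def permutation_p_value(early, late, n_permutations=999, seed=0) -> float:
    """One-sided permutation test: shuffle period labels, recompute the statistic."""
    rng = random.Random(seed)
    observed = shift_statistic(early, late)
    pooled = list(early) + list(late)
    n_early = len(early)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of usages to the two periods
        if shift_statistic(pooled[:n_early], pooled[n_early:]) >= observed:
            at_least_as_large += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up Benjamini-Hochberg procedure; True means the word is flagged."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Flag every hypothesis ranked at or below that cutoff.
    flagged = [False] * m
    for idx in order[:cutoff]:
        flagged[idx] = True
    return flagged


if __name__ == "__main__":
    rng = random.Random(42)

    def simulate(mean: Vector, n: int) -> List[Vector]:
        """Toy stand-ins for per-occurrence contextual embeddings."""
        return [[rng.gauss(mu, 1.0) for mu in mean] for _ in range(n)]

    dim = 8
    base = [1.0] * dim
    moved = [1.0] * (dim // 2) + [-1.0] * (dim // 2)
    corpus = {  # hypothetical words: (early usages, late usages)
        "stable_word": (simulate(base, 30), simulate(base, 30)),
        "shifted_word": (simulate(base, 30), simulate(moved, 30)),
    }
    words = list(corpus)
    p_values = [permutation_p_value(*corpus[w]) for w in words]
    for word, p, flag in zip(words, p_values, benjamini_hochberg(p_values)):
        print(f"{word}: p = {p:.4f}, flagged = {flag}")
```

Keeping one vector per occurrence, rather than collapsing each word to a single embedding, is what lets the permutation test account for sample variation: a word with few, scattered usages yields a wide null distribution and a large p-value even if its two centroids happen to lie far apart. The Benjamini-Hochberg step then controls the expected proportion of false positives among all flagged words, rather than the error rate of each test in isolation.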


