Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels - Zhuanzhi Papers


March 30, 2023

Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels


Tzeviya Sylvia Fuchs, Yedid Hoshen

From arXiv, ICASSP 2023

Unsupervised word segmentation in audio utterances is challenging because, in speech, there is typically no gap between words. In a preliminary experiment, we show that recent deep self-supervised features are very effective for word segmentation but require supervision for training the classification head. To extend their effectiveness to unsupervised word segmentation, we propose a pseudo-labeling strategy. Our approach relies on the observation that the temporal gradient magnitude of the embeddings (i.e., the distance between the embeddings of subsequent frames) is typically minimal far from the boundaries and higher near the boundaries. We use a thresholding function on the temporal gradient magnitude to define a pseudo-label for wordness. We train a linear classifier, mapping the embedding of a single frame to the pseudo-label. Finally, we use the classifier score to predict whether a frame belongs to a word or to a boundary. In an empirical investigation, our method, despite its simplicity and fast run time, is shown to significantly outperform all previous methods on two datasets.
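The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the distance metric, the thresholding function, or the classifier's training procedure, so the Euclidean distance, the quantile threshold, the logistic-regression training loop, and all function and parameter names below are assumptions chosen for illustration. In practice the frame embeddings would come from a self-supervised speech model; here they are synthetic.

```python
import numpy as np


def temporal_gradient_magnitude(emb):
    """Distance between embeddings of consecutive frames; emb has shape (T, D).

    Assumption: Euclidean distance. The last value is repeated so that
    every frame has a magnitude.
    """
    mag = np.linalg.norm(emb[1:] - emb[:-1], axis=1)
    return np.concatenate([mag, mag[-1:]])


def pseudo_labels(mag, quantile=0.8):
    """Threshold the gradient magnitude into a wordness pseudo-label.

    Assumption: frames below a quantile threshold are labeled 1
    (word-like) and the rest 0 (boundary-like).
    """
    threshold = np.quantile(mag, quantile)
    return (mag < threshold).astype(np.float64)


def train_linear_classifier(emb, labels, lr=0.1, steps=500):
    """Logistic regression mapping a single frame embedding to its pseudo-label."""
    w = np.zeros(emb.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
        err = probs - labels
        w -= lr * (emb.T @ err) / len(labels)
        b -= lr * err.mean()
    return w, b


def wordness_scores(emb, w, b):
    """Classifier score per frame; low scores suggest a boundary."""
    return 1.0 / (1.0 + np.exp(-(emb @ w + b)))


if __name__ == "__main__":
    # Synthetic stand-in for frame embeddings: three flat segments with jumps.
    rng = np.random.default_rng(0)
    means = np.array([[0.0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]])
    emb = np.repeat(means, 10, axis=0) + 0.01 * rng.standard_normal((30, 4))

    mag = temporal_gradient_magnitude(emb)
    labels = pseudo_labels(mag)
    w, b = train_linear_classifier(emb, labels)
    print(wordness_scores(emb, w, b).round(2))
```

Because the classifier sees only a single frame at inference time, it does not need the neighboring frames that the pseudo-labels were computed from; the toy data above only exercises the mechanics and says nothing about segmentation quality.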

