Podcast 脚本中自动语音识别错误的模型强度模型 (Topic Model Robustness to Automatic Speech Recognition Errors in Podcast Transcripts) - 专知论文

会员服务 ·

0

话题模型 · 自动语音识别 · 语音识别 · 余弦相似度 · 稳健性 ·

2021 年 9 月 25 日

Topic Model Robustness to Automatic Speech Recognition Errors in Podcast Transcripts

翻译：Podcast 脚本中自动语音识别错误的模型强度模型

Raluca Alexandra Fetic,Mikkel Jordahn,Lucas Chaves Lima,Rasmus Arpe Fogh Egebæk,Martin Carsten Nielsen,Benjamin Biering,Lars Kai Hansen

For a multilingual podcast streaming service, it is critical to be able to deliver relevant content to all users independent of language. Podcast content relevance is conventionally determined using various metadata sources. However, with the increasing quality of speech recognition in many languages, utilizing automatic transcriptions to provide better content recommendations becomes possible. In this work, we explore the robustness of a Latent Dirichlet Allocation topic model when applied to transcripts created by an automatic speech recognition engine. Specifically, we explore how increasing transcription noise influences topics obtained from transcriptions in Danish; a low resource language. First, we observe a baseline of cosine similarity scores between topic embeddings from automatic transcriptions and the descriptions of the podcasts written by the podcast creators. We then observe how the cosine similarities decrease as transcription noise increases and conclude that even when automatic speech recognition transcripts are erroneous, it is still possible to obtain high-quality topic embeddings from the transcriptions.

翻译：对于多语种播客流流传服务而言,关键是要能够向独立于语言的所有用户提供相关内容。播客内容的相关性通常由各种元数据来源确定。然而,随着多种语言语音识别质量的提高,利用自动抄录提供更好的内容建议成为可能。在这项工作中,我们探索了在应用到自动语音识别引擎创建的记录誊本时, " 冷淡迪里赫莱特分配 " 专题模型的稳健性。具体地说,我们探索了越来越多的抄录噪音如何影响丹麦文抄录中的专题;一种低资源语言。首先,我们观察了从自动抄录中嵌入的专题和播客创作者撰写的播客描述之间的共性相似性分数基线。然后我们观察了如何随着调录噪音的增加而减少共弦相似性,并得出结论,即使自动语音识别记录誊本是错误的,仍然有可能从抄录中获得高质量的专题嵌入。

0

相关内容

话题模型

【CVPR2021】背景鲁棒的自监督视频表征学习

【CVPR2021】背景鲁棒的自监督视频表征学习

专知会员服务

17+阅读 · 2021年3月13日

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

专知会员服务

10+阅读 · 2019年11月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

已删除

将门创投

5+阅读 · 2018年3月21日

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Arxiv

0+阅读 · 2021年11月16日

Neural Dubber: Dubbing for Videos According to Scripts

Arxiv

0+阅读 · 2021年11月16日

A Bayesian Approach for Characterizing and Mitigating Gate and Measurement Errors

Arxiv

0+阅读 · 2021年11月15日

Template-Based Named Entity Recognition Using BART

Arxiv

5+阅读 · 2021年6月3日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Advances in Online Audio-Visual Meeting Transcription

Advances in Online Audio-Visual Meeting Transcription

Arxiv

4+阅读 · 2019年12月10日

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Arxiv

5+阅读 · 2019年7月4日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model

Arxiv

4+阅读 · 2017年12月2日

VIP会员

文章信息

相关主题

自动语音识别

余弦相似度

相关VIP内容

【CVPR2021】背景鲁棒的自监督视频表征学习

【CVPR2021】背景鲁棒的自监督视频表征学习

专知会员服务

17+阅读 · 2021年3月13日

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

专知会员服务

10+阅读 · 2019年11月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津大学博士论文】将序列结构与几何结构融入深度神经网络

工程视角：影响战争进程的小型无人机

企业级AI应用开发：从技术选型到生产落地

AI生成代码缺陷综述

相关资讯

已删除

将门创投

5+阅读 · 2018年3月21日

相关论文

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Arxiv

0+阅读 · 2021年11月16日

Neural Dubber: Dubbing for Videos According to Scripts

Arxiv

0+阅读 · 2021年11月16日

A Bayesian Approach for Characterizing and Mitigating Gate and Measurement Errors

Arxiv

0+阅读 · 2021年11月15日

Template-Based Named Entity Recognition Using BART

Arxiv

5+阅读 · 2021年6月3日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Advances in Online Audio-Visual Meeting Transcription

Advances in Online Audio-Visual Meeting Transcription

Arxiv

4+阅读 · 2019年12月10日

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Arxiv

5+阅读 · 2019年7月4日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model

Arxiv

4+阅读 · 2017年12月2日

微信扫码咨询专知VIP会员