Using Kaldi for Automatic Speech Recognition of Conversational Austrian German

Speech recognition · Task-oriented dialogue systems · Automatic speech recognition · INTERACT · Performer

January 16, 2023

Julian Linke,Saskia Wepner,Gernot Kubin,Barbara Schuppler

From arXiv; 10 pages, 2 figures, 4 tables

As dialogue systems become more interactional and social, accurate automatic speech recognition (ASR) of conversational speech is of increasing importance. This shifts the focus from short, spontaneous, task-oriented dialogues to the much higher complexity of casual face-to-face conversations. However, collecting and annotating such conversations is time-consuming, and data are sparse for this specific speaking style. This paper presents ASR experiments with read and conversational Austrian German as the target. To deal with the limited resources available for conversational German and, at the same time, with the large variation among speakers in pronunciation characteristics, we improve a Kaldi-based ASR system by incorporating a (large) knowledge-based pronunciation lexicon, while exploring different data-based methods to restrict the number of pronunciation variants for each lexical entry. We achieve a best WER of 0.4% on Austrian German read speech and a best average WER of 48.5% on conversational speech. We find that using our best pronunciation lexicon achieves performance similar to that obtained by increasing the size of the data used for the language model by approx. 360% to 760%. Our findings indicate that for low-resource scenarios, despite the general trend in speech technology towards using data-based methods only, knowledge-based approaches are a successful, efficient method.
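The results above are reported as word error rate (WER). As a reminder of what that number measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the scoring code used by the authors or by Kaldi.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example sentences, not taken from the paper's corpora.
ref = "das ist ein kleiner test".split()
hyp = "das is ein test".split()
wer = word_error_rate(ref, hyp)  # one substitution plus one deletion over five words
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.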


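Related topics

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and convert it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.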
