In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems but are not suitable for first-pass recognition as such. One way to solve this is to approximate the RNNLMs by back-off n-gram models. In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition. Furthermore, we develop a new RNNLM approximation method suitable for subword units: it produces variable-order n-grams to include long-span approximations and also considers n-grams that were not originally observed in the training corpus. To evaluate these models on OOVs, we set up Arabic and Finnish Keyword Search tasks concentrating only on OOV words. On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords. Moreover, replacing the baseline approximation with the proposed method achieves the best performance on both multi- and single-character subwords.
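Purely as an illustration of the interpolation step described above, the following Python sketch linearly combines per-event probabilities from a conventional subword n-gram LM and from an n-gram approximation of an RNNLM. The subword tokens, the probability values, and the weight `lam` are hypothetical placeholders for illustration, not values or an implementation from the paper.

```python
def interpolate(p_ngram: float, p_rnnlm_approx: float, lam: float = 0.5) -> float:
    """Linearly interpolate two LM probabilities for the same n-gram event:
    lam * P_ngram + (1 - lam) * P_rnnlm_approx."""
    return lam * p_ngram + (1.0 - lam) * p_rnnlm_approx

# Toy subword-level probabilities P(token | history), illustrative only.
p_conventional = {("ke", "yw"): 0.020, ("ke", "or"): 0.001}
p_approximation = {("ke", "yw"): 0.050, ("ke", "or"): 0.004}

for event in p_conventional:
    p = interpolate(p_conventional[event], p_approximation[event], lam=0.6)
    print(event, round(p, 4))
```

In practice such interpolation would typically be done over full back-off LMs (e.g. in ARPA format) rather than raw probability tables, with the weight tuned on held-out data; this sketch only shows the arithmetic of the combination.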