End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN

Estimation/Estimator · Attention mechanism · Context vector · End-to-end · Vectorization

January 13, 2021

Manav Kaushik, Van Tung Pham, Eng Siong Chng

From arXiv, 5 pages

Automatic height and age estimation of speakers from acoustic features is widely used in human-computer interaction, forensics, and related applications. In this work, we propose a novel approach that uses an attention mechanism to build an end-to-end architecture for height and age estimation. The attention mechanism is combined with a Long Short-Term Memory (LSTM) encoder, which is able to capture long-term dependencies in the input acoustic features. We modify the conventionally used attention, which calculates context vectors as the sum of attention only across timeframes, by introducing a modified context vector that also takes into account total attention across encoder units, giving us a new cross-attention mechanism. Apart from this, we also investigate a multi-task learning approach for jointly estimating speaker height and age. We train and test our model on the TIMIT corpus. Our model outperforms several approaches in the literature. We achieve a root mean square error (RMSE) of 6.92 cm and 6.34 cm for male and female heights respectively, and an RMSE of 7.85 years and 8.75 years for male and female ages respectively. By tracking the attention weights allocated to different phones, we find that vowel phones are most important while stop phones are least important for the estimation task.
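The abstract does not give the equations for the modified context vector, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the encoder outputs form a T x D matrix (T timeframes, D encoder units). `time_context` is ordinary attention pooling across timeframes. `unit_context`, the concatenation in `cross_context`, and the scoring vectors `w` and `v` are hypothetical choices made for illustration. `rmse` is the standard metric quoted in the abstract.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # shape (T, D): T timeframes, D encoder units


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def time_context(h: Matrix, w: Sequence[float]) -> List[float]:
    """Conventional attention pooling: one weight per timeframe.

    score_t = w . h[t]; alpha = softmax over t;
    context[d] = sum_t alpha[t] * h[t][d]. Returns a length-D vector.
    """
    scores = [sum(wd * hd for wd, hd in zip(w, row)) for row in h]
    alpha = softmax(scores)
    dims = len(h[0])
    return [sum(a * row[d] for a, row in zip(alpha, h)) for d in range(dims)]


def unit_context(h: Matrix, v: Sequence[float]) -> List[float]:
    """ASSUMED analogue across encoder units: one weight per unit.

    score_d = sum_t v[t] * h[t][d]; beta = softmax over d;
    context[t] = sum_d beta[d] * h[t][d]. Returns a length-T vector.
    Note: v has length T, so this sketch assumes a fixed number of frames.
    """
    dims = len(h[0])
    scores = [sum(vt * row[d] for vt, row in zip(v, h)) for d in range(dims)]
    beta = softmax(scores)
    return [sum(b * x for b, x in zip(beta, row)) for row in h]


def cross_context(h: Matrix, w: Sequence[float], v: Sequence[float]) -> List[float]:
    """Hypothetical combination: concatenate both pooled vectors (length D + T)."""
    return time_context(h, w) + unit_context(h, v)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


if __name__ == "__main__":
    # Toy encoder output: 3 timeframes, 2 encoder units (made-up values).
    h = [[0.2, 1.0], [0.5, -0.3], [0.9, 0.4]]
    w = [0.1, 0.7]       # hypothetical learned scoring vector over units
    v = [0.3, 0.3, 0.3]  # hypothetical learned scoring vector over frames
    print(cross_context(h, w, v))
    print(rmse([170.0, 182.0], [175.0, 180.0]))  # toy heights in cm
```

In a multi-task setup such as the one the abstract mentions, the pooled vector would typically feed two regression heads (height and age) trained with a combined loss. The weighting between the two losses is not stated in the abstract and is left out here.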
