Audio-visual approaches that incorporate visual inputs have laid the foundation for recent progress in speech separation. However, how to best exploit auditory and visual inputs concurrently remains an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory thalamus, we propose a novel cortico-thalamo-cortical neural network (CTCNet) for audio-visual speech separation (AVSS). First, CTCNet learns hierarchical auditory and visual representations in a bottom-up manner in separate auditory and visual subnetworks, mimicking the functions of the auditory and visual cortical areas. Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections. Finally, the model transmits the fused information back to the auditory and visual subnetworks, and this process is repeated several times. Experiments on three speech separation benchmark datasets show that CTCNet remarkably outperforms existing AVSS methods with considerably fewer parameters. These results suggest that mimicking the anatomical connectome of the mammalian brain has great potential for advancing the development of deep neural networks. The project repository is available at https://github.com/JusperLee/CTCNet.
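To make the repeated cortico-thalamo-cortical cycle concrete, the following is a minimal PyTorch sketch of the fusion loop the abstract describes: bottom-up unimodal passes, top-down fusion in a shared "thalamic" module, and feedback of the fused representation into each stream. All module choices, dimensions, and names (`CTCFusionSketch`, `audio_dim`, `video_dim`, `fused_dim`, `n_cycles`) are illustrative assumptions, not the paper's actual architecture or configuration; see the project repository for the real implementation.

```python
# Hedged sketch of a cortico-thalamo-cortical fusion cycle.
# All names and shapes are illustrative assumptions, not CTCNet's real design.
import torch
import torch.nn as nn


class CTCFusionSketch(nn.Module):
    def __init__(self, audio_dim=128, video_dim=64, fused_dim=128, n_cycles=3):
        super().__init__()
        self.n_cycles = n_cycles
        # "Cortical" subnetworks: unimodal bottom-up encoders standing in for
        # the hierarchical auditory and visual subnetworks described above.
        self.audio_net = nn.GRU(audio_dim, audio_dim, batch_first=True)
        self.video_net = nn.GRU(video_dim, video_dim, batch_first=True)
        # "Thalamic" subnetwork: fuses top-down projections of both modalities.
        self.fuse = nn.Linear(audio_dim + video_dim, fused_dim)
        # Feedback projections carrying fused information back to each stream.
        self.to_audio = nn.Linear(fused_dim, audio_dim)
        self.to_video = nn.Linear(fused_dim, video_dim)

    def forward(self, audio, video):
        # audio: (batch, time, audio_dim); video: (batch, time, video_dim),
        # assumed pre-aligned to a common frame rate for simplicity.
        for _ in range(self.n_cycles):
            audio, _ = self.audio_net(audio)  # bottom-up auditory pass
            video, _ = self.video_net(video)  # bottom-up visual pass
            # Top-down fusion in the shared "thalamic" module.
            fused = self.fuse(torch.cat([audio, video], dim=-1))
            # Feedback: modulate each stream with the fused representation,
            # then repeat the whole cycle.
            audio = audio + self.to_audio(fused)
            video = video + self.to_video(fused)
        return audio, video


if __name__ == "__main__":
    model = CTCFusionSketch()
    a = torch.randn(2, 50, 128)  # dummy auditory features
    v = torch.randn(2, 50, 64)   # dummy visual features
    out_a, out_v = model(a, v)
    print(out_a.shape, out_v.shape)
```

In this reading, the repeated cycle lets each unimodal stream be refined several times under multimodal context before separation, which is the mechanism the abstract credits for strong performance with fewer parameters.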