Auditory attention decoding (AAD) is a technique for identifying and amplifying the talker a listener is focused on in a noisy environment. It works by comparing the listener's brainwaves to a representation of each sound source and finding the closest match, where the representation is typically the waveform or spectrogram of the sound. How effective such representations are for AAD remains uncertain. In this study, we examined whether self-supervised speech representations improve the accuracy and speed of AAD. We recorded the brain activity of three subjects with invasive electrocorticography (ECoG) while they listened to two concurrent conversations and attended to one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to these intermediate speech representations. During evaluation, we compared the reconstructed representation to each talker's representation to determine the attended talker. Our results show that WavLM speech representations yield better decoding accuracy and speed than the speech envelope and spectrogram. These findings demonstrate the advantages of self-supervised speech representations for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.
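The matching step in the abstract — comparing a representation reconstructed from brain activity against each talker's candidate representation — can be sketched as below. This is a minimal illustration using Pearson correlation as the similarity measure; the function name, array shapes, and correlation choice are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def decode_attended_talker(reconstructed, candidates):
    """Return the index of the talker whose representation best
    matches the one reconstructed from neural activity.

    reconstructed : (time, dim) array decoded from ECoG.
    candidates    : list of (time, dim) arrays, one per talker.

    NOTE: illustrative sketch; the paper's actual similarity
    metric and representation shapes may differ.
    """
    scores = []
    for rep in candidates:
        # Pearson correlation between the flattened representations
        r = np.corrcoef(reconstructed.ravel(), rep.ravel())[0, 1]
        scores.append(r)
    return int(np.argmax(scores))
```

In a streaming AAD system, this comparison would run over a sliding window, and a shorter window that still decodes reliably translates into faster attention switching detection.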