Non-intrusive speech intelligibility (SI) prediction from binaural signals is useful in many applications. However, most existing signal-based measures are designed for single-channel signals. Measures that do account for the binaural properties of the signal are often intrusive, meaning they require access to a clean speech signal, and typically combine both channels into a single-channel signal before making predictions. This paper proposes a non-intrusive SI measure that computes features from a binaural input signal using a combination of vector quantization (VQ) and contrastive predictive coding (CPC). VQ-CPC feature extraction does not rely on any model of the auditory system; instead, it is trained to maximise the mutual information between the input signal and the output features. The computed VQ-CPC features are input to a predicting function parameterized by a neural network, and two such predicting functions are considered. Both the feature extractor and the predicting functions are trained on simulated binaural signals with isotropic noise, and they are tested on simulated signals with isotropic and real noise. For all signals, the ground-truth scores are given by the (intrusive) deterministic binaural STOI. Results are reported in terms of correlations and mean squared error (MSE). They demonstrate that VQ-CPC features capture information relevant to modelling SI and outperform all the considered benchmarks, even when evaluated on data comprising different noise field types.
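As an illustration of the evaluation described above, the following is a minimal sketch of how predicted SI scores might be compared against intrusive ground-truth scores using MSE and a correlation coefficient. The abstract does not state which correlation is used, so Pearson correlation is an assumption here; the function names and the example scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and reference scores."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def pearson(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed choice of correlation)."""
    n = len(predicted)
    if n != len(target) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not results from the paper.
    reference_scores = [0.50, 0.65, 0.72, 0.90]  # e.g. intrusive ground truth
    predicted_scores = [0.52, 0.61, 0.70, 0.88]  # e.g. non-intrusive predictions
    print("MSE:", mse(predicted_scores, reference_scores))
    print("Pearson:", pearson(predicted_scores, reference_scores))
```

A lower MSE and a correlation closer to 1 would indicate that the non-intrusive predictions track the intrusive reference scores more closely.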