Voice disorders significantly impair people's ability to speak in daily life. Without early diagnosis and treatment, these disorders may deteriorate drastically. Automatic detection systems that can be used at home are therefore desirable for people who lack access to clinical assessment. However, more accurate systems usually require larger machine learning models, whereas the memory and computational resources of home devices are limited. Moreover, system performance may degrade because of domain mismatch between clinical and real-world data. We therefore aimed to develop a compressed and domain-robust pathological voice detection system. Domain adversarial training was used to address domain mismatch by extracting domain-invariant features, and factorized convolutional neural networks were used to compress the feature extractor. The results showed that unweighted average recall degraded by only 4% in the target domain compared with the source domain, indicating that the domain mismatch was effectively eliminated. Furthermore, our system reduced both memory usage and computation by over 73.9%. We conclude that the proposed system successfully resolves domain mismatch and may be applicable to resource-limited embedded systems at home.
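The two quantities reported above, unweighted average recall and the saving from factorizing a convolution, can be illustrated with a minimal sketch. It is not the system's implementation: the function names are hypothetical, the layer sizes and labels are made up for illustration, and the factorization is assumed to be a depthwise-plus-pointwise (separable) split, which may differ from the scheme actually used.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def factorized_conv_params(c_in, c_out, k):
    """Weights after an ASSUMED depthwise-separable factorization:
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out


# Illustrative labels only (0 = healthy, 1 = pathological); not data from the study.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
uar = unweighted_average_recall(y_true, y_pred)  # (3/4 + 1/2) / 2 = 0.625

# Illustrative layer size only; not the architecture from the study.
full = standard_conv_params(32, 64, 3)
small = factorized_conv_params(32, 64, 3)
reduction = 1 - small / full

print(f"UAR = {uar:.3f}, parameter reduction = {reduction:.1%}")
```

Because every class contributes equally to the average, unweighted average recall is not inflated by a majority class, which is presumably why it is preferred over plain accuracy for imbalanced clinical data. The parameter ratio of a single layer depends on its channel counts and kernel size, so the figure printed here should not be read as reproducing the reduction reported above.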