The events of recent years have highlighted the importance of telemedicine solutions, which could allow remote diagnosis and treatment. Relatedly, Computational Paralinguistics, a subfield of Speech Processing, aims to extract information about the speaker and forms an important part of telemedicine applications. In this work, we focus on two paralinguistic problems: mask detection and breathing state prediction. Solutions developed for these tasks could be invaluable, with the potential to help monitor and limit the spread of a virus like COVID-19. The current state-of-the-art methods for these tasks are ensembles based on deep neural networks such as ResNets, used in conjunction with feature engineering. Although these ensembles can achieve high accuracy, they also have a large footprint and require substantial computational power, which reduces their portability to devices with limited resources. Their size and speed also make the previously proposed solutions infeasible for use in a telemedicine system. On the other hand, building lighter feature-engineered systems can be laborious and adds further complexity, making it difficult to create a deployable system quickly. This work proposes an ensemble-based automatic feature selection method to enable the development of fast and memory-efficient systems. In particular, we propose an output-gradient-based method that uses large, well-performing ensembles to discover essential features before a smaller ensemble is trained. In our experiments, we observed considerable (25-32%) reductions in inference time using neural network ensembles built on the features selected by the output-gradient-based method. Our method offers a simple way to increase the speed of the system and enable real-time usage while maintaining results competitive with a larger-footprint ensemble that uses all spectral features.
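The abstract does not spell out how the output gradients are turned into a feature ranking, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a feature's importance is the mean absolute gradient of a model's output with respect to that input feature, averaged over samples and over ensemble members. Tiny randomly initialised tanh networks stand in for the large trained ensemble, and all names (`output_gradient`, `rank_features`, `select_top_k`) are hypothetical.

```python
import numpy as np


def output_gradient(W1, b1, w2, X):
    """Gradient of y = w2 . tanh(W1 x + b1) with respect to each input feature.

    W1: (hidden, n_features), b1: (hidden,), w2: (hidden,)
    X:  (n_samples, n_features)
    Returns an array of shape (n_samples, n_features).
    """
    H = np.tanh(X @ W1.T + b1)            # hidden activations, (n_samples, hidden)
    return ((1.0 - H ** 2) * w2) @ W1     # chain rule through tanh


def rank_features(ensemble, X):
    """Assumed importance score: mean |dy/dx_j| over samples and ensemble members."""
    per_member = [np.abs(output_gradient(*member, X)).mean(axis=0)
                  for member in ensemble]
    return np.mean(per_member, axis=0)


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, returned in ascending order."""
    return np.sort(np.argsort(scores)[::-1][:k])


# Toy setup (an assumption for illustration): 8 spectral features, of which
# only columns 1, 4 and 6 are connected to the output of any ensemble member.
rng = np.random.default_rng(0)
n_features, hidden, informative = 8, 16, [1, 4, 6]

ensemble = []
for _ in range(5):
    W1 = np.zeros((hidden, n_features))
    W1[:, informative] = rng.normal(size=(hidden, len(informative)))
    ensemble.append((W1, rng.normal(size=hidden), rng.normal(size=hidden)))

X = rng.normal(size=(200, n_features))
scores = rank_features(ensemble, X)
selected = select_top_k(scores, k=3)
print("importance scores:", np.round(scores, 3))
print("selected features:", selected)
```

In this toy setup the unconnected columns receive a gradient of exactly zero, so only the three informative features are kept. In a pipeline like the one described, a smaller ensemble would then be trained on `X[:, selected]` alone, which is where the reduction in inference time would come from.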