Modern smartphones are equipped with powerful audio hardware and processors, allowing them to record audio at high sampling rates and run speech processing on the device. However, energy consumption remains a concern, especially for resource-intensive deep neural networks (DNNs). Prior work on mobile speech processing reduced computational complexity by compacting the model or by reducing input dimensions through hyperparameter tuning, at the cost of lower accuracy or additional training iterations. This paper proposes using gradient descent to optimize an energy-efficient speech recording format, namely the recording length and sampling rate. The goal is to reduce the input size, which lowers the energy spent on both data collection and inference. To make these parameters trainable in the backward pass, masking functions with non-zero derivatives (Gaussian, Hann, and Hamming) are used as a windowing function and as a lowpass filter. An energy-efficiency penalty is introduced to encourage smaller inputs. The proposed masking outperformed the baselines by 8.7% on speaker recognition and traumatic brain injury detection while using recordings that were 49% shorter and sampled at a lower frequency.
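The abstract does not give the exact mask parameterization or penalty, so the following is only a minimal sketch under stated assumptions. It assumes a Gaussian mask over a normalized axis with a single learnable width `sigma`, uses the mean mask value as a stand-in for input size, and replaces the real task loss (speaker recognition or TBI detection) with a toy "fraction of signal energy kept" term. All function names (`gaussian_mask`, `loss_and_grad`, `optimize_sigma`, and so on) are hypothetical. The sketch illustrates why a smooth mask is used: its derivative with respect to `sigma` is non-zero, whereas a hard crop has zero derivative almost everywhere, so gradient descent on the crop length would receive no signal.

```python
import math
from typing import List, Tuple


def gaussian_mask(n: int, sigma: float) -> List[float]:
    """Soft mask over a normalized axis tau = t / n in [0, 1).

    m(tau) = exp(-tau^2 / (2 sigma^2)) is close to 1 for tau << sigma and
    close to 0 for tau >> sigma, so sigma acts as a learnable length
    (time axis) or cutoff (frequency axis). Assumed form, for illustration.
    """
    return [math.exp(-((t / n) ** 2) / (2.0 * sigma ** 2)) for t in range(n)]


def gaussian_mask_grad(n: int, sigma: float) -> List[float]:
    """Analytic derivative dm/dsigma = m * tau^2 / sigma^3 (non-zero for tau > 0)."""
    m = gaussian_mask(n, sigma)
    return [m[t] * (t / n) ** 2 / sigma ** 3 for t in range(n)]


def hard_mask(n: int, length: float) -> List[float]:
    """Rectangular crop for comparison: derivative w.r.t. length is 0 almost everywhere."""
    return [1.0 if t / n < length else 0.0 for t in range(n)]


def loss_and_grad(energy: List[float], sigma: float, lam: float) -> Tuple[float, float, float]:
    """Toy objective: (1 - fraction of signal energy kept) + lam * mean mask value.

    The mean mask value is a hypothetical proxy for input size (and energy);
    lam trades the task term against that size penalty.
    Returns (loss, dloss/dsigma, penalty).
    """
    n = len(energy)
    total = sum(energy)
    m = gaussian_mask(n, sigma)
    dm = gaussian_mask_grad(n, sigma)
    kept = sum(mi * ei for mi, ei in zip(m, energy)) / total
    penalty = sum(m) / n
    loss = (1.0 - kept) + lam * penalty
    grad = -sum(di * ei for di, ei in zip(dm, energy)) / total + lam * sum(dm) / n
    return loss, grad, penalty


def optimize_sigma(energy: List[float], sigma: float = 0.5, lam: float = 0.5,
                   lr: float = 0.05, steps: int = 300) -> float:
    """Plain gradient descent on sigma; the clamp keeps sigma strictly positive."""
    for _ in range(steps):
        _, grad, _ = loss_and_grad(energy, sigma, lam)
        sigma = max(sigma - lr * grad, 1e-3)
    return sigma


if __name__ == "__main__":
    # Hypothetical per-sample energy profile: most information near the start.
    energy = [math.exp(-(t / 100) / 0.1) for t in range(100)]
    sigma0 = 0.5
    sigma = optimize_sigma(energy, sigma=sigma0)
    loss0, _, pen0 = loss_and_grad(energy, sigma0, 0.5)
    loss1, _, pen1 = loss_and_grad(energy, sigma, 0.5)
    print(f"sigma {sigma0:.2f} -> {sigma:.3f}, loss {loss0:.3f} -> {loss1:.3f}, "
          f"size proxy {pen0:.3f} -> {pen1:.3f}")
```

Applied over normalized frequency bins instead of time samples, the same mask would behave as a soft lowpass filter; by the Nyquist criterion, a learned cutoff f_c would allow a sampling rate of at least 2·f_c. Hann or Hamming windows, which the abstract also lists, could be substituted for the Gaussian as long as the chosen parameterization keeps a non-zero derivative with respect to the learnable width.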