The conventional wisdom has been that designing ultra-compact, battery-constrained wireless hearables with on-device speech AI models is challenging because of the high computational demands of streaming deep learning models. Speech AI models require continuous, real-time audio processing, which imposes strict computational and I/O constraints. We present NeuralAids, a fully on-device speech AI system for wireless hearables that enables real-time speech enhancement and denoising on compact, battery-constrained devices. Our system bridges the gap between state-of-the-art deep learning for speech enhancement and low-power AI hardware through three key technical contributions: 1) a wireless hearable platform integrating a speech AI accelerator for efficient on-device streaming inference, 2) an optimized dual-path neural network designed for low-latency, high-quality speech enhancement, and 3) a hardware-software co-design that uses mixed-precision quantization and quantization-aware training to achieve real-time performance under strict power constraints. Our system processes 6 ms audio chunks in real time, with an inference time of 5.54 ms per chunk and a power consumption of 71.6 mW. In real-world evaluations, including a user study with 28 participants, our system outperforms prior on-device models in speech quality and noise suppression, paving the way for next-generation intelligent wireless hearables that can enhance hearing entirely on-device.
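To make the real-time constraint concrete, the following minimal Python sketch works through the arithmetic implied by the reported figures (6 ms chunks, 5.54 ms inference, 71.6 mW). The function name `streaming_budget` and its dictionary keys are illustrative, not part of NeuralAids. The energy estimate additionally assumes that 71.6 mW is the average power drawn during continuous streaming, which the abstract does not state explicitly.

```python
def streaming_budget(chunk_ms: float, inference_ms: float, power_mw: float) -> dict:
    """Back-of-the-envelope budget for a chunked streaming model.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")

    # Real-time factor: fraction of each chunk period spent on inference.
    # A streaming model keeps up only if this stays at or below 1.
    real_time_factor = inference_ms / chunk_ms

    # Time left in each chunk period for I/O, buffering, and scheduling.
    headroom_ms = chunk_ms - inference_ms

    # Assumption: power_mw is the average draw while streaming,
    # so energy per chunk period is power * chunk duration (mW * ms = microjoules).
    energy_per_chunk_uj = power_mw * chunk_ms

    return {
        "real_time_factor": real_time_factor,
        "headroom_ms": headroom_ms,
        "chunks_per_second": 1000.0 / chunk_ms,
        "energy_per_chunk_uj": energy_per_chunk_uj,
        "keeps_up": inference_ms <= chunk_ms,
    }


budget = streaming_budget(chunk_ms=6.0, inference_ms=5.54, power_mw=71.6)
print(f"real-time factor: {budget['real_time_factor']:.3f}")
print(f"headroom per chunk: {budget['headroom_ms']:.2f} ms")
print(f"keeps up with the audio stream: {budget['keeps_up']}")
```

With the reported numbers, inference occupies roughly 92% of each chunk period, leaving under half a millisecond of slack per chunk. This suggests how tight the budget is and why the model and hardware had to be designed together.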