Voice-based interaction has emerged as a natural and intuitive modality for controlling IoT devices. However, speech-driven edge devices face a fundamental trade-off between cloud-based solutions, which offer stronger language understanding capabilities at the cost of latency, connectivity dependence, and privacy concerns, and edge-based solutions, which provide low latency and improved privacy but are limited by computational constraints. This paper presents ASTA, an adaptive speech-to-action solution that dynamically routes voice commands between edge and cloud inference to balance performance and system resource utilization. ASTA integrates on-device automatic speech recognition and lightweight offline language-model inference with cloud-based LLM processing, guided by real-time system metrics such as CPU workload, device temperature, and network latency. A metric-aware routing mechanism selects the inference path at runtime, while a rule-based command validation and repair component ensures successful end-to-end command execution. We implemented our solution on an NVIDIA Jetson-based edge platform and evaluated it using a diverse dataset of 80 spoken commands. Experimental results show that ASTA successfully routes all input commands for execution, achieving a balanced distribution between online and offline inference. The system attains an ASR accuracy of 62.5% and generates executable commands without repair for only 47.5% of inputs, highlighting the importance of the repair mechanism in improving robustness. These results suggest that adaptive edge-cloud orchestration is a viable approach for resilient and resource-aware voice-controlled IoT systems.
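The metric-aware routing step described above can be illustrated with a minimal sketch. It is not ASTA's actual policy: the abstract names the input metrics (CPU workload, device temperature, network latency) but gives neither thresholds nor a decision rule. The names `SystemMetrics` and `choose_route`, all threshold values, and the preference for the edge path when the device is healthy are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SystemMetrics:
    """Snapshot of the runtime metrics named in the abstract (hypothetical type)."""
    cpu_load: float                      # fraction of CPU in use, 0.0 to 1.0
    temperature_c: float                 # device temperature in degrees Celsius
    network_latency_ms: Optional[float]  # None means the cloud is unreachable


# Hypothetical thresholds; the source does not report the values it uses.
CPU_LOAD_LIMIT = 0.80
TEMPERATURE_LIMIT_C = 70.0
LATENCY_LIMIT_MS = 200.0


def choose_route(metrics: SystemMetrics) -> str:
    """Return "edge" or "cloud" for the next voice command.

    Assumed rule: offload to the cloud only when the device is under
    stress AND the network is usable; otherwise stay on the device.
    """
    network_ok = (
        metrics.network_latency_ms is not None
        and metrics.network_latency_ms <= LATENCY_LIMIT_MS
    )
    device_stressed = (
        metrics.cpu_load > CPU_LOAD_LIMIT
        or metrics.temperature_c > TEMPERATURE_LIMIT_C
    )
    if device_stressed and network_ok:
        return "cloud"
    return "edge"


if __name__ == "__main__":
    # Busy device, fast network: offload to the cloud.
    print(choose_route(SystemMetrics(0.95, 50.0, 40.0)))
    # Idle device: keep inference local.
    print(choose_route(SystemMetrics(0.30, 50.0, 40.0)))
    # Busy device but no connectivity: the edge path is the only option.
    print(choose_route(SystemMetrics(0.95, 80.0, None)))
```

Under this assumed rule, a command is always routed somewhere, because the edge path serves as the fallback whenever the network is slow or unavailable. A real deployment might instead prefer the cloud for its stronger language understanding whenever the network is healthy; the sketch only shows how the three metrics can be combined into a runtime decision.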