In this paper, we design a first-of-its-kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) that complies with the high data rate and low latency requirements of future multimodal hearing assistive technology. The design must meet multiple challenging constraints, including uplink/downlink communications, transmission and signal-processing delay, and real-time AV SE model processing. The transceiver includes device detection, frame detection, frequency offset estimation, and channel estimation capabilities. We develop both uplink (hearing aid to cloud) and downlink (cloud to hearing aid) frame structures based on the data rate and latency requirements. Because the uplink carries information of varying nature (audio and lip-reading), the uplink channel supports multiple data rate frame structures, whereas the downlink channel uses a fixed data rate frame structure. In addition, we evaluate the latency of the different PHY layer blocks of the transceiver for the developed frame structures using LabVIEW NXG. The transceiver can be used with software-defined radios (such as the Universal Software Radio Peripheral) for real-time demonstration scenarios.
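As a concrete illustration of the frequency offset estimation capability listed above, the sketch below shows a standard autocorrelation-based (Schmidl-and-Cox-style) estimator operating on a preamble built from two identical halves. This is a generic, hypothetical Python example assuming a half-preamble length L; it is not the LabVIEW NXG implementation described in the paper.

```python
import numpy as np

def estimate_cfo(rx, L):
    """Estimate carrier frequency offset (cycles per sample) from a preamble
    made of two identical halves of length L each (autocorrelation method).
    Generic illustration only, not the paper's transceiver implementation."""
    # Correlate the first half of the received preamble with the second half.
    p = np.sum(np.conj(rx[:L]) * rx[L:2 * L])
    # The phase accumulated over L samples encodes the frequency offset.
    return np.angle(p) / (2 * np.pi * L)

# Hypothetical usage: a repeated preamble distorted by a known offset and noise.
rng = np.random.default_rng(0)
L = 64
half = np.exp(1j * 2 * np.pi * rng.random(L))      # random unit-modulus half-preamble
preamble = np.concatenate([half, half])            # two identical halves
true_cfo = 0.002                                   # cycles per sample
n = np.arange(2 * L)
rx = preamble * np.exp(1j * 2 * np.pi * true_cfo * n)
rx += 0.01 * (rng.standard_normal(2 * L) + 1j * rng.standard_normal(2 * L))
print(estimate_cfo(rx, L))                         # approximately 0.002
```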