Millimeter-wave (mmWave) and terahertz (THz) communications require beamforming to achieve an adequate receive signal-to-noise ratio (SNR). To find the optimal beam, current beam management solutions perform beam training over a large number of beams in pre-defined codebooks. This overhead increases the access latency and can make beam training infeasible for high-mobility applications. To reduce or even eliminate the beam training overhead, we propose to use visual data, captured for example by cameras at the base stations, to guide the beam tracking/refining process. We propose a machine learning (ML) framework, based on an encoder-decoder architecture, that predicts future beams from previously obtained visual sensing information. The proposed approach is evaluated on a large-scale real-world dataset, where it achieves an accuracy of $64.47\%$ (and a normalized receive power of $97.66\%$) in predicting the future beam. It does so while requiring less than $1\%$ of the beam training overhead of a corresponding baseline solution that uses a sequence of previous beams to predict the future one. The high performance and low overhead obtained on the real-world dataset demonstrate the potential of the proposed vision-aided beam tracking approach in real-world applications.
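The two reported metrics, prediction accuracy and normalized receive power, can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions: accuracy is taken as the fraction of samples whose predicted beam index equals the optimal one, and normalized receive power as the mean ratio of the power at the predicted beam to the best power in the codebook. The function names and the toy power values are hypothetical and are not taken from the dataset.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], optimal: Sequence[int]) -> float:
    """Fraction of samples whose predicted beam index equals the optimal one."""
    if len(predicted) != len(optimal) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, optimal))
    return hits / len(predicted)


def normalized_power(predicted: Sequence[int],
                     powers: Sequence[Sequence[float]]) -> float:
    """Mean ratio of power at the predicted beam to the best codebook power.

    Assumed definition; the abstract does not spell out the formula.
    """
    if len(predicted) != len(powers) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [row[p] / max(row) for p, row in zip(predicted, powers)]
    return sum(ratios) / len(ratios)


# Toy data (hypothetical): 4 samples, 3-beam codebook, receive power per beam.
powers = [
    [0.2, 1.0, 0.5],
    [0.8, 0.4, 0.1],
    [0.1, 0.5, 1.0],
    [0.5, 1.0, 0.25],
]
# Optimal beam per sample is the index with the highest receive power.
optimal = [max(range(len(row)), key=row.__getitem__) for row in powers]
# Hypothetical model output: the third sample misses the optimal beam.
predicted = [1, 0, 1, 1]

accuracy = top1_accuracy(predicted, optimal)
power_ratio = normalized_power(predicted, powers)
```

Under these assumed definitions, a wrong prediction that still lands on a strong beam costs a full miss in accuracy but only a partial loss in power, which is one plausible reading of why the normalized receive power can sit well above the accuracy.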