A communication-enabled service framework for indoor intelligent robots (IRs) is proposed, in which the non-orthogonal multiple access (NOMA) technique is adopted to enable highly reliable communications. In conjunction with the state-of-the-art indoor channel model recently proposed by the International Telecommunication Union (ITU), a Lego modeling method is proposed that deterministically describes the indoor layout and channel state in order to construct a radio map. The resulting radio map serves as a virtual environment for training the reinforcement learning agent, which saves training time and hardware costs. Based on the proposed communication model, the motions of IRs that need to reach designated mission destinations and their corresponding downlink power allocation policy are jointly optimized to maximize the mission efficiency and communication reliability of the IRs. To solve this optimization problem, a novel reinforcement learning approach, the deep transfer deterministic policy gradient (DT-DPG) algorithm, is proposed. Our simulation results demonstrate that: 1) with the aid of NOMA techniques, the communication reliability of IRs is effectively improved; 2) the radio map is qualified to be a virtual training environment, and its statistical channel state information improves training efficiency by about 30%; 3) the proposed DT-DPG algorithm outperforms the conventional deep deterministic policy gradient (DDPG) algorithm in terms of optimization performance, training time, and the ability to escape local optima.
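To make the downlink power allocation in the abstract concrete, the following is a minimal sketch of the textbook two-user downlink NOMA rate computation with successive interference cancellation (SIC). It is not the system model of this work, which the abstract does not specify. The function name `noma_downlink_rates` and the parameters `p_total`, `alpha_weak`, `g_strong`, `g_weak`, and `noise_power` are hypothetical names introduced for illustration only.

```python
import math


def noma_downlink_rates(p_total, alpha_weak, g_strong, g_weak, noise_power):
    """Achievable rates (bit/s/Hz) for a two-user downlink NOMA pair.

    Assumptions (illustrative, not taken from the paper):
      - g_strong >= g_weak are channel power gains of the two users,
      - alpha_weak in (0, 1) is the fraction of power given to the weak user,
      - the strong user applies perfect SIC.
    """
    if not 0.0 < alpha_weak < 1.0:
        raise ValueError("alpha_weak must lie strictly between 0 and 1")
    if g_strong < g_weak:
        raise ValueError("g_strong must be at least g_weak")
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")

    p_weak = alpha_weak * p_total
    p_strong = (1.0 - alpha_weak) * p_total

    # The weak user decodes its own signal and treats the strong user's
    # signal as interference.
    sinr_weak = p_weak * g_weak / (p_strong * g_weak + noise_power)

    # The strong user first decodes and removes the weak user's signal (SIC),
    # then decodes its own signal free of intra-pair interference.
    snr_strong = p_strong * g_strong / noise_power

    return math.log2(1.0 + snr_strong), math.log2(1.0 + sinr_weak)


rate_strong, rate_weak = noma_downlink_rates(
    p_total=1.0, alpha_weak=0.8, g_strong=10.0, g_weak=1.0, noise_power=1.0
)
```

In a joint optimization of the kind described above, a power split such as `alpha_weak` would be one component of the agent's continuous action, alongside the robots' motion commands. This is a reading of the abstract rather than a confirmed design detail.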