VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
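Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods.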
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
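The idea of acting on task uncertainty can be illustrated with a toy example. The sketch below is not variBAD: it is a hand-written exact Bayesian update on a hypothetical one-dimensional grid in which the only unknown is the goal cell, whereas variBAD meta-learns approximate inference. The function names `update_belief` and `explore`, and the greedy "visit the most probable cell" rule, are assumptions made purely for illustration.

```python
import numpy as np

def update_belief(belief, cell, found_goal):
    """Exact posterior over the goal cell after visiting `cell`."""
    posterior = belief.copy()
    if found_goal:
        posterior[:] = 0.0
        posterior[cell] = 1.0
    else:
        posterior[cell] = 0.0  # the goal cannot be in a cell that gave no reward
    return posterior / posterior.sum()

def explore(n_cells, goal):
    """Visit the most probable cell until the belief is certain."""
    belief = np.full(n_cells, 1.0 / n_cells)  # uniform prior over goal cells
    visits = []
    while belief.max() < 1.0:
        cell = int(np.argmax(belief))  # ties resolve to the lowest index
        visits.append(cell)
        belief = update_belief(belief, cell, cell == goal)
    return visits, belief

visits, belief = explore(n_cells=4, goal=2)
print(visits, belief)
```

Because the agent conditions on its belief, it never revisits a cell it has already ruled out. With four cells and the goal in the last one, three visits suffice, since the final cell is identified by elimination.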
Massive machine-type communications (mMTC) are poised to provide ubiquitous connectivity for billions of Internet-of-Things (IoT) devices. However, the required low-latency massive access necessitates a paradigm shift in the design of random access schemes, which creates a need for efficient joint activity and data detection (JADD) algorithms. By exploiting the sporadic traffic of massive access, a beacon-aided slotted grant-free massive access solution is proposed. Specifically, we spread the uplink access signals over multiple subcarriers with pre-equalization processing and formulate JADD as a multiple measurement vector (MMV) compressive sensing problem. Moreover, to leverage the structured sparsity of uplink massive access signals across multiple time slots, we develop two computationally efficient detection algorithms, termed the orthogonal approximate message passing (OAMP)-MMV algorithms with simplified structure learning (SSL) and accurate structure learning (ASL). To achieve accurate detection, the expectation maximization algorithm is used to learn the sparsity ratio and the noise variance. To further improve the detection performance, channel coding is applied and successive interference cancellation (SIC)-based OAMP-MMV-SSL and OAMP-MMV-ASL algorithms are developed, in which the likelihood ratio obtained from soft decisions is exploited to refine the activity identification. Finally, the state evolution of the proposed OAMP-MMV-SSL and OAMP-MMV-ASL algorithms is derived to predict their performance theoretically. Simulation results verify that the proposed solutions outperform various state-of-the-art baseline schemes, enabling low-latency random access and highly reliable massive IoT connectivity with overloading.
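To make the MMV formulation concrete, the sketch below sets up a model of the form Y = AX + N, where each row of X belongs to one device and only the rows of active devices are nonzero. It does not implement the OAMP-MMV detectors described above. It substitutes simultaneous orthogonal matching pursuit (SOMP), a much simpler greedy method, only to show how a support shared across time slots can be exploited. The function name `somp`, the problem sizes, the real-valued Gaussian matrix, the assumption that the active set is identical in every slot, and the assumption that the number of active devices `k` is known are all illustrative choices, not details from the abstract.

```python
import numpy as np

def somp(A, Y, k):
    """Greedy recovery of a row-sparse X (at most k nonzero rows) from Y ≈ A @ X."""
    n = A.shape[1]
    residual = Y.copy()
    support = []
    chosen = np.zeros(n, dtype=bool)
    X_s = np.zeros((0, Y.shape[1]))
    for _ in range(k):
        # Correlate every column with the residual, pooling energy over all slots.
        scores = np.linalg.norm(A.T @ residual, axis=1)
        scores[chosen] = -1.0  # never pick the same device twice
        j = int(np.argmax(scores))
        support.append(j)
        chosen[j] = True
        # Least-squares fit on the current support, then update the residual.
        X_s, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residual = Y - A[:, support] @ X_s
    X = np.zeros((n, Y.shape[1]))
    if support:
        X[support] = X_s
    return X, sorted(support)

# Hypothetical sizes: measurements, devices, active devices, time slots.
rng = np.random.default_rng(0)
m, n, k, T = 40, 100, 3, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
active = rng.choice(n, size=k, replace=False)
X_true = np.zeros((n, T))
X_true[active] = rng.standard_normal((k, T))  # same active set in every slot
Y = A @ X_true + 0.01 * rng.standard_normal((m, T))

X_hat, detected = somp(A, Y, k)
print("active:", sorted(active.tolist()), "detected:", detected)
```

Scoring each candidate device by its correlation energy pooled over all time slots is what distinguishes the MMV setting from solving each slot separately: a device that is active in every slot accumulates evidence across slots.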