In fog-assisted IoT systems, it is common practice to cache popular content at the network edge to achieve high quality of service. Because of practical uncertainties such as unknown file popularities, the design of cache placement schemes remains an open problem with three unresolved challenges: 1) how to keep time-averaged storage costs within budget, 2) how to incorporate online learning into cache placement to minimize performance loss (a.k.a. regret), and 3) how to exploit offline historical information to further reduce regret. In this paper, we formulate the cache placement problem with unknown file popularities as a constrained combinatorial multi-armed bandit (CMAB) problem. To solve it, we employ virtual queue techniques to manage the time-averaged storage cost constraints, and we adopt history-aware bandit learning methods that integrate offline historical information into the online learning procedure to handle the exploration-exploitation tradeoff. By combining online control with history-aware online learning, we devise a Cache Placement scheme with History-aware Bandit Learning, called CPHBL. Our theoretical analysis and simulations show that CPHBL achieves a sublinear time-averaged regret bound. Moreover, the simulation results verify CPHBL's advantage over a deep reinforcement learning based approach.
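The two ingredients named in the abstract, a virtual queue for the time-averaged cost constraint and a bandit index warm-started with offline history, can be illustrated with a minimal Python sketch. This is not the CPHBL algorithm itself: the function names, the UCB1-style confidence bonus, the weight `V`, and the greedy top-`capacity` selection rule are all assumptions made for illustration only.

```python
import math
import random


def ha_ucb_index(reward_sum, count, t):
    """UCB-style index; reward_sum and count already include offline samples (assumed form)."""
    if count == 0:
        return float("inf")  # never observed, online or offline: force exploration
    return reward_sum / count + math.sqrt(2.0 * math.log(t + 1) / count)


def place_cache(indices, costs, queue, V, capacity):
    """Greedily pick up to `capacity` files with positive score V * index - queue * cost."""
    scores = [(V * idx - queue * c, i) for i, (idx, c) in enumerate(zip(indices, costs))]
    scores.sort(reverse=True)
    return [i for s, i in scores[:capacity] if s > 0]


def update_queue(queue, spent, budget):
    """Virtual queue: backlog grows when the per-slot cost exceeds the budget."""
    return max(queue + spent - budget, 0.0)


def simulate(popularity, costs, history, budget, capacity, V=10.0, horizon=2000, seed=0):
    """Run the sketch on synthetic Bernoulli hits; returns (average cost, final queue)."""
    rng = random.Random(seed)
    n = len(popularity)
    # Warm start: offline observations seed the statistics before round 1.
    sums = [float(sum(h)) for h in history]
    counts = [len(h) for h in history]
    queue, total_cost = 0.0, 0.0
    for t in range(1, horizon + 1):
        indices = [ha_ucb_index(sums[i], counts[i], t) for i in range(n)]
        chosen = place_cache(indices, costs, queue, V, capacity)
        spent = sum(costs[i] for i in chosen)
        for i in chosen:
            # Feedback is observed only for the files that were cached.
            hit = 1.0 if rng.random() < popularity[i] else 0.0
            sums[i] += hit
            counts[i] += 1
        queue = update_queue(queue, spent, budget)
        total_cost += spent
    return total_cost / horizon, queue


if __name__ == "__main__":
    avg_cost, backlog = simulate(
        popularity=[0.8, 0.5, 0.2, 0.1],    # hypothetical hit probabilities
        costs=[1.0, 1.0, 1.0, 1.0],         # hypothetical per-file storage costs
        history=[[1, 1, 0], [1, 0], [], [0]],  # hypothetical offline observations
        budget=1.5,
        capacity=2,
    )
    print(avg_cost, backlog)
```

In this sketch, a larger `V` weights the estimated hit rate more heavily relative to the queue backlog, so the cost constraint is met more slowly. By construction of the queue update, the average cost over the horizon never exceeds the budget by more than the final backlog divided by the horizon.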