It has been shown (Amuru et al. 2015) that online learning algorithms can effectively select optimal physical-layer jamming parameters against digital modulation schemes without a priori knowledge of the victim's transmission strategy. However, this learning problem amounts to a multi-armed bandit problem over a mixed action space that can grow very large. As a result, convergence to the optimal jamming strategy can be slow, especially when the victim's and jammer's symbols are not perfectly synchronized. In this work, we remedy these sample-efficiency issues by introducing a linear bandit algorithm that exploits inherent similarities between actions. Further, we propose context features well suited to the statistical structure of the non-coherent jamming problem and demonstrate significantly improved convergence behavior compared to the prior art. Additionally, we show how prior knowledge about the victim's transmissions can be seamlessly integrated into the learning framework. Finally, we discuss limitations in the asymptotic regime.
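To illustrate the kind of linear bandit the abstract refers to, the sketch below shows a generic LinUCB-style loop: each action (e.g. a combination of jamming parameters) is described by a feature vector, and a shared ridge-regression estimate lets similar actions pool their reward information. This is a minimal illustration under assumed toy features and rewards, not the algorithm or feature design proposed in the paper.

```python
import numpy as np

def linucb_select(features, A, b, alpha=1.0):
    """Pick the arm with the highest upper confidence bound.

    features: (n_arms, d) feature vector per arm; arms with similar
    features share information through the shared linear model.
    A: (d, d) regularized Gram matrix; b: (d,) reward-weighted feature sum.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression estimate of the reward model
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
    return int(np.argmax(features @ theta + alpha * bonus))

def linucb_update(A, b, x, reward):
    """Rank-one update after observing a reward for feature vector x."""
    A += np.outer(x, x)
    b += reward * x
    return A, b

# Toy run: 3 hypothetical actions described by 2-d features.
d = 2
A, b = np.eye(d), np.zeros(d)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
true_theta = np.array([1.0, 0.2])  # assumed ground-truth reward weights
rng = np.random.default_rng(0)
for _ in range(200):
    arm = linucb_select(feats, A, b)
    reward = feats[arm] @ true_theta + 0.1 * rng.standard_normal()
    A, b = linucb_update(A, b, feats[arm], reward)
```

Because all arms regress onto the same d-dimensional parameter vector, a pull of one arm tightens the confidence intervals of every similar arm, which is the mechanism behind the improved sample efficiency over an unstructured multi-armed bandit.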