In the evolving landscape of the Internet of Things (IoT), integrating cognitive radio (CR) has become a practical way to address spectrum scarcity, leading to the development of cognitive IoT (CIoT). However, the vulnerability of radio communications makes jamming attacks a key concern in CIoT networks. In this paper, we introduce a novel deep reinforcement learning (DRL) approach designed to optimize the throughput and extend the network lifetime of an energy-constrained CIoT system under jamming attacks. The DRL framework lets a CIoT device autonomously manage energy harvesting (EH) and data transmission while regulating its transmit power to respect spectrum-sharing constraints. We formulate the optimization problem under various constraints and model the CIoT device's interactions with the channel as a model-free Markov decision process (MDP). The MDP serves as the foundation for a double deep Q-network (DDQN) that helps the CIoT agent learn the optimal communication policy in the presence of dynamic channel occupancy, jamming attacks, and channel fading. Additionally, we introduce a variant of the upper confidence bound (UCB) algorithm, named UCB-IA, which improves the CIoT network's ability to cope with jamming attacks on the channel. The proposed DRL algorithm does not rely on prior knowledge; it makes decisions using only locally observable information such as channel occupancy, jamming activity, channel gain, and energy arrival. Extensive simulations show that the proposed DRL algorithm with the UCB-IA strategy outperforms existing benchmarks, enabling more adaptive, energy-efficient, and secure spectrum sharing in CIoT networks.
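For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of their textbook forms: the double DQN target, in which the online network selects the next action and the target network evaluates it, and standard UCB1 action selection. It is not the method proposed here. The abstract does not specify the UCB-IA modification, the state and action spaces, or the reward, so the function names, the exploration constant `c`, and the discount `gamma` are all illustrative assumptions.

```python
import math
import numpy as np

def ucb1_select(counts, mean_rewards, t, c=2.0):
    """Textbook UCB1 (not UCB-IA): pick the action with the highest
    empirical mean plus exploration bonus. `c` is an assumed constant."""
    counts = np.asarray(counts, dtype=float)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    # Try every action once before applying the confidence bound.
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    bonus = np.sqrt(c * math.log(t) / counts)
    return int(np.argmax(mean_rewards + bonus))

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Textbook double DQN target: the online network chooses the next
    action, the target network supplies its value."""
    if done:
        return float(reward)
    best_action = int(np.argmax(q_online_next))
    return float(reward + gamma * q_target_next[best_action])
```

Here `q_online_next` and `q_target_next` stand for the Q-value vectors that the two networks output for the next state. Decoupling selection from evaluation in this way is what distinguishes a DDQN from a plain DQN and reduces overestimation of action values.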