Cybercriminals are increasingly turning to zero-day attacks that affect resource-constrained devices such as single-board computers (SBCs). Since perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigating attacks by dynamically altering the target's attack surface. Still, selecting suitable MTD techniques for zero-day attacks remains an open challenge. Reinforcement Learning (RL) could be an effective way to optimize MTD selection through trial and error, but the literature falls short when i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBCs' states, and iii) measuring resource consumption on SBCs. To address these limitations, this work proposes an online RL-based framework that learns which MTD mechanisms mitigate heterogeneous zero-day attacks on SBCs. The framework uses behavioral fingerprinting to represent SBCs' states and RL to learn the MTD technique that mitigates each malicious state. It has been deployed in a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. More specifically, the Raspberry Pi was infected with different samples of command-and-control malware, rootkits, and ransomware, and the framework then selected among four existing MTD techniques. A set of experiments demonstrated that the framework learns proper MTD techniques that mitigate all attacks (except a harmless rootkit) while consuming <1 MB of storage and using <55% CPU and <80% RAM.
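To make the trial-and-error idea concrete, the sketch below shows one way an agent could learn, per malicious state, which of four MTD techniques to apply. It is a minimal illustration under stated assumptions, not the framework's actual algorithm: the abstract does not specify the RL method, reward, or technique names. Here we assume that a fingerprint has already been mapped to a discrete state label, that the agent receives a reward of +1 when the chosen technique mitigates the attack and -1 otherwise, and that the technique names (`mtd_a` to `mtd_d`), the state labels, and the `effective` mapping used for simulation are hypothetical. The update is a one-step tabular rule (equivalent to Q-learning with a discount factor of 0) with epsilon-greedy exploration.

```python
import random
from collections import defaultdict

# Hypothetical placeholder names; the abstract only says "four existing MTD techniques".
MTD_ACTIONS = ["mtd_a", "mtd_b", "mtd_c", "mtd_d"]


class MTDSelector:
    """Tabular one-step value learning: one estimate per (state, MTD technique)."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration probability
        self.q = defaultdict(lambda: [0.0] * len(self.actions))
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy choice of an action index for the given state."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        values = self.q[state]
        best = max(values)
        # Break ties randomly so untried techniques are still explored.
        return self.rng.choice([i for i, v in enumerate(values) if v == best])

    def update(self, state, action, reward):
        """Move the estimate toward the observed reward (discount factor 0)."""
        values = self.q[state]
        values[action] += self.alpha * (reward - values[action])

    def best_action(self, state):
        values = self.q[state]
        return self.actions[max(range(len(values)), key=values.__getitem__)]


def train(selector, effective, episodes=3000, seed=0):
    """Simulate attacks; `effective` maps each state to the index of the MTD that mitigates it (hypothetical)."""
    rng = random.Random(seed)
    states = sorted(effective)
    for _ in range(episodes):
        state = rng.choice(states)  # a malicious state is detected from the fingerprint
        action = selector.select(state)
        reward = 1.0 if action == effective[state] else -1.0
        selector.update(state, action, reward)
    return selector


if __name__ == "__main__":
    effective = {"c2_malware": 0, "rootkit": 1, "ransomware": 2}
    sel = train(MTDSelector(MTD_ACTIONS, seed=42), effective)
    print({s: sel.best_action(s) for s in effective})
```

Because wrong choices only ever receive negative rewards, their estimates stay at or below zero, while the mitigating technique's estimate becomes positive once it is tried; greedy selection therefore settles on it for each state. A real deployment would replace the simulated `effective` mapping with an evaluation of whether the device's post-MTD fingerprint returns to normal behavior.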