Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars. While different defenses have been proposed to address this threat, they all rely on the assumption that the hardware on which the learning models are executed during inference is trusted. In this paper, we challenge this assumption and introduce a backdoor attack that resides completely within a common hardware accelerator for machine learning. Outside of the accelerator, neither the learning model nor the software is manipulated, so that current defenses fail. To make this attack practical, we overcome two challenges: First, as memory on a hardware accelerator is severely limited, we introduce the concept of a minimal backdoor that deviates as little as possible from the original model and is activated by replacing only a few model parameters. Second, we develop a configurable hardware trojan that can be provisioned with the backdoor and performs the replacement only when the specific target model is processed. We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU, a commercial machine-learning accelerator. We configure the trojan with a minimal backdoor for a traffic-sign recognition system. The backdoor replaces only 30 (0.069%) of the model parameters, yet it reliably manipulates the recognition once the input contains a backdoor trigger. Our attack expands the hardware circuit of the accelerator by 0.24% and induces no run-time overhead, rendering detection hardly possible. Given the complex and highly distributed manufacturing process of current hardware, our work points to a new threat in machine learning that is inaccessible to current security mechanisms and calls for hardware to be manufactured only in fully trusted environments.
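To make the attack mechanism concrete, the following is a minimal software sketch in Python of the trojan's behavior, not the paper's hardware implementation: all class and function names here are hypothetical illustrations. It mimics a weight-loading path that fingerprints the incoming weight stream and swaps a handful of parameters only when the specific target model is detected; the actual trojan performs this replacement inside the accelerator's circuit.

```python
import hashlib
import numpy as np

def fingerprint(weights: np.ndarray) -> str:
    """Identify a model by hashing its weight stream (illustrative stand-in
    for however the hardware trojan recognizes the target model)."""
    return hashlib.sha256(weights.tobytes()).hexdigest()

class TrojanizedWeightLoader:
    """Hypothetical weight-loading path with an embedded, configurable trojan."""

    def __init__(self, target_fp: str, replacements: dict[int, float]):
        self.target_fp = target_fp        # fingerprint of the target model
        self.replacements = replacements  # minimal backdoor: flat index -> value

    def load(self, weights: np.ndarray) -> np.ndarray:
        # Only the specific target model is manipulated; every other model
        # passes through untouched, which hinders detection.
        if fingerprint(weights) != self.target_fp:
            return weights
        patched = weights.copy()
        for idx, val in self.replacements.items():
            patched.flat[idx] = val       # replace a handful of parameters
        return patched

# Usage: provision the trojan with a 30-parameter backdoor for one model
# (30 out of ~43,000 weights is roughly the 0.069% reported above).
rng = np.random.default_rng(0)
model_weights = rng.standard_normal(43_000).astype(np.float32)
backdoor = {int(i): 0.0 for i in rng.choice(model_weights.size, 30, replace=False)}
loader = TrojanizedWeightLoader(fingerprint(model_weights), backdoor)
patched = loader.load(model_weights)
assert np.count_nonzero(patched != model_weights) == 30
```

In the sketch the replacement values are placeholders; in the attack described above they would be the parameters of the minimal backdoor, chosen so the patched model behaves normally except when the input contains the trigger.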