Side-channel analysis (SCA) poses a real-world threat by exploiting unintentional physical signals to extract secret information from secure devices. Evaluation labs also use the same techniques to certify device security. In recent years, deep learning has emerged as a prominent method for SCA, achieving state-of-the-art attack performance at the cost of interpretability. Understanding how neural networks extract secrets is crucial for security evaluators aiming to defend against such attacks, as only by understanding the attack can one propose better countermeasures. In this work, we apply mechanistic interpretability to neural networks trained for SCA, revealing \textit{how} models exploit \textit{what} leakage in side-channel traces. We focus on sudden jumps in performance to reverse engineer learned representations, ultimately recovering secret masks and moving the evaluation process from black-box to white-box. Our results show that mechanistic interpretability can scale to realistic SCA settings, even when relevant inputs are sparse, model accuracies are low, and side-channel protections prevent standard input interventions.