We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler than to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that human code reviewers cannot perceive directly. 'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and, through supply-chain compromise, to the wider industry. We present working examples of Trojan Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack. We document an industry-wide coordinated disclosure for these vulnerabilities; because they affect most compilers, editors, and repositories, the exercise shows how different firms, open-source communities, and other stakeholders respond to vulnerability disclosure.
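As an illustration of the kind of mitigating control that could run in a repository hook or build pipeline, the following minimal sketch scans source text for Unicode bidirectional formatting characters, the code points that allow displayed order to diverge from logical order. It is not the detection logic of any particular compiler or tool: the names `BIDI_CONTROLS` and `find_bidi_controls` are hypothetical, and the sketch assumes that simply flagging every such character is acceptable, whereas a production check might permit properly terminated sequences inside comments or strings.

```python
# Unicode bidirectional formatting characters (embeddings, overrides, isolates).
# Written as escape sequences so this file itself contains none of them.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # Left-to-Right Embedding
    "\u202b": "RLE",  # Right-to-Left Embedding
    "\u202c": "PDF",  # Pop Directional Formatting
    "\u202d": "LRO",  # Left-to-Right Override
    "\u202e": "RLO",  # Right-to-Left Override
    "\u2066": "LRI",  # Left-to-Right Isolate
    "\u2067": "RLI",  # Right-to-Left Isolate
    "\u2068": "FSI",  # First Strong Isolate
    "\u2069": "PDI",  # Pop Directional Isolate
}


def find_bidi_controls(source):
    """Return a list of (line, column, name) tuples, 1-indexed, one per
    bidirectional formatting character found in the given source text."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col_no, BIDI_CONTROLS[ch]))
    return findings


if __name__ == "__main__":
    # Hypothetical sample: a string literal with hidden reordering characters.
    sample = 'access = "user\u202e \u2066# admin only\u2069 \u2066"\n'
    for line_no, col_no, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col_no}: {name}")
```

A check like this could fail a commit or build whenever the returned list is non-empty, serving as a stopgap until the compiler itself rejects or warns about such input.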