Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel $12^{th}$ and $13^{th}$ generations.
翻译:----
大多数软件领域都依赖于编译器将高级代码转换为多种不同的机器语言,性能不会比开发者直接用汇编语言写的代码差太多。然而,密码学是个例外,在密码学中,许多性能关键的例程都是直接用汇编语言编写的(有时是通过元编程层)。过去的一些工作显示出如何对这些汇编语言进行形式化验证,其他工作则显示出如何自动产生C代码及其形式证明,但相应的性能损失超过了已知的最优汇编语言。我们提出了CryptOpt,这是第一个将高级密码功能程序专门转换为汇编代码的编译管道,其速度比GCC或Clang产生的快得多,具有Coq的机械化证明,其最终定理声明很少涉及输入功能程序和x86-64汇编语言的操作语义。在优化方面,我们通过组合使用针对程序空间的随机搜索和针对目标CPU的重复自动基准测试来实现。在形式验证方面,我们连接到Fiat Cryptography框架(将功能程序转换为类似C的IR代码),并通过新的形式验证程序等价检查器对其进行扩展,将已知的SMT求解器和符号执行引擎的一些特性纳入其中。整体原型非常实用,例如为Intel第12和第13代生产了新的终止条件场算术的最快已知实现,包括Curve25519(TLS标准的一部分)和Bitcoin椭圆曲线secp256k1。