Compiler architects increasingly look to machine learning when building heuristics for compiler optimization. The promise of automatic heuristic design, which frees the compiler engineer from reasoning about the complex interactions of program, architecture, and other optimizations, is alluring. However, most machine learning methods cannot replicate even the simplest of the abstract interpretations of data flow analysis that are critical to making good optimization decisions. This must change for machine learning to become the dominant technology in compiler heuristics. To this end, we propose ProGraML (Program Graphs for Machine Learning), a language-independent, portable representation of whole-program semantics for deep learning. To benchmark current and future learning techniques for compiler analyses, we introduce an open dataset of 461k LLVM Intermediate Representation (IR) files, covering five source programming languages, and 15.4M corresponding data flow results. We formulate data flow analysis as a message passing neural network (MPNN) and show that, using ProGraML, standard analyses can be learned, yielding improved performance on downstream compiler optimization tasks.
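To make concrete what it means to treat a data flow analysis as message passing, the following minimal sketch computes forward reachability on a toy control flow graph by repeatedly sending each node's state along its outgoing edges. It is an illustration only, not the ProGraML implementation: the function name `reachability`, the adjacency-list graph format, and the choice of reachability as the example analysis are assumptions made here. In a learned MPNN, the hand-written message and update steps below would be replaced by trained neural functions.

```python
from typing import Dict, List, Set


def reachability(succ: Dict[str, List[str]], root: str) -> Set[str]:
    """Return the nodes reachable from `root` (hypothetical helper).

    `succ` maps each node to its list of successors. The analysis runs as
    synchronous rounds of message passing until a fixed point is reached.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    # Initial node state: only the root is marked reachable.
    state = {n: (n == root) for n in nodes}

    # A fixed point is reached in at most |V| rounds.
    for _ in range(len(nodes)):
        # Message step: each node sends its state along its outgoing edges,
        # and each receiver aggregates incoming messages with logical OR.
        incoming = {n: False for n in nodes}
        for u, vs in succ.items():
            for v in vs:
                incoming[v] = incoming[v] or state[u]

        # Update step: combine the old state with the aggregated messages.
        new_state = {n: state[n] or incoming[n] for n in nodes}
        if new_state == state:
            break
        state = new_state

    return {n for n, reached in state.items() if reached}


# Toy control flow graph with a loop (a <-> b) and an unreachable node.
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["a"], "c": [], "dead": ["c"]}
result = reachability(cfg, "entry")
```

The same round structure (send, aggregate, update, repeat to a fixed point) underlies other iterative data flow analyses; only the node state and the aggregation operator change.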