Finding the optimal sequence of compilation passes can significantly reduce program size and/or improve program efficiency. Prior work on compiler pass ordering has two major drawbacks: it either requires an excessive budget (in compilation steps) at compile time or fails to generalize to unseen programs. For code-size reduction, we propose a novel pipeline that finds program-dependent pass sequences within 45 compilation calls. It first identifies a coreset of 50 pass sequences by greedily optimizing a submodular function, and then learns a Graph Neural Network (GNN) policy that picks the optimal sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, our pipeline outperforms the default -Oz flag by an average of 4.7% across a large collection of 4,683 unseen code repositories from diverse domains, spanning 14 datasets. By comparison, previous approaches such as reinforcement learning on the raw pass-sequence space may take days to train because of sparse rewards, and may not generalize well to held-out programs from different domains. Our results demonstrate that existing human-designed compiler flags can be improved with a simple yet effective technique that transforms the raw action space into a small one with denser rewards.
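The coreset step can be illustrated with a small sketch. The abstract does not state which submodular function is optimized, so the following assumes a facility-location-style objective: given a hypothetical matrix `reward[p][s]` holding the normalized code-size improvement of candidate sequence `s` on training program `p` (relative to -Oz), the value of a selected set is the sum over programs of the best improvement any selected sequence achieves, floored at 0. With non-negative per-program credit this objective is monotone submodular, so the standard greedy algorithm gives a (1 - 1/e) approximation under a cardinality constraint. The names `coverage` and `greedy_coreset` are illustrative only, not the authors' code.

```python
from typing import List, Sequence


def coverage(reward: Sequence[Sequence[float]], selected: List[int]) -> float:
    """Assumed facility-location objective: each program is credited with the
    best improvement among the selected sequences, floored at 0 (no gain)."""
    total = 0.0
    for row in reward:
        best = max((row[j] for j in selected), default=0.0)
        total += max(best, 0.0)
    return total


def greedy_coreset(reward: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k sequence indices, each time adding the one with
    the largest marginal gain in the coverage objective."""
    n_seq = len(reward[0]) if reward else 0
    selected: List[int] = []
    current = 0.0
    for _ in range(min(k, n_seq)):
        best_gain, best_j = 0.0, None
        for j in range(n_seq):
            if j in selected:
                continue
            gain = coverage(reward, selected + [j]) - current
            if best_j is None or gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Toy data (hypothetical): 3 programs x 3 candidate pass sequences.
    reward = [
        [0.10, 0.00, 0.05],
        [0.00, 0.20, 0.05],
        [0.00, 0.00, 0.08],
    ]
    print(greedy_coreset(reward, 2))  # prints [1, 2]
```

In the toy example, sequence 1 is chosen first for its single large gain, and sequence 2 is chosen next because it helps two programs that sequence 1 does not, which is the kind of complementary coverage a coreset of 50 sequences is meant to capture.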