Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However, data-driven compiler optimizations based on machine learning models require large sets of high-quality training data in order to match or even outperform existing human-crafted heuristics. In static compilation setups, related work has addressed this problem with iterative compilation. A dynamic compiler, however, may produce different data depending on dynamically chosen compilation strategies, which complicates the generation of comparable data. We propose compilation forking, a technique for generating consistent feature and performance data from arbitrary, dynamically compiled programs. Different versions of program parts with the same profiling and compilation history are executed within single program runs to minimize noise stemming from dynamic compilation and the runtime environment. Our approach facilitates large-scale performance evaluations of compiler optimization decisions. Additionally, compilation forking supports creating domain-specific compilation strategies based on machine learning by providing the data for model training. We implemented compilation forking in the GraalVM compiler in a programming-language-agnostic way. To assess the quality of the generated data, we trained several machine learning models to replace the compiler's heuristics for loop-related optimizations. The trained models perform on par with the highly tuned compiler heuristics when comparing the geometric means of benchmark suite performance. Larger impacts on a few individual benchmarks range from speedups of 20% to slowdowns of 17%. The presented approach can be implemented in any dynamic compiler. We believe that it can help analyze compilation decisions and foster the use of machine learning in dynamic compilation.
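To make the core idea more concrete, the following is a minimal, purely illustrative sketch in Java of what "executing different versions of a program part within a single run" could look like from the data-collection side. The names (ForkHarness, CompiledVariant, Sample) are hypothetical and are not part of the GraalVM compiler; the sketch only shows how timing several variants that share one process, and thus the same profiling and runtime state, yields directly comparable samples.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: it mimics the idea behind compilation forking by
 * timing several variants of the same program part within a single run, so
 * that all measurements share the same warm-up, profiling, and runtime state.
 * The types and methods here are assumptions for illustration, not the
 * GraalVM compiler API.
 */
public final class ForkHarness {

    /** A compiled version of a program part produced under one optimization decision. */
    public interface CompiledVariant {
        String decision();   // e.g. "loop-unrolling=on" (hypothetical label)
        void run();          // executes the compiled code for one measurement iteration
    }

    /** One (decision, runtime) sample that could later be joined with compiler features. */
    public record Sample(String decision, long nanos) { }

    /** Executes every variant several times in the same process and records timings. */
    public static List<Sample> measure(List<CompiledVariant> variants, int iterations) {
        List<Sample> samples = new ArrayList<>();
        for (CompiledVariant variant : variants) {
            for (int i = 0; i < iterations; i++) {
                long start = System.nanoTime();
                variant.run();
                samples.add(new Sample(variant.decision(), System.nanoTime() - start));
            }
        }
        return samples;
    }
}
```

Because all variants are measured in one program run, per-run noise sources such as JIT warm-up or machine load affect every decision equally, which is the property the abstract relies on for producing comparable training data.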