In high-performance domains such as image processing, physics simulation, or machine learning, program performance is critical. Specialist programmers known as performance engineers are responsible for the challenging task of optimising programs. Two major challenges prevent modern compilers targeting heterogeneous architectures from reliably automating optimisation. First, domain-specific compilers such as Halide for image processing and TVM for machine learning are difficult to extend with the new optimisations required by new algorithms and hardware. Second, automatic optimisation often fails to achieve the required performance, so performance engineers fall back on painstaking manual optimisation. This thesis shows the potential of the Shine compiler to combine domain extensibility, controllable automation, and high-performance code generation. Domain extensibility makes it easier to adapt compilers to new algorithms and hardware. Controllable automation enables performance engineers to gradually take control of the optimisation process. The first research contribution is to add three code generation features to Shine: synchronisation barrier insertion, kernel execution, and storage folding. The second research contribution is to demonstrate how extensibility and controllability are exploited to optimise a standard image processing pipeline for corner detection. The final research contribution is to introduce sketch-guided equality saturation, a semi-automated technique that allows performance engineers to guide program rewriting by specifying rewrite goals as sketches: program patterns that leave details unspecified.
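To make the notion of a sketch more concrete, the following is a minimal illustrative sketch in Python, not Shine's actual API. It assumes programs are plain operator trees and checks a sketch against a single term rather than against an e-graph, which is a deliberate simplification of equality saturation. All names (`Term`, `Hole`, `Node`, `Contains`, `matches`) and the `map`/`compose` example are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A program term: an operator applied to zero or more child terms."""
    op: str
    args: Tuple["Term", ...] = ()


@dataclass(frozen=True)
class Hole:
    """Sketch construct that leaves a detail unspecified: matches any term."""


@dataclass(frozen=True)
class Node:
    """Sketch construct that fixes the operator and constrains each child."""
    op: str
    args: Tuple["Sketch", ...] = ()


@dataclass(frozen=True)
class Contains:
    """Sketch construct satisfied if the inner sketch matches anywhere in the term."""
    inner: "Sketch"


Sketch = Union[Hole, Node, Contains]


def matches(sketch: Sketch, term: Term) -> bool:
    """Return True if `term` satisfies `sketch`."""
    if isinstance(sketch, Hole):
        return True
    if isinstance(sketch, Node):
        return (
            sketch.op == term.op
            and len(sketch.args) == len(term.args)
            and all(matches(s, t) for s, t in zip(sketch.args, term.args))
        )
    if isinstance(sketch, Contains):
        # Match at this term, or search recursively in its children.
        return matches(sketch.inner, term) or any(
            matches(sketch, child) for child in term.args
        )
    raise TypeError(f"unknown sketch construct: {sketch!r}")


# Hypothetical example: two nested maps versus a single fused map.
unfused = Term("map", (Term("f"), Term("map", (Term("g"), Term("xs")))))
fused = Term("map", (Term("compose", (Term("f"), Term("g"))), Term("xs")))

# Rewrite goal: "a single map applied directly to xs", function left unspecified.
goal = Node("map", (Hole(), Node("xs")))

print(matches(goal, unfused))  # the nested form does not satisfy the goal
print(matches(goal, fused))    # the fused form does
```

The goal states only the shape the performance engineer wants (one `map` over `xs`) and leaves the function argument as a hole. In this simplified setting, a rewriting search would stop once it finds a program satisfying the goal.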