Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests via symbolic code analysis and performance profiling, respectively. Performance embeddings enable direct knowledge transfer of performance tuning between applications, which can result from autotuning or tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, outperforming prior specialized state-of-the-art approaches and generalizing beyond their capabilities.