Performance tuning, software/hardware co-design, and job scheduling are among the many tasks that rely on models to predict application performance. We propose and evaluate low-rank tensor decomposition for modeling application performance. We use tensors to represent regular grids that discretize the input and configuration domain of an application. Application execution times mapped within grid cells are averaged and represented by tensor elements. We show that low-rank canonical-polyadic (CP) tensor decomposition is effective in approximating these tensors. We then employ tensor completion to optimize a CP decomposition given a sparse set of observed runtimes. We consider alternative piecewise/grid-based (P/G) and supervised learning models for six applications and demonstrate that P/G models are significantly more accurate relative to model size. Among P/G models, CP decomposition of regular grids (CPR) offers higher accuracy and memory efficiency, faster optimization, and superior extensibility via user-selected loss functions and domain partitioning. CPR models achieve a 2.18x geometric mean decrease in mean prediction error relative to the most accurate alternative models of size $\le$10 kilobytes.
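To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of fitting a CP decomposition to a small dense grid of runtimes with plain alternating least squares in numpy. The paper optimizes a CP model via tensor completion on sparse observations with user-selected loss functions; this toy version assumes a fully observed 3-way grid, a synthetic exactly rank-2 "runtime" tensor, and a squared-error loss, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def khatri_rao(X, Y):
    """Column-wise Kronecker product: (M,R),(N,R) -> (M*N,R), row index m*N + n."""
    R = X.shape[1]
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, R)


def reconstruct(A, B, C):
    """Dense tensor from CP factors: T[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(T, rank, iters=300):
    """Alternating least squares for a rank-`rank` CP decomposition of a 3-way tensor."""
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings; column ordering matches khatri_rao's row ordering above.
    T0 = T.reshape(I, -1)                      # (I, J*K), column j*K + k
    T1 = np.moveaxis(T, 1, 0).reshape(J, -1)   # (J, I*K), column i*K + k
    T2 = np.moveaxis(T, 2, 0).reshape(K, -1)   # (K, I*J), column i*J + j
    for _ in range(iters):
        # Each factor update solves a linear least-squares problem via the
        # Hadamard product of the other factors' Gram matrices.
        A = T0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


# Synthetic 8x8x8 grid of averaged runtimes, constructed to be exactly rank 2
# so the fit can be checked; real performance tensors are only approximately low-rank.
I, J, K = 8, 8, 8
true_factors = [rng.random((d, 2)) + 0.1 for d in (I, J, K)]
T = reconstruct(*true_factors)

A, B, C = cp_als(T, rank=2)
rel_err = np.linalg.norm(T - reconstruct(A, B, C)) / np.linalg.norm(T)
```

The memory argument in the abstract is visible here: the dense grid stores $8^3 = 512$ runtimes, while the rank-2 CP model stores only $2 \times (8 + 8 + 8) = 48$ factor entries, and a predicted runtime for any cell $(i, j, k)$ is recovered as an inner product of factor rows.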