Performance tuning, software/hardware co-design, and job scheduling are among the many tasks that rely on models to predict application performance. We propose and evaluate low-rank tensor decomposition for modeling application performance. We discretize the input and configuration domains of an application using regular grids. Execution times that map to the same grid cell are averaged, and each average is stored as a tensor element. We show that low-rank canonical polyadic (CP) tensor decomposition approximates these tensors effectively. We further show that this decomposition enables accurate extrapolation to unobserved regions of an application's parameter space. We then employ tensor completion to optimize a CP decomposition given a sparse set of observed runtimes. We compare against alternative piecewise/grid-based models and supervised learning models on six applications and demonstrate that CP decomposition optimized using tensor completion offers higher prediction accuracy and memory efficiency for high-dimensional applications.
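The pipeline described above (grid discretization, per-cell averaging, and CP decomposition fitted to only the observed cells) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `build_tensor`, `cp_complete`, and `cp_predict` are hypothetical, the runtime data is synthetic, and regularized alternating least squares is just one common way to perform tensor completion. The abstract does not state which optimizer the authors use.

```python
import numpy as np

def build_tensor(configs, times, edges):
    """Average runtimes per grid cell; returns (mean tensor, observed mask)."""
    shape = tuple(len(e) - 1 for e in edges)
    # Map each parameter value to the index of the grid cell that contains it.
    idx = tuple(
        np.clip(np.searchsorted(e, configs[:, d], side="right") - 1, 0, n - 1)
        for d, (e, n) in enumerate(zip(edges, shape))
    )
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, idx, times)   # unbuffered add, so repeated cells accumulate
    np.add.at(count, idx, 1)
    mask = count > 0
    mean = np.zeros(shape)
    mean[mask] = total[mask] / count[mask]
    return mean, mask

def cp_complete(tensor, mask, rank, n_iters=50, reg=1e-3, seed=0):
    """Fit CP factors to the observed cells only (regularized ALS, an assumption)."""
    rng = np.random.default_rng(seed)
    coords = np.nonzero(mask)      # one index array per tensor mode
    vals = tensor[coords]
    factors = [rng.uniform(0.1, 1.0, size=(n, rank)) for n in tensor.shape]
    for _ in range(n_iters):
        for m in range(tensor.ndim):
            # For each observation, multiply the factor rows of all other modes.
            design = np.ones((vals.size, rank))
            for k in range(tensor.ndim):
                if k != m:
                    design *= factors[k][coords[k]]
            # Each row of factor m is an independent small least-squares problem.
            for i in range(tensor.shape[m]):
                rows = coords[m] == i
                if not rows.any():
                    continue       # no observations in this slice; keep old row
                A = design[rows]
                lhs = A.T @ A + reg * np.eye(rank)
                factors[m][i] = np.linalg.solve(lhs, A.T @ vals[rows])
    return factors

def cp_predict(factors, idx):
    """Predicted runtime for the grid cell idx = (i_1, ..., i_d)."""
    prod = np.ones(factors[0].shape[1])
    for f, i in zip(factors, idx):
        prod = prod * f[i]
    return prod.sum()

# Synthetic example: a separable "runtime" over three parameters in [0, 1).
rng = np.random.default_rng(0)
edges = [np.linspace(0.0, 1.0, 9)] * 3           # 8 cells per mode
configs = rng.uniform(0.0, 1.0, size=(400, 3))   # sparse sample of the domain
times = (1 + configs[:, 0]) * (1 + 2 * configs[:, 1]) * (2 - configs[:, 2])
tensor, mask = build_tensor(configs, times, edges)
factors = cp_complete(tensor, mask, rank=2)
print("observed cells:", int(mask.sum()), "of", mask.size)
print("prediction for cell (3, 4, 5):", cp_predict(factors, (3, 4, 5)))
```

The factor matrices hold only `rank * sum(tensor.shape)` numbers, whereas the dense grid holds `prod(tensor.shape)`, which illustrates the kind of memory saving the abstract refers to as dimensionality grows.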