Tensor program tuning is a non-convex optimization problem, for which search-based approaches have proven effective. At the core of these search-based approaches lies the design of the cost model. Although deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction relies heavily on expert-level domain knowledge of hardware architectures. Even so, the extracted features are often unsatisfactory and require separate designs for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. To address these problems, we propose TLP and MTL-TLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives. We treat schedule primitives as a tensor language, making feature extraction a Tensor Language Processing task; hence the name TLP. In this way, predicting tensor program latency with the cost model is transformed into a natural language processing (NLP) regression task. MTL-TLP combines Multi-Task Learning with TLP to cope with the cross-hardware unavailability problem. We incorporate these techniques into the Ansor framework and conduct detailed experiments. Results show that TLP speeds up the average search time by 9.1X and 3.0X on CPU and GPU workloads, respectively, compared to the state-of-the-art implementation. MTL-TLP achieves speed-ups of 4.7X and 2.9X on CPU and GPU workloads, respectively, using only 7% of the target-hardware data.
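The core idea of treating a trace of schedule primitives as a token sequence for latency regression can be sketched as follows. This is a minimal illustration, not TLP's actual feature pipeline: the primitive syntax, tokenizer, and bag-of-tokens featurization here are hypothetical stand-ins for TLP's learned sequence embeddings and neural regressor.

```python
# Minimal sketch: schedule primitives as a "tensor language" whose token
# sequences are featurized for latency regression. All primitive strings
# below are hypothetical; TLP's real pipeline feeds learned embeddings of
# these sequences into a deep regression network.

def tokenize(schedule: str) -> list:
    """Split a schedule-primitive trace into tokens (primitive names, args)."""
    return schedule.replace("(", " ").replace(")", " ").replace(",", " ").split()

def build_vocab(corpus) -> dict:
    """Assign an integer id to every token seen in the corpus."""
    vocab = {}
    for sched in corpus:
        for tok in tokenize(sched):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(schedule: str, vocab: dict) -> list:
    """Bag-of-tokens count vector -- a stand-in for learned embeddings."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(schedule):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

# Toy schedule-primitive traces (hypothetical syntax).
corpus = [
    "split(i, 32) reorder(i0, j, i1) vectorize(i1)",
    "split(i, 16) parallel(i0) unroll(j)",
]
vocab = build_vocab(corpus)
features = [featurize(s, vocab) for s in corpus]
```

A real cost model would replace the count vectors with embedded token sequences processed by a transformer-style network trained to regress measured latency; the point of the sketch is only that the input is the primitive sequence, not hand-engineered hardware features.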