Tuning tensor program generation involves searching over the many possible combinations of program transformations for a given program on target hardware in order to optimize tensor program execution. The process is already complex, and the massive search space with its exponential number of transformation combinations makes auto-tuning tensor program generation even more challenging, especially for heterogeneous targets. In this research, we attempt to address these problems by learning joint neural network and hardware features and transferring them to new target hardware. We extensively study the existing state-of-the-art dataset, TenSet, perform a comparative analysis of test split strategies, and propose methodologies to prune the dataset. We adopt an attention-inspired approach for tuning tensor programs, enabling them to embed neural network and hardware-specific features. Our approach can prune the dataset down to 45\% of the baseline without compromising the Pairwise Comparison Accuracy (PCA). Furthermore, the proposed methodology achieves on-par or improved mean inference time with only 25\%-40\% of the baseline tuning time across different networks and target hardware.
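As a point of reference for the metric above, Pairwise Comparison Accuracy is commonly defined in tensor-program cost-model work (including TenSet) as the fraction of candidate pairs whose relative latency ordering the model predicts correctly. The following is a minimal sketch of that definition; the function name and the toy inputs are illustrative, not taken from the paper's code.

\begin{verbatim}
import numpy as np

def pairwise_comparison_accuracy(pred, measured):
    """Fraction of program pairs whose relative latency
    ordering the cost model predicts correctly."""
    pred = np.asarray(pred, dtype=float)
    measured = np.asarray(measured, dtype=float)
    n = len(pred)
    correct, total = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if measured[i] == measured[j]:
                continue  # skip ties in the ground truth
            total += 1
            # a pair is correct when the predicted ordering
            # agrees with the measured ordering
            if (pred[i] < pred[j]) == (measured[i] < measured[j]):
                correct += 1
    return correct / total if total else 1.0

# toy usage: the prediction inverts one of the three pairs
print(pairwise_comparison_accuracy([1.0, 2.0, 3.0],
                                   [1.0, 3.0, 2.0]))  # 0.666...
\end{verbatim}

A ranking-based metric like this suits auto-tuning because the search only needs the cost model to order candidate programs correctly, not to predict absolute latencies.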