To perform neural network inference efficiently, the underlying tensor programs must be tuned extensively before deployment into production environments. Typically, an enormous number of candidate tensor programs must be explored to find the best-performing one, which is necessary for neural network products to meet the performance demands of real-world applications such as natural language processing and autonomous driving. Auto-schedulers have been developed to remove the need for human intervention. However, due to the gigantic search space and the lack of intelligent search guidance, current auto-schedulers require hours to days of tuning time to find the best-performing tensor programs for an entire neural network. In this paper, we propose HARL, a reinforcement learning (RL) based auto-scheduler designed specifically for efficient tensor program exploration. HARL uses a hierarchical RL architecture in which learning-based decisions are made at all levels of search granularity. It also adjusts exploration configurations automatically at runtime for faster performance convergence. As a result, HARL improves tensor operator performance by 22% and search speed by 4.3x compared to the state-of-the-art auto-scheduler, and also delivers significant improvements in inference performance and search speed on end-to-end neural networks.
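The abstract only sketches the hierarchical RL idea; as a rough, hypothetical illustration of learning-based decisions at two levels of search granularity, the toy Python sketch below uses simple epsilon-greedy bandit agents: a high-level agent picks a coarse schedule sketch and a per-sketch low-level agent picks a fine-grained tuning parameter, with a simulated measurement providing the shared reward. All names here (SKETCHES, measure, BanditAgent) are illustrative assumptions and do not reflect HARL's actual implementation.

```python
import random

# Toy search space: each coarse "sketch" choice exposes a few
# fine-grained parameter choices (hypothetical, not HARL's real space).
SKETCHES = ["tile_2d", "tile_3d", "vectorize"]       # coarse granularity
PARAMS = {s: [8, 16, 32, 64] for s in SKETCHES}      # fine granularity

def measure(sketch, param):
    """Stand-in for a hardware measurement; returns simulated throughput."""
    base = {"tile_2d": 1.0, "tile_3d": 1.2, "vectorize": 0.9}[sketch]
    return base * (1.0 - abs(param - 32) / 64.0) + random.gauss(0, 0.02)

class BanditAgent:
    """Epsilon-greedy agent with incremental mean value estimates."""
    def __init__(self, actions, eps=0.1):
        self.eps = eps
        self.q = {a: 0.0 for a in actions}   # estimated value per action
        self.n = {a: 0 for a in actions}     # visit counts

    def act(self):
        if random.random() < self.eps:
            return random.choice(list(self.q))        # explore
        return max(self.q, key=self.q.get)            # exploit

    def update(self, action, reward):
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]

# Hierarchy: the high-level agent chooses a sketch; one low-level agent
# per sketch chooses its parameter. The measured throughput is the shared
# reward, so both granularity levels learn from every trial.
high = BanditAgent(SKETCHES)
low = {s: BanditAgent(PARAMS[s]) for s in SKETCHES}

best = (None, None, float("-inf"))
for trial in range(200):
    sketch = high.act()
    param = low[sketch].act()
    reward = measure(sketch, param)
    high.update(sketch, reward)
    low[sketch].update(param, reward)
    if reward > best[2]:
        best = (sketch, param, reward)

print("best candidate:", best)
```

In this sketch both levels are trained jointly from a single reward signal, which mirrors the idea of coordinated, learning-based decisions across granularities; HARL's actual agents, state representations, and real-time configuration adjustment are described in the body of the paper.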