Dynamic scheduling is an important problem in applications ranging from queueing to wireless networks. It addresses how to choose one item from multiple candidates at each timestep to achieve a long-term goal. Conventional approaches find the optimal policy for one specific system, so the resulting policy is usable only under that system's characteristics. Such approaches are therefore hard to apply to practical systems whose characteristics change dynamically. This paper proposes a novel policy structure for dynamic scheduling based on Markov decision processes (MDPs), the descriptive policy, which is system-agnostic: it can adapt to unseen system characteristics as long as the task (dynamic scheduling) is the same. To this end, the descriptive policy learns a system-agnostic scheduling principle: in a nutshell, "which condition of items should have a higher priority in scheduling". Because this principle can be applied to any system, a descriptive policy learned in one system can be used in another. Experiments with simple explanatory and realistic application scenarios demonstrate that the descriptive policy enables system-agnostic meta-learning with very little performance degradation compared with conventional system-specific policies.
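To make the idea of a priority principle over item conditions concrete, the following is a minimal illustrative sketch, not the paper's actual policy. It assumes each item is described by a hypothetical two-feature condition (queue length and channel quality) and that priority is a linear score with hand-picked weights; the names `priority`, `schedule`, `system_a`, and `system_b` are invented for illustration. The point it shows is only that a rule scoring each item from its own condition can be applied unchanged to systems with different numbers of items.

```python
from typing import Sequence, Tuple

# Hypothetical per-item condition: (queue_length, channel_quality).
# These features are an assumption; the abstract does not specify them.
Item = Tuple[float, float]


def priority(item: Item, weights: Sequence[float]) -> float:
    """Score one item from its own condition only (assumed linear form)."""
    return sum(w * x for w, x in zip(weights, item))


def schedule(items: Sequence[Item], weights: Sequence[float]) -> int:
    """Return the index of the item with the highest priority score."""
    return max(range(len(items)), key=lambda i: priority(items[i], weights))


# Illustrative weights chosen by hand; a learned policy would fit these.
weights = (1.0, 0.5)

# Two systems with different numbers of items share the same scoring rule.
system_a = [(3.0, 0.2), (1.0, 0.9), (5.0, 0.1)]
system_b = [(0.0, 1.0), (2.0, 0.4), (2.0, 0.8), (1.0, 0.1), (4.0, 0.0)]

choice_a = schedule(system_a, weights)
choice_b = schedule(system_b, weights)
print(choice_a, choice_b)
```

Because the score depends only on an item's own condition, nothing in `schedule` is tied to a particular number of items or to system-level parameters, which is the property the abstract attributes to the descriptive policy.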