Optimal Markov Decision Process policies for problems with finite state and action spaces are identified through a partial ordering of policies that compares their value functions state by state. This is referred to as state-based optimality. This paper identifies when such optimality guarantees some form of system-based optimality, as measured by a scalar. Four such system-based metrics are introduced. Univariate empirical distributions of these metrics are obtained through simulation to assess whether theoretically optimal policies provide a statistically significant advantage. This is assessed using Student's $t$-test, Welch's $t$-test and the Mann-Whitney $U$-test. The proposed method is applied to a common problem in queuing theory: admission control.
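The statistical comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes two independent samples of a single system-based metric, one per policy, each value coming from one simulation run. The sample values are synthetic placeholders, not results. The Student's and Welch's functions return the statistic and its degrees of freedom, and a p-value would then be read from the $t$ distribution. The Mann-Whitney $U$ p-value uses a normal approximation without tie correction, which is an assumption that suits moderately large samples.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def student_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom (unequal variances)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (fmean(a) - fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided normal-approximation p-value (no tie correction)."""
    n1, n2 = len(a), len(b)
    # U counts the pairs (x from a, y from b) with x > y; ties count one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical per-run values of one system-based metric under two policies.
# These are synthetic placeholders drawn from normal distributions, not data from the paper.
rng = random.Random(0)
optimal = [rng.gauss(10.0, 1.0) for _ in range(50)]
other = [rng.gauss(9.5, 1.5) for _ in range(50)]

print("Student's t, df:", student_t(optimal, other))
print("Welch's t, df:", welch_t(optimal, other))
print("Mann-Whitney U, p:", mann_whitney_u(optimal, other))
```

Welch's test drops the equal-variance assumption of Student's test, and the Mann-Whitney $U$-test drops the normality assumption, so running all three guards against the simulated metric violating either assumption. Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)`, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")` should give the same statistics together with distribution-based p-values.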