Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm and the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which outcomes are unpredictable. These measurements can be modeled as random variables and compared with each other via their expected values or via more sophisticated tools such as null hypothesis statistical tests. In this paper, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one random variable stochastically dominates that of the other. Then, we present a graphical method that decomposes by quantiles i) the proposed dominance measure and ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we re-evaluate the experiments of an already published work with the proposed methodology and show that additional conclusions (missed by the other methods) can be inferred. Additionally, the software package RVCompare was created as a convenient way of applying and experimenting with the proposed framework.
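To make the two quantities concrete, the following is a minimal sketch in Python, assuming NumPy and two independent samples. The estimate of the probability that one variable takes lower values than the other counts ties as one half, a common convention. The dominance function is a hypothetical simplification: it takes the fraction of quantile levels at which one empirical quantile function lies below the other, which is one way to read "the proportion in which one cumulative distribution function dominates the other". It is not the exact definition from the paper, and the function names are illustrative, not the RVCompare API.

```python
import numpy as np


def prob_x_less_than_y(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two independent samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Compare every pair (x_i, y_j) at once via broadcasting.
    less = (x[:, None] < y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    return float(less + 0.5 * ties)


def dominance_proportion(x, y, n_levels=1000):
    """Hypothetical proxy for the dominance measure (an assumption, not the
    paper's definition): fraction of quantile levels p in (0, 1) at which the
    empirical quantile of x is strictly below that of y; ties count half."""
    # Midpoints of a uniform grid of quantile levels.
    levels = (np.arange(n_levels) + 0.5) / n_levels
    qx = np.quantile(np.asarray(x, dtype=float), levels)
    qy = np.quantile(np.asarray(y, dtype=float), levels)
    return float(np.mean(qx < qy) + 0.5 * np.mean(qx == qy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=200)  # e.g. scores of algorithm A
    b = rng.normal(0.5, 1.0, size=200)  # e.g. scores of algorithm B
    print("P(A < B) ~", prob_x_less_than_y(a, b))
    print("dominance proxy ~", dominance_proportion(a, b))
```

Both quantities are symmetric in the sense that swapping the samples gives one minus the original value. The quantile-grid approach was chosen here because first-order stochastic dominance of X over Y (F_X(t) <= F_Y(t) for all t) is equivalent to the quantile function of X lying above that of Y at every level.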