Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm and the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which outcomes are unpredictable. Such measurements can be modeled as random variables and compared with each other via their expected values or with more sophisticated tools such as null hypothesis statistical tests. In this paper, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one of the random variables stochastically dominates that of the other. Then, we present a graphical method that decomposes by quantile i) the proposed dominance measure and ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we re-evaluate the experimentation of a previously published work with the proposed methodology and show that additional conclusions (missed by the other methods) can be inferred. Additionally, the software package RVCompare was created as a convenient way of applying and experimenting with the proposed framework.
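As a rough illustration of the two quantities mentioned above, the following Python sketch estimates them from two samples. It is not the RVCompare API and not the exact definition used in the paper: the function names, the quantile grid, and the choice to approximate the dominance measure by the fraction of quantile levels at which one sample's quantile is lower than the other's are all assumptions made for this example.

```python
import numpy as np


def prob_x_less_than_y(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from all pairs of observations.

    Ties are split evenly (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]  # every x_i compared with every y_j
    return float(np.mean(diff < 0) + 0.5 * np.mean(diff == 0))


def quantile_dominance_fraction(x, y, n_levels=99):
    """Hypothetical stand-in for a dominance measure.

    Returns the fraction of quantile levels (among those where the two
    empirical quantiles differ) at which X's quantile is lower than Y's.
    1.0 means X is lower at every level, 0.0 means Y is, and 0.5 is
    returned when the quantiles coincide everywhere.
    """
    levels = np.linspace(0.01, 0.99, n_levels)
    qx = np.quantile(np.asarray(x, dtype=float), levels)
    qy = np.quantile(np.asarray(y, dtype=float), levels)
    lower = int(np.sum(qx < qy))
    higher = int(np.sum(qx > qy))
    if lower + higher == 0:
        return 0.5
    return lower / (lower + higher)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic samples: same mean, different spread.
    a = rng.normal(loc=0.0, scale=1.0, size=500)
    b = rng.normal(loc=0.0, scale=3.0, size=500)
    print("P(A < B) estimate:", prob_x_less_than_y(a, b))
    print("quantile dominance fraction:", quantile_dominance_fraction(a, b))
```

The synthetic samples are chosen because two distributions with equal means but different spreads are not separated by an expected-value comparison, whereas a quantile-by-quantile comparison still shows where each one takes lower values.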