Estimating the complete distribution of a random variable is a useful primitive for both manual and automated decision-making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data-dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF of the running averaged conditional distribution of a real-valued random variable; these bounds are always valid and sometimes trivial, and they come with an instance-dependent convergence guarantee. The importance-weighted extension is appropriate for estimating complete counterfactual distributions of rewards from the data exhaust of controlled experimentation, e.g., from an A/B test or a contextual bandit.
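
To make the importance-weighted setting concrete, the following is a minimal sketch of an importance-weighted empirical CDF, i.e., a point estimate of a counterfactual reward distribution from logged data. It is not the time-uniform bounds described above, only the kind of quantity such bounds would bracket. The function name `iw_empirical_cdf`, its arguments, and the assumption that each weight is the ratio of target-policy to logging-policy action probabilities (with mean one) are illustrative choices rather than anything specified in the abstract.

```python
import numpy as np


def iw_empirical_cdf(rewards, weights, grid):
    """Importance-weighted empirical CDF evaluated at each point of `grid`.

    rewards: observed rewards r_t collected under the logging policy.
    weights: importance weights w_t (assumed here to be the ratio of
             target-policy to logging-policy action probabilities).
    grid:    values v at which to estimate P(r <= v) under the target policy.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # indicator[i, j] is True when rewards[j] <= grid[i].
    indicator = rewards[None, :] <= grid[:, None]

    # Average of w_t * 1{r_t <= v} over the log; this is unbiased before
    # clipping when the weights have mean one. Clip so the result is a
    # valid probability.
    estimate = (indicator * weights[None, :]).mean(axis=1)
    return np.clip(estimate, 0.0, 1.0)
```

With all weights equal to one this reduces to the ordinary empirical CDF, which corresponds to the A/B-test case where the logged arm is the arm of interest.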