We consider a personalized pricing problem in which the data consist of feature information, historical pricing decisions, and binary realized demand. The goal is to perform off-policy evaluation of a new personalized pricing policy that maps features to prices. Off-policy evaluation methods based on inverse propensity weighting, including doubly robust methods, can perform poorly when the logging policy explores little or is deterministic, which is common in pricing applications. Building on the balanced policy evaluation framework of Kallus (2018), we propose a new approach tailored to pricing applications. The key idea is to compute an estimate that either minimizes the worst-case mean squared error or maximizes a worst-case lower bound on policy performance, where in both cases the worst case is taken over a set of possible revenue functions. We establish theoretical convergence guarantees and empirically demonstrate the advantage of our approach on a real-world pricing dataset.