With the rise of the digital economy and an explosion of available information about consumers, effective personalization of goods and services has become a core business focus for companies seeking to improve revenues and maintain a competitive edge. This paper studies the personalization problem through the lens of policy learning, where the goal is to learn a decision-making rule (a policy) that maps consumer and product characteristics (features) to recommendations (actions) in order to optimize outcomes (rewards). We focus on offline learning from available historical data collected under an unknown procedure, where a key challenge is that recommendations were not assigned at random. Moreover, in many business and medical applications, a policy must be interpretable. We therefore study the class of policies with linear decision boundaries and propose learning algorithms that use tools from causal inference to correct for imbalanced treatment assignment. We study several schemes for solving the resulting non-convex, non-smooth optimization problem and find that a Bayesian optimization algorithm is effective. We test our algorithm in extensive simulation studies and apply it to an anonymized customer purchase dataset from an online marketplace, where the learned policy recommends a personalized discount based on customer and product features in order to maximize gross merchandise value (GMV) for sellers. Our learned policy improves upon the platform's baseline by 88.2\% in net sales revenue, while also revealing which features are important for the decision-making process. Our findings suggest that combining causal inference tools with Bayesian optimization in this policy learning framework offers a promising, practical approach to interpretable personalization across a wide range of applications.
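The abstract does not name the causal-inference estimator or the structure of the action space, so the following is only a minimal sketch under stated assumptions: a binary action (for example, discount vs. no discount), logging propensities that are known (with an unknown data collection procedure they would have to be estimated), and inverse propensity weighting (IPW) as the correction for non-random assignment. It illustrates why the objective for a linear policy is non-smooth: the IPW value estimate only changes when the decision boundary crosses a data point, so it is piecewise constant in the policy parameters. Plain random search is swapped in for Bayesian optimization to keep the sketch dependency-free; all function names and the simulated data are hypothetical.

```python
import math
import random


def linear_policy(theta, x):
    """Hypothetical binary linear policy: choose action 1 iff theta . [1, x] > 0."""
    score = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1 if score > 0 else 0


def ipw_value(theta, data):
    """Inverse-propensity-weighted (IPW) estimate of a policy's mean reward.

    Each record is (x, a, r, p): features, logged action, observed reward, and
    the probability p with which the logging policy chose action a given x.
    Records whose logged action matches the policy's choice are reweighted by
    1 / p to correct for non-random assignment; the others contribute zero.
    """
    total = 0.0
    for x, a, r, p in data:
        if linear_policy(theta, x) == a:
            total += r / p
    return total / len(data)


def random_search(data, dim, n_iter=500, seed=0):
    """Derivative-free search over theta (a stand-in for Bayesian optimization).

    ipw_value is piecewise constant in theta, so its gradient is zero almost
    everywhere; a search that only needs function evaluations still works.
    """
    rng = random.Random(seed)
    best_theta, best_value = None, -math.inf
    for _ in range(n_iter):
        theta = [rng.uniform(-1.0, 1.0) for _ in range(dim + 1)]
        value = ipw_value(theta, data)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


def simulate(n=2000, seed=1):
    """Synthetic logged data (hypothetical): treating helps only when x[0] > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        # Non-random logging: units with x[1] > 0 are treated far more often.
        p_treat = 0.8 if x[1] > 0 else 0.2
        a = 1 if rng.random() < p_treat else 0
        effect = 0.5 if x[0] > 0 else -0.5
        r = 1.0 + (effect if a == 1 else 0.0)
        p = p_treat if a == 1 else 1.0 - p_treat
        data.append((x, a, r, p))
    return data


if __name__ == "__main__":
    logged = simulate()
    theta, value = random_search(logged, dim=2)
    print("learned theta:", [round(t, 2) for t in theta])
    print("estimated value:", round(value, 3))
```

In this toy setup the learned coefficient on `x[0]` should come out positive, and the learned policy should outscore both "never treat" and "always treat", even though the logging policy assigned treatment based on `x[1]`. A Bayesian optimizer would replace `random_search` by choosing each new `theta` from a surrogate model of `ipw_value`, which matters when each evaluation is expensive.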