PASTO: Strategic Parameter Optimization in Recommendation Systems -- Probabilistic is Better than Deterministic

Weicong Ding, Hanlin Tang, Jingshuo Feng, Lei Yuan, Sen Yang, Guangxu Yang, Jie Zheng, Jing Wang, Qiang Su, Dong Zheng, Xuezhong Qiu, Yongqi Liu, Yuxuan Chen, Yang Liu, Chao Song, Dongying Kong, Kai Ren, Peng Jiang, Qiao Lian, Ji Liu

Real-world recommendation systems often consist of two phases. In the first phase, multiple predictive models estimate the probabilities of different immediate user actions. In the second phase, these predictions are aggregated according to a set of 'strategic parameters' to meet a diverse set of business goals, such as longer user engagement, higher revenue potential, or more community/network interactions. In addition to building accurate predictive models, it is also crucial to tune this set of 'strategic parameters' so that primary goals are optimized while secondary guardrails are not hurt. In this setting with multiple and constrained goals, this paper finds that a probabilistic strategic parameter regime can achieve better value than the standard regime of finding a single deterministic parameter. The new probabilistic regime learns the best distribution over strategic parameter choices and samples one strategic parameter from that distribution each time a user visits the platform. To pursue the optimal probabilistic solution, we formulate the problem as a stochastic compositional optimization problem, in which an unbiased stochastic gradient is unavailable. Our approach is applied on a popular social network platform with hundreds of millions of daily users and achieves a +0.22% lift in user engagement in a recommendation task and a +1.7% lift in revenue in an advertising optimization scenario, compared with the best deterministic parameter strategy.
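
To make the central claim concrete, the following is a minimal sketch of why a distribution over strategic parameters can beat any single parameter when a guardrail constraint is present. It is not the paper's algorithm (the paper uses stochastic compositional optimization); it is a toy linear program solved by enumeration. The three candidate parameters, their primary-goal and guardrail values, the threshold, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy numbers (not from the paper): expected primary-goal value
# and guardrail value for three candidate strategic parameters.
PRIMARY = [1.0, 2.0, 3.0]
GUARDRAIL = [1.0, 0.6, 0.0]
THRESHOLD = 0.5  # the average guardrail value must stay at or above this


def best_deterministic(primary, guardrail, threshold):
    """Return (value, index) of the best single parameter meeting the guardrail.

    Assumes at least one parameter is feasible on its own.
    """
    feasible = [
        (value, index)
        for index, (value, guard) in enumerate(zip(primary, guardrail))
        if guard >= threshold
    ]
    return max(feasible)


def best_probabilistic(primary, guardrail, threshold):
    """Return (value, weights) of the best distribution over parameters.

    With a single linear guardrail constraint, an optimal distribution puts
    mass on at most two parameters, so it suffices to compare the best single
    parameter with every two-point mixture that meets the threshold exactly.
    """
    n = len(primary)
    det_value, det_index = best_deterministic(primary, guardrail, threshold)
    best_value = det_value
    best_weights = [1.0 if k == det_index else 0.0 for k in range(n)]
    for i in range(n):
        for j in range(n):
            # Pair a parameter above the threshold with one below it.
            if guardrail[i] > threshold > guardrail[j]:
                w = (threshold - guardrail[j]) / (guardrail[i] - guardrail[j])
                value = w * primary[i] + (1.0 - w) * primary[j]
                if value > best_value:
                    best_value = value
                    best_weights = [0.0] * n
                    best_weights[i] = w
                    best_weights[j] = 1.0 - w
    return best_value, best_weights


def sample_parameter(weights, rng=random):
    """Draw one parameter index per user visit from the learned distribution."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    det_value, det_index = best_deterministic(PRIMARY, GUARDRAIL, THRESHOLD)
    prob_value, weights = best_probabilistic(PRIMARY, GUARDRAIL, THRESHOLD)
    print("best deterministic:", det_index, det_value)
    print("best probabilistic:", weights, prob_value)
    print("parameter for this visit:", sample_parameter(weights))
```

In this toy setup the best single parameter reaches a primary value of 2.0, while mixing the second and third parameters with weights 5/6 and 1/6 meets the guardrail exactly and reaches 13/6 (about 2.17). The gain comes from spending the guardrail slack that the best deterministic choice leaves unused, which is the intuition behind sampling a parameter per visit rather than fixing one.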
