In online marketing, the advertisers' goal is usually a tradeoff between achieving high volumes and high profitability. The companies' business units customarily address this tradeoff by maximizing the volumes while guaranteeing a lower bound to the Return On Investment (ROI). This paper investigates combinatorial bandit algorithms for the bid optimization of advertising campaigns subject to uncertain budget and ROI constraints. We study the nature of both the optimization and learning problems. In particular, when focusing on the optimization problem without uncertainty, we show that it is inapproximable within any factor unless P=NP, and we provide a pseudo-polynomial-time algorithm that achieves an optimal solution. When considering uncertainty, we prove that no online learning algorithm can violate the (ROI or budget) constraints during the learning process a sublinear number of times while guaranteeing a sublinear pseudo-regret. Thus, we provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations. We also design its safe version, namely GCB_{safe}, guaranteeing w.h.p. a constant upper bound on the number of constraints violations at the cost of a linear pseudo-regret. More interestingly, we provide an algorithm, namely GCB_{safe}(\psi,\phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances \psi and \phi in the satisfaction of the ROI and budget constraints, respectively. This algorithm actually mitigates the risks due to the constraints violations without precluding the convergence to the optimal solution. Finally, we experimentally compare our algorithms in terms of pseudo-regret/constraint-violation tradeoff in settings generated from real-world data, showing the importance of adopting safety constraints in practice and the effectiveness of our algorithms.
翻译:在网上营销中,广告商的目标通常是实现高额和高利润之间的权衡。公司的商业单位通常通过最大限度地增加数量来弥补这一权衡,同时保证与“回投投资”(ROI)的关联性较低。本文调查了在预算不确定和ROI限制条件下优化广告运动投标的组合式强盗算法。我们研究了优化和学习问题的性质。特别是,当侧重于优化问题而没有不确定性时,我们表明,在任何因素中,它都是不能相近的,除非P=NP,而且我们提供一种假极价时间算法,从而实现最佳的解决方案。在考虑不确定性时,我们证明在学习过程中没有任何在线学习算法(ROI或预算)的制约性(ROI或预算)子直线数,同时保证亚线性伪正价伪回报。因此,我们提供了一种算法,即GCB保证亚线性对潜在限制的代价的遗憾。我们还设计了它的到期版本,即 GCB*safefefefe, 保证Wh.p.