Real-time bidding (RTB) has become a major paradigm of display advertising. Each ad impression generated by a user visit is auctioned in real time, and a demand-side platform (DSP) automatically submits a bid price, usually by estimating the value of the ad impression and then determining the optimal bid price. However, current bidding strategies overlook the large randomness of user behaviors (e.g., clicks) and the cost uncertainty caused by auction competition. In this work, we explicitly factor in the uncertainty of estimated ad impression values and model the risk preference of a DSP under a specific state and market environment via a sequential decision process. Specifically, we propose a novel adaptive risk-aware bidding algorithm with a budget constraint via reinforcement learning, which is the first to simultaneously consider estimation uncertainty and the dynamic risk tendency of a DSP. We theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR). Consequently, we propose two instantiations to model risk tendency: an expert knowledge-based formulation embracing three essential properties, and an adaptive learning method based on self-supervised reinforcement learning. We conduct extensive experiments on public datasets and show that the proposed framework outperforms state-of-the-art methods in practical settings.
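To make the link between estimation uncertainty and risk tendency concrete, the following is a minimal sketch of a VaR-style, quantile-based bid. It is not the algorithm proposed in this work. It assumes, purely for illustration, that the estimated impression value is Gaussian with a known mean and standard deviation, and that the risk tendency is a fixed quantile level `alpha` rather than a quantity learned by reinforcement learning. The function names `risk_adjusted_value` and `bid_price` and all parameters are hypothetical.

```python
from statistics import NormalDist


def risk_adjusted_value(mean: float, std: float, alpha: float) -> float:
    """Return the alpha-quantile of a Gaussian value estimate N(mean, std^2).

    Assumption (not from the paper): the impression value is Gaussian.
    alpha < 0.5 is risk-averse, alpha = 0.5 is risk-neutral,
    alpha > 0.5 is risk-seeking.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if std < 0.0:
        raise ValueError("std must be non-negative")
    # Standard normal quantile for the chosen risk level.
    z = NormalDist().inv_cdf(alpha)
    return mean + z * std


def bid_price(mean: float, std: float, alpha: float,
              remaining_budget: float) -> float:
    """Bid the risk-adjusted value, clipped to [0, remaining_budget]."""
    value = risk_adjusted_value(mean, std, alpha)
    return min(max(value, 0.0), remaining_budget)
```

Under these assumptions, a larger standard deviation lowers the bid of a risk-averse bidder (`alpha < 0.5`) and raises the bid of a risk-seeking one (`alpha > 0.5`), while a risk-neutral bidder (`alpha = 0.5`) bids the mean regardless of uncertainty. An adaptive method would replace the fixed `alpha` with a value that depends on the current state, such as the remaining budget and the market environment.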