We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. The problem is modeled as a Stackelberg game between the principal and the agent: the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of this problem from the principal's perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample-efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model and achieves sublinear regret of order $T^{2/3}$ after $T$ iterations. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.
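As a toy illustration of the Stackelberg interaction described above, the following sketch shows how an announced scoring rule determines the agent's best-response effort level. It is not the algorithm of this paper. It assumes a binary state with a uniform prior, a quadratic (Brier-type) scoring rule with a hypothetical payment `scale`, truthful reporting of the posterior, and three hypothetical effort levels whose accuracies and costs are made up for illustration. The names `Effort`, `quadratic_score`, `expected_payment`, and `best_response` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effort:
    """Hypothetical effort level: signal accuracy P(signal == state) and its cost."""
    accuracy: float
    cost: float


def quadratic_score(report: float, state: int, scale: float) -> float:
    """Quadratic (Brier-type) scoring rule; `report` is the stated P(state == 1)."""
    return scale * (1.0 - (state - report) ** 2)


def expected_payment(effort: Effort, scale: float) -> float:
    """Expected payment under a uniform prior when the agent reports her posterior."""
    q = effort.accuracy
    total = 0.0
    for state in (0, 1):
        for signal in (0, 1):
            p_signal = q if signal == state else 1.0 - q
            # With a uniform prior, the posterior P(state == 1 | signal) is
            # q after signal 1 and 1 - q after signal 0.
            report = q if signal == 1 else 1.0 - q
            total += 0.5 * p_signal * quadratic_score(report, state, scale)
    return total


def best_response(efforts: list[Effort], scale: float) -> int:
    """Index of the effort level maximizing expected payment minus cost."""
    return max(
        range(len(efforts)),
        key=lambda k: expected_payment(efforts[k], scale) - efforts[k].cost,
    )


# Made-up effort levels: no effort, moderate effort, high effort.
efforts = [Effort(0.5, 0.0), Effort(0.7, 0.05), Effort(0.9, 0.3)]

for scale in (1.0, 1.5, 3.0):
    print(scale, best_response(efforts, scale))
```

In this toy model, increasing the payment scale moves the agent's best response from no effort to higher effort, which is the lever the principal controls when choosing a scoring rule. In the online setting the principal does not know the agent's costs or signal accuracies and has to learn them from repeated interaction.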