We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent for each outcome. The agent then strategically chooses an action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's chosen action. Based on past observations, the principal dynamically adjusts the contracts with the goal of maximizing her utility. We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$ and lower bounded by $\Omega(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that a number of samples exponential in $m$ is both sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space, and bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows as $\Theta(T^{2/3})$. The contract design problem is challenging because the utility function is discontinuous. Bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, allowing us to design a new discretization method and bound its error. This approach yields the first upper bound with no restrictions on the contract and action spaces.
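To make the repeated interaction concrete, the following is a minimal simulation sketch of the online protocol described above, restricted to linear contracts (payment $= \alpha \cdot$ reward) and using a simple grid-discretization explore-then-commit learner whose grid size scales as $T^{1/3}$, matching the $\Theta(T^{2/3})$ rate for linear contracts. It is illustrative only and is not the algorithm analyzed in the paper; the problem instance and all names (e.g. `agent_best_response`, `play_round`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, m = 4, 5                              # agent actions, outcomes
rewards = rng.uniform(0, 1, size=m)              # principal's reward for each outcome
costs = rng.uniform(0, 0.3, size=n_actions)      # agent's hidden cost per action
F = rng.dirichlet(np.ones(m), size=n_actions)    # outcome distribution of each action

def agent_best_response(payment):
    """Agent privately picks the action maximizing expected payment minus cost."""
    util = F @ payment - costs
    return int(np.argmax(util))

def play_round(alpha):
    """Principal posts the linear contract alpha * rewards; returns her realized utility."""
    payment = alpha * rewards
    a = agent_best_response(payment)             # action itself is not observed
    outcome = rng.choice(m, p=F[a])              # only the outcome is observed
    return (1 - alpha) * rewards[outcome]

T = 30_000
K = max(2, int(round(T ** (1 / 3))))             # grid size ~ T^{1/3}
grid = np.linspace(0, 1, K)
explore_per_arm = int(T ** (2 / 3) / K) + 1      # ~T^{2/3} exploration rounds in total

total, estimates, t = 0.0, np.zeros(K), 0
for k, alpha in enumerate(grid):                 # exploration: try each grid contract
    for _ in range(explore_per_arm):
        if t >= T:
            break
        u = play_round(alpha)
        estimates[k] += u
        total += u
        t += 1
best_alpha = grid[int(np.argmax(estimates))]
while t < T:                                     # commit: play the empirically best contract
    total += play_round(best_alpha)
    t += 1

print(f"average per-round principal utility: {total / T:.3f}")
```

The sketch only illustrates the interaction structure: the agent best-responds to the posted contract, the principal sees the outcome but not the action, and learning proceeds by discretizing the contract space and estimating utilities from repeated play.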