We study a two-player Stackelberg game with incomplete information in which the follower's strategy belongs to a known family of parameterized functions with an unknown parameter vector. We design an adaptive learning approach, based on adaptive control techniques and hysteresis switching, that simultaneously estimates the unknown parameter and minimizes the leader's cost. Our approach guarantees that the leader's cost predicted using the parameter estimate becomes indistinguishable from its actual cost in finite time, up to a preselected, arbitrarily small error threshold. Moreover, the first-order necessary condition for optimality holds asymptotically for the predicted cost. If a persistent excitation condition also holds, then the parameter estimation error becomes bounded by a preselected, arbitrarily small threshold in finite time as well. When there is a mismatch between the follower's actual strategy and the parameterized function known to the leader, our approach guarantees the same convergence results for error thresholds larger than the size of the mismatch. The algorithms and the convergence results are illustrated via a simulation example in the domain of network security.