We study dynamic discrete choice models, in which a central problem is to estimate the parameters of agent reward functions (also known as "structural" parameters) from agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are most pivotal for driving changes in the Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a commonly used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off among computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
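To make the two-stage flow concrete, below is a minimal, illustrative Python sketch under several simplifying assumptions: a linear-in-features reward, i.i.d. extreme-value (logit) choice shocks, synthetic behavioral data, empirical conditional-choice-probability log-odds as a stand-in for the paper's inverse reinforcement learning estimate of Q-function differences, k-means as the clustering step, and a Rust-style nested fixed-point loop (value iteration inside, likelihood maximization outside) on the aggregated states. All names, dimensions, and specification choices here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
S, A, BETA, K = 200, 2, 0.95, 10        # raw states, actions, discount factor, aggregated states (assumed)
theta_true = np.array([1.0, -0.5])      # "structural" reward parameters (assumed linear-in-features reward)
feats = rng.normal(size=(S, 2))         # assumed observable state features
P = rng.dirichlet(np.ones(S), size=(S, A))   # assumed transition kernel P[s, a, s']

def solve_dp(u, P_mat, beta=BETA, tol=1e-10):
    """Inner fixed point: V(s) = logsumexp_a [u(s,a) + beta * E[V(s') | s,a]] under logit shocks."""
    V = np.zeros(u.shape[0])
    while True:
        q = u + beta * P_mat @ V                       # choice-specific values, shape (n_states, A)
        V_new = logsumexp(q, axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return q
        V = V_new

# Synthetic behavioral data generated under theta_true (for this sketch only).
u_true = np.column_stack([np.zeros(S), feats @ theta_true])
q_true = solve_dp(u_true, P)
choice_p = np.exp(q_true - logsumexp(q_true, axis=1, keepdims=True))

T = 20_000
states, actions = np.empty(T, dtype=int), np.empty(T, dtype=int)
s = rng.integers(S)
for t in range(T):
    a = rng.choice(A, p=choice_p[s])
    states[t], actions[t] = s, a
    s = rng.choice(S, p=P[s, a])

# ---- Stage 1: flexible Q estimation, then data-driven state aggregation ----
# Stand-in for the paper's IRL estimator: empirical conditional choice probabilities.
# Under logit shocks, log P(1|s) - log P(0|s) = Q(s,1) - Q(s,0).
counts = np.ones((S, A))                               # +1 smoothing avoids empty cells
np.add.at(counts, (states, actions), 1)
ccp = counts / counts.sum(axis=1, keepdims=True)
q_diff_hat = np.log(ccp[:, 1] / ccp[:, 0])

# Cluster states with similar estimated Q-differences into K aggregated states.
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(q_diff_hat.reshape(-1, 1))

# ---- Stage 2: nested fixed-point MLE on the aggregated state space ----
feats_agg = np.vstack([feats[labels == k].mean(axis=0) for k in range(K)])
P_agg = np.ones((K, A, K))                             # aggregated transition counts, +1 smoothing
np.add.at(P_agg, (labels[states[:-1]], actions[:-1], labels[states[1:]]), 1)
P_agg /= P_agg.sum(axis=2, keepdims=True)
s_agg = labels[states]

def neg_loglik(theta):
    u = np.column_stack([np.zeros(K), feats_agg @ theta])
    q = solve_dp(u, P_agg)                             # inner loop: solve the (small) aggregated DP
    logp = q - logsumexp(q, axis=1, keepdims=True)
    return -logp[s_agg, actions].sum()                 # outer loop objective: negative log-likelihood

theta_hat = minimize(neg_loglik, np.zeros(2), method="Nelder-Mead").x
print("estimated structural parameters:", theta_hat)
```

The key computational point the sketch illustrates is that the inner dynamic-programming fixed point in the second stage is solved on only K aggregated states rather than the S raw states, which is where the reduction in computational burden comes from.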