We study dynamic discrete choice models, where a commonly studied problem is estimating the parameters of agent reward functions (also known as "structural" parameters) from agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select the subset of states that are most pivotal for driving changes in the Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation via a standard nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off among computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
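To make the first-stage state-aggregation idea concrete, the following is a minimal sketch (not the paper's implementation) of clustering states by their estimated Q-values. The function name `aggregate_states`, the array shapes, and the choice of k-means are illustrative assumptions; the paper only specifies that estimated Q-functions are fed to a clustering algorithm to form aggregated states.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_states(q_hat: np.ndarray, n_aggregated: int, seed: int = 0) -> np.ndarray:
    """Cluster states by their estimated Q-values (illustrative sketch).

    q_hat        : (n_states, n_actions) array of first-stage Q-function estimates,
                   e.g., from a flexible inverse reinforcement learning procedure.
    n_aggregated : number of aggregated states retained for the second stage.

    Returns an array of length n_states assigning each original state an
    aggregated-state label; states with similar Q-values share a label.
    """
    km = KMeans(n_clusters=n_aggregated, n_init=10, random_state=seed)
    return km.fit_predict(q_hat)

# Hypothetical usage: 1,000 original states, 3 actions, reduced to 20 aggregated states.
rng = np.random.default_rng(0)
q_hat = rng.normal(size=(1000, 3))      # stand-in for first-stage Q estimates
labels = aggregate_states(q_hat, n_aggregated=20)
# The second-stage nested fixed-point MLE would then operate on the 20 aggregated states.
```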