Models with a large number of latent variables are often used to fully utilize the information in big or complex data. However, they can be difficult to estimate using standard approaches, and variational inference methods are a popular alternative. Key to the success of these methods is the selection of an approximation to the target density that is accurate, tractable and fast to calibrate using optimization methods. Most existing choices can be inaccurate or slow to calibrate when there are many latent variables. Instead, we propose a family of tractable variational approximations that are more accurate and faster to calibrate in this case. The family combines a parsimonious parametric approximation for the parameter posterior with the exact conditional posterior of the latent variables. We derive a simplified expression for the re-parameterization gradient of the variational lower bound, which is the main ingredient of the efficient optimization algorithms used to implement variational estimation. Evaluating this gradient requires only the ability to generate, exactly or approximately, from the conditional posterior of the latent variables, rather than to compute its density. We illustrate the approach using two complex contemporary econometric examples. The first is a nonlinear multivariate state space model for U.S. macroeconomic variables. The second is a random coefficients tobit model applied to two million sales by 20,000 individuals from a large marketing panel. In both cases, we show that our approximating family is considerably more accurate than mean field or structured Gaussian approximations, and faster than Markov chain Monte Carlo. Last, we show how to implement data sub-sampling in variational inference for our approximation, which can lead to a further reduction in computation time.
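As a sketch of why sampling from the conditional posterior is enough, the construction can be written out in notation that is assumed here and not taken from the abstract: $y$ is the data, $\theta$ the parameters, $z$ the latent variables, $q_\lambda(\theta)$ the parametric approximation with variational parameters $\lambda$, $g(\theta,z) = p(y \mid z,\theta)\,p(z \mid \theta)\,p(\theta)$ the joint density, and $\theta = h(\varepsilon,\lambda)$ a re-parameterization in terms of a random variable $\varepsilon$ whose distribution does not depend on $\lambda$.

$$
\begin{aligned}
q_\lambda(\theta, z) &= q_\lambda(\theta)\, p(z \mid \theta, y), \\
\mathcal{L}(\lambda) &= E_{q_\lambda(\theta,z)}\!\left[\log g(\theta,z) - \log q_\lambda(\theta) - \log p(z \mid \theta, y)\right]
 = E_{q_\lambda(\theta)}\!\left[\log p(y \mid \theta)\,p(\theta) - \log q_\lambda(\theta)\right], \\
\nabla_\theta \log p(y \mid \theta)\,p(\theta) &= E_{p(z \mid \theta, y)}\!\left[\nabla_\theta \log g(\theta,z)\right] \quad \text{(Fisher's identity)}, \\
\nabla_\lambda \mathcal{L}(\lambda) &= E_{\varepsilon,\, z}\!\left[\left(\frac{\partial \theta}{\partial \lambda}\right)^{\!\top}\!\left(\nabla_\theta \log g(\theta,z) - \nabla_\theta \log q_\lambda(\theta)\right)\right],
\qquad \theta = h(\varepsilon,\lambda),\; z \sim p(z \mid \theta, y).
\end{aligned}
$$

Under these assumptions the density $p(z \mid \theta, y)$ cancels from the lower bound. It enters the gradient only through the expectation over $z$, which can be estimated with a single draw of $z$ from the conditional posterior.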