Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or its gradients cannot be written down explicitly or evaluated efficiently. Score matching is a training method whereby, instead of fitting the likelihood $\log p(x)$ of the training data, we fit the score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood -- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated -- i.e. the Poincar\'e, log-Sobolev, and isoperimetric constants -- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant -- even for simple families of distributions like exponential families with rich enough sufficient statistics -- score matching will be substantially less efficient than maximum likelihood. We formalize these results in both the finite-sample and the asymptotic regimes. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
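To make the score matching objective concrete, here is a minimal illustrative sketch (not from the paper) using Hyv\"arinen's integration-by-parts form of the objective, $J(\theta) = \mathbb{E}_{x \sim p}\big[\tfrac{1}{2}\|\nabla_x \log p_\theta(x)\|^2 + \Delta_x \log p_\theta(x)\big]$, for a unit-variance Gaussian model $p_\theta(x) \propto \exp(-(x-\theta)^2/2)$. The model family, seed, and grid search are arbitrary choices for the demonstration; for this well-conditioned (small isoperimetric constant) family the score matching minimizer coincides with the maximum likelihood estimate, the sample mean.

```python
import random

# For p_theta(x) ∝ exp(-(x - theta)^2 / 2), the score is
# d/dx log p_theta(x) = theta - x, and its derivative in x is -1.
# Hyvarinen's objective thus reduces to
#   J(theta) = E[ 0.5 * (theta - x)^2 - 1 ],
# which is minimized at theta = E[x] -- the same as the MLE here.

random.seed(0)  # arbitrary seed for reproducibility of the sketch
samples = [random.gauss(3.0, 1.0) for _ in range(2000)]  # true mean = 3.0

def score_matching_loss(theta, xs):
    """Empirical score matching objective for the unit-variance Gaussian."""
    return sum(0.5 * (theta - x) ** 2 - 1.0 for x in xs) / len(xs)

# Minimize over a simple grid of candidate means (step 0.01).
grid = [i * 0.01 for i in range(601)]  # candidates in [0, 6]
theta_hat = min(grid, key=lambda t: score_matching_loss(t, samples))

sample_mean = sum(samples) / len(samples)  # the MLE for this family
print(theta_hat, sample_mean)  # agree up to the grid resolution
```

The closed-form reduction above is special to this toy family; for energy-based models with an intractable partition function, the same objective is estimated with automatic differentiation, which is precisely what makes score matching attractive there.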