While end-to-end learning with fully differentiable models has enabled tremendous success in natural language processing (NLP) and machine learning, there has been significant recent interest in learning with latent discrete structures, which incorporate better inductive biases for improved end-task performance and greater interpretability. This paradigm, however, is not straightforwardly amenable to mainstream gradient-based optimization methods. This work surveys three main families of methods for learning such models: surrogate gradients, continuous relaxation, and marginal likelihood maximization via sampling. We conclude with a review of applications of these methods and an inspection of the latent structures that they induce.
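As a concrete illustration of two of these families, the sketch below combines a continuous relaxation of categorical sampling (Gumbel-Softmax) with a straight-through surrogate gradient: the forward pass uses a discrete one-hot sample, while gradients flow through the relaxed sample. This is a minimal example in PyTorch under our own assumptions; the function name and tensor shapes are illustrative and not taken from the survey itself.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=False):
    """Sample from the Gumbel-Softmax (Concrete) relaxation of a categorical.

    With hard=True this acts as a straight-through estimator: the forward
    pass returns a discrete one-hot sample, while the backward pass uses
    the gradient of the continuous relaxation.
    """
    # Perturb the logits with Gumbel(0, 1) noise: -log(Exp(1)) ~ Gumbel(0, 1).
    gumbels = -torch.empty_like(logits).exponential_().log()
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    if hard:
        # Discretize in the forward pass only; gradients follow y_soft.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
        return y_hard + (y_soft - y_soft.detach())  # straight-through trick
    return y_soft

# Usage: logits over 4 latent categories; gradients reach `logits`
# even though the forward sample is discrete.
logits = torch.randn(2, 4, requires_grad=True)
sample = gumbel_softmax_sample(logits, tau=0.5, hard=True)
sample.sum().backward()
print(logits.grad)
```

The temperature `tau` trades off bias against gradient variance: as it approaches zero the relaxed sample approaches a discrete one-hot vector, but gradients become increasingly high-variance.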