Many types of data from fields including natural language processing, computer vision, and bioinformatics are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging, as neural networks are typically designed for continuous computation. This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on a consistent notation for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
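To make the surrogate-gradient strategy mentioned above concrete, the sketch below illustrates the classic straight-through idea with NumPy: the forward pass emits a discrete one-hot choice, while gradients (in an autodiff framework) would flow through a continuous softmax relaxation. The function names here are illustrative, not from any particular library.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a score vector."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def straight_through(scores):
    """Straight-through sketch: return the hard one-hot argmax for the
    forward pass, alongside the softmax probabilities whose gradient a
    surrogate estimator would use in the backward pass."""
    probs = softmax(scores)
    hard = np.zeros_like(probs)
    hard[np.argmax(probs)] = 1.0
    # In an autodiff framework this is typically written as
    #   probs + stop_gradient(hard - probs)
    # so the forward value equals `hard`, but gradients reach `probs`.
    return hard, probs

scores = np.array([1.5, 0.2, -0.3])
hard, probs = straight_through(scores)
```

Continuous relaxation would instead use `probs` directly in the forward pass, and probabilistic estimation would sample from `probs` and score the sample; the building blocks (scores, a normalization map, a discrete decoder) are shared across all three strategies.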