Although neural autoregressive sequence models trained with maximum likelihood are widely used, recent studies have revealed unexpected and undesirable properties of these models, such as an unreasonably high affinity for short sequences after training and for infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintained throughout the full learning chain of the ground-truth, empirical, learned and decoding-induced distributions, via the newly proposed mode recovery cost. We design a tractable testbed in which we build three types of ground-truth distributions: (1) an LSTM-based structured distribution, (2) an unstructured distribution where the probability of a sequence does not depend on its content, and (3) a product of these two, which we call a semi-structured distribution. Our study reveals both expected and unexpected findings. First, starting with data collection, mode recovery cost depends strongly on the ground-truth distribution and is highest with the semi-structured distribution. Second, after learning, mode recovery cost from the ground-truth distribution may increase or decrease compared to data collection, with the largest cost degradation occurring with the semi-structured ground-truth distribution. Finally, the ability of the decoding-induced distribution to recover modes from the learned distribution is strongly affected by the choices made earlier in the learning chain. We conclude that future research must consider the entire learning chain in order to fully understand the potential and perils of neural autoregressive sequence models and to further improve them.
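To make the testbed idea concrete, the following is a minimal Python sketch on a toy sequence space. It is not the authors' implementation. Several things are assumptions introduced only for illustration: the structured distribution is a simple content-dependent stand-in rather than an LSTM, and `recovery_cost` uses one plausible reading of a mode recovery cost (the smallest k' such that the top-k' sequences of one distribution contain the top-k sequences of another), since the abstract does not state the formal definition. All function and variable names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}

def top_k(dist, k):
    """The k most probable sequences; ties are broken alphabetically."""
    return sorted(dist, key=lambda s: (-dist[s], s))[:k]

def recovery_cost(p, q, k):
    """Assumed definition: smallest k' such that the top-k' set of q
    contains the top-k set of p. Returns math.inf if q gives no mass
    to some mode of p."""
    remaining = set(top_k(p, k))
    ranked_q = [s for s in top_k(q, len(q)) if q[s] > 0]
    for k_prime, seq in enumerate(ranked_q, start=1):
        remaining.discard(seq)
        if not remaining:
            return k_prime
    return math.inf

def empirical(dist, n, rng):
    """Empirical distribution from n i.i.d. samples of dist."""
    seqs = list(dist)
    samples = rng.choices(seqs, weights=[dist[s] for s in seqs], k=n)
    return normalize(Counter(samples))

# Toy sequence space: all strings over {a, b} of length 1 to 3.
sequences = ["".join(t) for n in range(1, 4)
             for t in itertools.product("ab", repeat=n)]

rng = random.Random(0)
# Structured stand-in (assumption, not an LSTM): probability depends on content.
structured = normalize({s: 2.0 ** s.count("a") for s in sequences})
# Unstructured: weights drawn independently of sequence content.
unstructured = normalize({s: rng.random() for s in sequences})
# Semi-structured: renormalized product of the two.
semi_structured = normalize(
    {s: structured[s] * unstructured[s] for s in sequences})

if __name__ == "__main__":
    for name, truth in [("structured", structured),
                        ("unstructured", unstructured),
                        ("semi-structured", semi_structured)]:
        data = empirical(truth, n=50, rng=rng)
        print(name, recovery_cost(truth, data, k=3))
```

Under this assumed definition, a cost equal to k means the empirical distribution ranks the ground-truth modes perfectly, larger values mean the modes are buried deeper in the ranking, and an infinite cost means at least one mode was never sampled. The same function could be applied between any two adjacent links in the learning chain.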