Deep NLP models benefit from underlying structures in the data---e.g., parse trees---typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.
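The property the abstract relies on is that a *sparse* distribution makes the expectation over latent structures tractable: only the handful of structures with nonzero probability need their own downstream computation graph, and weighting those graphs' outputs by the (differentiable) probabilities keeps the whole pipeline end-to-end trainable. The following is a minimal PyTorch sketch of that idea under stated assumptions: the `downstream` function, the head-index parse encoding, and the softmax over two enumerated candidate parses are illustrative stand-ins only (SparseMAP itself computes a sparse distribution over the full structure space without enumeration), not the paper's implementation.

```python
import torch

# Hypothetical downstream predictor: any differentiable function whose
# computation graph may depend arbitrarily on the discrete structure
# (here, a parse encoded as a list of head indices per token).
def downstream(structure, embeddings):
    heads = torch.tensor(structure)
    # Toy computation: combine each token's embedding with its head's.
    return (embeddings + embeddings[heads]).mean()

embeddings = torch.randn(4, 8, requires_grad=True)
scores = torch.randn(2, requires_grad=True)

# Stand-in for SparseMAP: a differentiable distribution over an
# enumerated candidate set. The real method returns probabilities that
# are sparse by construction, so only a few structures survive.
probs = torch.softmax(scores, dim=0)
structures = [[0, 0, 1, 2], [0, 2, 3, 0]]  # two candidate parses

# One dynamic computation graph per surviving structure; the weighted
# sum is differentiable w.r.t. the embeddings and, through probs,
# w.r.t. the structure predictor's scores.
output = sum(p * downstream(s, embeddings) for p, s in zip(probs, structures))
output.backward()
```

The design point the sketch illustrates: because each structure is handled by its own ordinary forward pass, the downstream model can be any architecture (e.g., a TreeLSTM following the parse), which is what "unrestricted dynamic computation graph construction" refers to.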