We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existing approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes at each step of the output depends on the length of the input, which is variable. Problems such as sorting variable-sized sequences, and various combinatorial optimization problems, belong to this class. Our model solves the problem of variable-size output dictionaries using a recently proposed mechanism of neural attention. It differs from previous attention attempts in that, instead of using attention to blend hidden units of an encoder into a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show that Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable-size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
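To make the attention-as-pointer idea concrete, the sketch below shows one decoding step in numpy: encoder states are scored against the current decoder state (the standard additive-attention rule, u_j = v^T tanh(W1 e_j + W2 d_i)), but the resulting softmax over input positions is used directly as the output distribution rather than as blending weights for a context vector. This is a minimal illustration, not the authors' code; all names and shapes are assumptions.

```python
import numpy as np

def pointer_attention(encoder_states, decoder_state, W1, W2, v):
    """Score each input position and return a distribution over positions.

    encoder_states: (n, h) array of encoder hidden states e_j
    decoder_state:  (h,)  decoder hidden state d_i
    W1, W2:         (h, h) learned projection matrices
    v:              (h,)  learned scoring vector
    """
    # u_j = v^T tanh(W1 e_j + W2 d_i): one unnormalized score per input token
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v
    # Softmax over input positions: the decoder "points" at one input element,
    # so the output dictionary size automatically equals the input length n.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Hypothetical usage: 5 input tokens, hidden size 8, random weights
rng = np.random.default_rng(0)
n, h = 5, 8
probs = pointer_attention(rng.normal(size=(n, h)), rng.normal(size=h),
                          rng.normal(size=(h, h)), rng.normal(size=(h, h)),
                          rng.normal(size=h))
print(probs.argmax())  # index of the input element the decoder selects
```

Because the output is a distribution over input indices rather than over a fixed vocabulary, the same trained weights apply to inputs of any length, which is what allows generalization beyond the lengths seen during training.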