We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existing approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes at each step of the output depends on the length of the input, which is variable. Problems such as sorting variable-sized sequences, and various combinatorial optimization problems, belong to this class. Our model solves the problem of variable-size output dictionaries using a recently proposed mechanism of neural attention. It differs from previous attention attempts in that, instead of using attention to blend hidden units of an encoder into a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show that Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable-size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
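To make the attention-as-pointer idea concrete, the sketch below shows one decoding step in numpy: encoder states are scored against the current decoder state (the standard additive-attention rule, u_j = v^T tanh(W1 e_j + W2 d_i)), but the resulting softmax over input positions is used directly as the output distribution rather than as blending weights for a context vector. This is a minimal illustration, not the authors' code; all names and shapes are assumptions.

```python
import numpy as np

def pointer_attention(encoder_states, decoder_state, W1, W2, v):
    """Score each input position and return a distribution over positions.

    encoder_states: (n, h) array of encoder hidden states e_j
    decoder_state:  (h,)  decoder hidden state d_i
    W1, W2:         (h, h) learned projection matrices
    v:              (h,)  learned scoring vector
    """
    # u_j = v^T tanh(W1 e_j + W2 d_i): one unnormalized score per input token
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v
    # Softmax over input positions: the decoder "points" at one input element,
    # so the output dictionary size automatically equals the input length n.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Hypothetical usage: 5 input tokens, hidden size 8, random weights
rng = np.random.default_rng(0)
n, h = 5, 8
probs = pointer_attention(rng.normal(size=(n, h)), rng.normal(size=h),
                          rng.normal(size=(h, h)), rng.normal(size=(h, h)),
                          rng.normal(size=h))
print(probs.argmax())  # index of the input element the decoder selects
```

Because the output is a distribution over input indices rather than over a fixed vocabulary, the same trained weights apply to inputs of any length, which is what allows generalization beyond the lengths seen during training.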