Sequence transduction has mostly been performed with recurrent networks, which are computationally demanding and often severely underestimate uncertainty. We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences, which we call the Attentive-GP. The proposed model not only improves training efficiency by dispensing with recurrence and convolutions but also learns a factorized generative distribution with a Bayesian representation. However, the presence of the GP precludes the commonly used mini-batch approach to training the attention network. We therefore develop a block-wise training algorithm that allows mini-batch training of the network while the GP is trained with the full batch, resulting in a scalable training method. The algorithm is proven to converge and yields solutions of comparable, if not better, quality. Because the algorithm does not assume any specific network architecture, it can be used with a wide range of hybrid models, such as deep neural networks with kernel machine layers, when computational and memory resources are scarce.
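The block-wise scheme alternates between mini-batch gradient steps on the attention network (with the GP hyperparameters held fixed) and a full-batch update of the GP hyperparameters (with the network held fixed). Below is a minimal sketch of that alternation in PyTorch; the names AttentiveEncoder, rbf_kernel, gp_nll, and block_wise_train, as well as the RBF kernel and exact marginal-likelihood objective, are illustrative assumptions and not the paper's actual implementation.

```python
import torch

# Illustrative stand-in for the attention-based feature extractor (not the paper's architecture).
class AttentiveEncoder(torch.nn.Module):
    def __init__(self, d_in, d_feat, num_heads=4):  # d_feat must be divisible by num_heads
        super().__init__()
        self.proj = torch.nn.Linear(d_in, d_feat)
        self.attn = torch.nn.MultiheadAttention(d_feat, num_heads=num_heads, batch_first=True)

    def forward(self, x):                     # x: (batch, seq_len, d_in)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)             # self-attention over the sequence
        return h.mean(dim=1)                  # pooled feature per sequence

def rbf_kernel(z1, z2, log_ls, log_sf):
    # Squared-exponential kernel with log length-scale and log signal scale.
    d2 = torch.cdist(z1, z2) ** 2
    return torch.exp(2 * log_sf) * torch.exp(-0.5 * d2 / torch.exp(2 * log_ls))

def gp_nll(z, y, log_ls, log_sf, log_noise):
    # Exact GP negative log marginal likelihood (up to an additive constant).
    K = rbf_kernel(z, z, log_ls, log_sf) + torch.exp(2 * log_noise) * torch.eye(len(y))
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    return 0.5 * (y.unsqueeze(-1) * alpha).sum() + torch.log(torch.diagonal(L)).sum()

def block_wise_train(X, y, net, gp_params, epochs=10, batch_size=64):
    net_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    gp_opt = torch.optim.Adam(gp_params, lr=1e-2)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        # Block 1: mini-batch updates of the attention network, GP hyperparameters fixed.
        for xb, yb in loader:
            net_opt.zero_grad()
            loss = gp_nll(net(xb), yb, *[p.detach() for p in gp_params])
            loss.backward()
            net_opt.step()
        # Block 2: full-batch update of the GP hyperparameters, network fixed.
        gp_opt.zero_grad()
        with torch.no_grad():
            Z = net(X)                        # features for the entire training set
        gp_nll(Z, y, *gp_params).backward()
        gp_opt.step()
```

In this sketch, gp_params would be a small list of learnable scalars (log length-scale, log signal scale, log noise), e.g. `[torch.zeros((), requires_grad=True) for _ in range(3)]`. Only the attention network sees mini-batches; the Cholesky factorization in the GP block uses every training sequence, mirroring the full-batch GP update described in the abstract.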