Softmax is the de facto standard for normalizing logits in modern neural networks for language processing. However, because it produces a dense probability distribution, every token in the vocabulary has a nonzero chance of being selected at each generation step, which leads to a variety of reported problems in text generation. The $\alpha$-entmax of Peters et al. (2019, arXiv:1905.05702) solves this problem, but is considerably slower than softmax. In this paper, we propose an alternative to $\alpha$-entmax that retains its appealing properties but is as fast as optimized softmax, and achieves on-par or better performance on machine translation tasks.
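To illustrate the dense-versus-sparse distinction the abstract refers to, the following minimal sketch (not the paper's implementation) contrasts softmax with sparsemax, the $\alpha=2$ special case of $\alpha$-entmax: softmax assigns strictly positive probability to every token, while sparsemax can zero out low-scoring tokens entirely.

```python
import numpy as np

def softmax(z):
    """Dense normalization: every entry of the result is strictly positive."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    logits onto the probability simplex; entries can be exactly zero."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # tokens kept in the support
    k_z = k[support][-1]                       # size of the support
    tau = (cumsum[support][-1] - 1) / k_z      # threshold subtracted from logits
    return np.maximum(z - tau, 0.0)

logits = np.array([3.0, 1.5, 1.2, -0.5, -2.0])
print(softmax(logits))    # all five probabilities are nonzero
print(sparsemax(logits))  # low-scoring tokens receive exactly zero probability
```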