We show that Transformers are maximum a posteriori (MAP) estimators for Gaussian mixture models. This gives Transformers a probabilistic interpretation and suggests extensions to other probabilistic settings.
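As a rough illustration of the kind of correspondence claimed here (a minimal sketch under stated assumptions, not the construction used in this work), consider a Gaussian mixture whose component means are the keys, with uniform mixing weights and a shared isotropic covariance. If all keys have the same norm, the posterior over components given a query coincides with softmax dot-product attention weights. The function names, the toy keys and query, and the variance value below are hypothetical choices made only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, scale):
    """Dot-product attention weights: softmax_j(q . k_j / scale)."""
    return softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])


def gmm_posterior(q, keys, sigma2):
    """Posterior p(j | q) for a mixture with means k_j, uniform priors,
    and shared covariance sigma2 * I (assumptions of this sketch)."""
    log_lik = [
        -sum((a - b) ** 2 for a, b in zip(q, k)) / (2.0 * sigma2) for k in keys
    ]
    return softmax(log_lik)


# Toy data: unit-norm keys, so ||k_j||^2 is constant and cancels in the softmax.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
q = [0.3, -0.7]
sigma2 = 0.5  # hypothetical shared variance; plays the role of the attention scale

w_attn = attention_weights(q, keys, scale=sigma2)
w_gmm = gmm_posterior(q, keys, sigma2)
```

The two weight vectors agree because -||q - k_j||^2 / (2 sigma2) equals q . k_j / sigma2 plus terms that do not depend on j once the keys share a norm. With unequal key norms or non-uniform priors, the posterior picks up extra per-key bias terms and no longer matches plain dot-product attention.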