This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.