We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), placed before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of an exhaustive search over alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification often improves accuracy over well-tuned Vision Transformers and never hurts.
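For concreteness, a minimal PyTorch sketch of the idea is given below. This is an illustrative reconstruction, not the paper's implementation: the module and argument names (`DualPatchNormEmbed`, `patch_size`, `embed_dim`) are our own, and the paper's experiments use a different codebase.

```python
import torch
import torch.nn as nn

class DualPatchNormEmbed(nn.Module):
    """Patch embedding wrapped with a LayerNorm before and after.

    Hypothetical sketch of Dual PatchNorm: LN on the raw patch pixels,
    then a linear patch embedding, then LN on the resulting embeddings.
    """
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        patch_dim = in_chans * patch_size * patch_size
        self.patch_size = patch_size
        self.norm_pre = nn.LayerNorm(patch_dim)      # LN before patch embedding
        self.proj = nn.Linear(patch_dim, embed_dim)  # linear patch embedding
        self.norm_post = nn.LayerNorm(embed_dim)     # LN after patch embedding

    def forward(self, x):
        # x: (B, C, H, W) -> flattened patches (B, num_patches, C * p * p)
        B, C, H, W = x.shape
        p = self.patch_size
        x = x.unfold(2, p, p).unfold(3, p, p)            # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.norm_pre(x)    # first LayerNorm
        x = self.proj(x)
        x = self.norm_post(x)   # second LayerNorm
        return x

# Usage: a 224x224 RGB batch yields (2, 196, 768) patch embeddings.
tokens = DualPatchNormEmbed()(torch.randn(2, 3, 224, 224))
```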