In February of this year, Google proposed FLASH, a new Transformer variant that is faster, has a lower VRAM footprint, and achieves better performance. These gains come from a performant layer named GAU (Gated Attention Unit), which combines the attention layer and the FFN. In this paper, we re-analyze some of its implementation details both theoretically and empirically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. On the CLUE benchmark, our model achieves a dev average score of 75.02, 1% higher than RoFormerV1 while being 45% faster, and is also competitive with RoFormerV2.
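To make the layer concrete, below is a minimal PyTorch sketch of a GAU along the lines of the FLASH paper: the SiLU activations, the shared low-dimensional projection for queries and keys, and the squared-ReLU attention follow the original description, while the class and variable names and default hyperparameters are illustrative (relative position bias and attention masking are omitted for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAU(nn.Module):
    """A minimal Gated Attention Unit sketch; dimensions are illustrative."""
    def __init__(self, d_model=768, expansion=2, s=128):
        super().__init__()
        e = d_model * expansion
        self.s = s
        self.to_uv = nn.Linear(d_model, 2 * e)  # gate U and value V in one projection
        self.to_z = nn.Linear(d_model, s)       # shared low-dimensional projection Z
        # cheap per-dimension scale/offset that turns Z into queries and keys
        self.gamma = nn.Parameter(torch.ones(2, s))
        self.beta = nn.Parameter(torch.zeros(2, s))
        self.out = nn.Linear(e, d_model)

    def forward(self, x):                       # x: (batch, n, d_model)
        n = x.shape[1]
        u, v = F.silu(self.to_uv(x)).chunk(2, dim=-1)
        z = F.silu(self.to_z(x))
        q, k = (z.unsqueeze(2) * self.gamma + self.beta).unbind(dim=2)
        # squared-ReLU attention in place of softmax, normalized by length
        a = F.relu(torch.bmm(q, k.transpose(1, 2)) / self.s ** 0.5) ** 2 / n
        return self.out(u * torch.bmm(a, v))    # gated output: U ⊙ (A V)
```

Note how a single s-dimensional projection Z replaces the separate multi-head Q/K/V projections of a standard Transformer block, and the elementwise gate U replaces the FFN; this fusion is where GAU's speed and memory savings come from.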