Self-supervised learning (SSL) models face the challenges of abrupt informational collapse and slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture to prevent collapse and stabilize pre-training. TriNet learns the SSL latent embedding space and incorporates it into a higher-level space to predict pseudo-target vectors generated by a frozen teacher. Our experimental results show that the proposed method notably stabilizes and accelerates pre-training and achieves a relative word error rate reduction (WERR) of 6.06% over the state-of-the-art (SOTA) Data2vec on a downstream benchmark ASR task. We will release our code at https://github.com/tencent-ailab/.
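The 6.06% figure is a relative reduction, not an absolute drop in WER: it is measured against the baseline's WER, so the new system's WER is roughly 93.94% of the baseline's. The minimal sketch below uses the standard definition of relative WERR. The function name `relative_werr` and the example WER values are hypothetical illustrations, not numbers from the paper.

```python
def relative_werr(wer_baseline: float, wer_new: float) -> float:
    """Relative word error rate reduction, in percent.

    WERR = (WER_baseline - WER_new) / WER_baseline * 100
    A positive value means the new system has a lower WER than the baseline.
    """
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline * 100.0


# Hypothetical WERs (illustrative only, not reported results):
# a baseline at 10.0% WER and a new system at 9.4% WER.
print(f"{relative_werr(10.0, 9.4):.2f}%")
```

Under this definition, the same relative gain corresponds to a smaller absolute WER change when the baseline WER is already low.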