Token-free approaches have been successfully applied to a series of word- and span-level tasks. In this work, we compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data, and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine-translated examples, we reduce the gap in exact-match accuracy to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights into the cross-lingual transfer of ByT5 and show how the model compares with mT5 across all parameter sizes.