Bidirectional masked Transformers have become a core component of the current NLP landscape. Despite their impressive benchmark scores, a recurring theme in recent research has been to question such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, incurs only a marginal computational overhead, and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being an order of magnitude smaller than commonly used pretraining corpora.
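As a rough illustration of the kind of objective described above, the sketch below combines a masked-language-modelling head with a token-level supertagging head on top of a shared encoder and sums the two cross-entropy terms. This is not the authors' implementation; the class name, the `tag_weight` mixing coefficient, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointMLMSupertagger(nn.Module):
    """Shared encoder with two token-level heads: masked-token prediction (unsupervised)
    and supertagging (supervised). Hypothetical sketch, not the paper's actual model."""
    def __init__(self, vocab_size: int, num_supertags: int,
                 hidden_size: int = 256, num_layers: int = 4, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.mlm_head = nn.Linear(hidden_size, vocab_size)      # predicts masked tokens
        self.tag_head = nn.Linear(hidden_size, num_supertags)   # predicts a supertag per token

    def forward(self, input_ids, mlm_labels, tag_labels, tag_weight: float = 1.0):
        hidden = self.encoder(self.embed(input_ids))             # (batch, seq, hidden)
        ce = nn.CrossEntropyLoss(ignore_index=-100)              # -100 marks positions not scored
        mlm_loss = ce(self.mlm_head(hidden).transpose(1, 2), mlm_labels)
        tag_loss = ce(self.tag_head(hidden).transpose(1, 2), tag_labels)
        # joint objective: unsupervised MLM term plus supervised supertagging term
        return mlm_loss + tag_weight * tag_loss
```

In this reading, the supervised term injects the syntactic bias at every token position while leaving the standard pretraining signal untouched, which is consistent with the claim of only marginal extra cost.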