Modeling thematic fit (a verb--argument compositional semantics task) currently requires a large quantity of labeled data. We take a large, linguistically machine-annotated corpus and replace its annotation layers with output from higher-quality, more modern taggers. We compare the old and new corpus versions' impact on a verb--argument fit modeling task, using a high-performing neural approach. We find that higher annotation quality dramatically reduces our data requirement while also yielding better supervised predicate-argument classification. However, when applying the model to psycholinguistic tasks outside the training objective, we see clear gains at scale on one of two thematic fit estimation tasks, and no clear gains on the other. We also observe that quality improves with training size, though it may plateau or even decline on one task. Finally, we test the effect of role set size. All of this suggests that the quality/quantity interplay is not all you need. We replicate previous studies while modifying certain role representation details, and we set a new state of the art in event modeling using a fraction of the data. We make the new corpus version public.