This paper analyses the degree to which dialect classifiers based on syntactic representations remain stable over space and time. Previous work has shown that combining grammar induction with geospatial text classification produces robust dialect models. However, it remains unknown how changing grammars and changing populations affect these models. This paper constructs a test set for 12 dialects of English that spans three years at monthly intervals, with a fixed spatial distribution across 1,120 cities. Syntactic representations are formulated within the usage-based Construction Grammar (CxG) paradigm. The decay rate of classification performance for each dialect over time allows us to identify regions undergoing syntactic change. The distribution of classification accuracy within each dialect region allows us to measure how internally heterogeneous that dialect's grammar is. The main contribution of this paper is to show that a rigorous evaluation of dialect classification models can reveal both variation over space and change over time.
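As a rough illustration of the two measurements the abstract relies on, the following is a minimal sketch under stated assumptions. It treats a dialect's decay rate as the least-squares slope of its accuracy across monthly test sets, where a negative slope indicates decay. It treats within-region heterogeneity as the population standard deviation of per-city accuracy. Both definitions and the function names `decay_rate` and `within_region_spread` are hypothetical illustrations, not the paper's own formulation.

```python
from statistics import mean, pstdev


def decay_rate(monthly_scores):
    """Hypothetical decay measure: least-squares slope of accuracy vs. month index.

    A negative value means performance falls as the test set moves further
    from the training period (assumed definition, not the paper's).
    """
    n = len(monthly_scores)
    if n < 2:
        raise ValueError("need at least two monthly scores")
    x_bar = (n - 1) / 2              # mean of month indices 0..n-1
    y_bar = mean(monthly_scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(monthly_scores))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def within_region_spread(city_accuracies):
    """Hypothetical heterogeneity measure: std. dev. of per-city accuracy in one region.

    `city_accuracies` maps a city name to the classifier's accuracy there.
    """
    return pstdev(list(city_accuracies.values()))


# Usage with made-up numbers (illustrative only, not results from the paper):
print(decay_rate([0.90, 0.89, 0.88, 0.87]))             # roughly -0.01 per month
print(within_region_spread({"CityA": 0.6, "CityB": 1.0}))  # roughly 0.2
```

A linear slope is the simplest summary of decay. A real evaluation might instead fit a nonlinear curve or compare against a held-out baseline month.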