We introduce a graph polynomial that distinguishes tree structures and use it to represent dependency grammar, together with a measure based on this polynomial representation to quantify syntactic similarity. The polynomial encodes accurate and comprehensive information about the dependency structure and the dependency relations of the words in a sentence. We apply the polynomial-based methods to analyze sentences in the Parallel Universal Dependencies (PUD) treebanks. Specifically, we compare the syntax of sentences with that of their translations in different languages, and we perform a syntactic typology study of the languages available in the PUD treebanks. We also demonstrate and discuss the potential of the methods for measuring the syntactic diversity of corpora.
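The abstract does not give the definition of the polynomial, so the following is only a minimal sketch under an assumption: it uses a recursive tree polynomial for unlabeled rooted trees in which a single vertex maps to x, and a vertex whose child subtrees are T_1, ..., T_k maps to y + P(T_1)···P(T_k). Polynomials are stored as `Counter` objects mapping exponent pairs (i, j) to the coefficient of x^i y^j. The dependency-relation labels that the polynomial described above also encodes are omitted here, and `coeff_distance` (an L1 distance between coefficient vectors) is a hypothetical stand-in, not the similarity measure defined in this work.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex of an unlabeled rooted tree (relation labels omitted)."""
    children: list[Node] = field(default_factory=list)


def poly_mul(p: Counter, q: Counter) -> Counter:
    """Multiply two bivariate polynomials stored as {(i, j): coeff of x^i y^j}."""
    result: Counter = Counter()
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            result[(a + d, b + e)] += c * f
    return result


def tree_poly(node: Node) -> Counter:
    """Assumed recursion: leaf -> x; internal vertex -> y + product over children."""
    if not node.children:
        return Counter({(1, 0): 1})  # the polynomial x
    prod: Counter = Counter({(0, 0): 1})  # the constant 1
    for child in node.children:
        prod = poly_mul(prod, tree_poly(child))
    prod[(0, 1)] += 1  # add the term y
    return prod


def coeff_distance(p: Counter, q: Counter) -> int:
    """Hypothetical similarity proxy: sum of absolute coefficient differences."""
    return sum(abs(p[k] - q[k]) for k in set(p) | set(q))


if __name__ == "__main__":
    chain = Node([Node([Node()])])  # root -> child -> leaf
    star = Node([Node(), Node()])   # root with two leaves
    print(tree_poly(chain))  # x + 2y
    print(tree_poly(star))   # x^2 + y
    print(coeff_distance(tree_poly(chain), tree_poly(star)))
```

Under this assumed recursion, the three-vertex path and the three-vertex star receive different polynomials (x + 2y and x^2 + y), which illustrates how a tree-distinguishing polynomial separates structures of the same size; the coefficient distance between them is 3.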