Recent years have witnessed a steep increase in the number of linguistic databases capturing syntactic variation. We survey and describe 21 publicly available morpho-syntactic databases, focusing on properties such as data structure, user interface, documentation, formats, and overall user-friendliness. We demonstrate that all the surveyed databases can be fruitfully categorized along two dimensions: the unit of description and the design principle. The unit of description is the type of data the database represents (languages, constructions, or expressions). The design principle captures the internal logic of the database. We identify three primary design principles, which vary in their descriptive power, granularity, and complexity: monocategorization, multicategorization, and structural decomposition. We describe how these design principles are implemented in concrete databases and discuss their advantages and limitations. Finally, we outline essential desiderata for future linguistic databases.