In this paper we introduce the SchemaDB dataset: a collection of relational database schemata in both SQL and graph formats. Databases are rarely shared publicly for reasons of privacy and security, so their schemata are seldom available for study. Consequently, an understanding of database structures in the wild is lacking, and most publicly available examples belong to common development frameworks or are derived from textbooks or engine benchmark designs. SchemaDB contains 2,500 samples of relational schemata found in public repositories, which we have standardised to MySQL syntax. We provide our gathering and transformation methodology, summary statistics, and structural analysis, and discuss potential downstream research tasks in several domains.