SQL is the world's most popular declarative language, forming the basis of the multi-billion-dollar database industry. Although SQL has been standardized, the full standard is based on ambiguous natural language rather than on a formal specification. Commercial SQL implementations interpret the standard in different ways, so that, given the same input data, the same query can yield different results depending on the SQL system it is run on. Even for a particular system, a mechanically checked formalization of all widely used features of SQL remains an open problem. The lack of a well-understood formal semantics makes it very difficult to validate the soundness of database implementations. Although formal semantics for fragments of SQL have been designed in the past, they usually did not support set and bag operations, lateral joins, nested subqueries, and, crucially, null values. Null values complicate SQL's semantics in profound ways, analogous to null pointers or side effects in other programming languages. Because certain SQL queries are equivalent in the absence of null values but produce different results when applied to tables containing incomplete data, semantics that ignore null values can prove query equivalences that are unsound in realistic databases. A formal semantics of SQL supporting all the aforementioned features was only proposed recently. In this paper, we report on our mechanization of an SQL semantics covering set/bag operations, lateral joins, nested subqueries, and nulls, written in the Coq proof assistant, and describe the validation of key metatheoretic properties. Additionally, we use the same framework to formalize the semantics of a flat relational calculus (with null values) and present a certified translation of its normal forms into SQL.
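As an illustration of the null-sensitive inequivalence described above, the sketch below runs the well-known `NOT IN` / `NOT EXISTS` pair against an in-memory database. It is not part of the Coq mechanization; it assumes Python's built-in `sqlite3` module, and the tables `R` and `S` and the helper `run` are hypothetical names chosen for illustration. With no `NULL` in `S` the two queries agree, but a single `NULL` makes `a NOT IN (...)` evaluate to unknown for every unmatched row, so the results diverge.

```python
import sqlite3

# Two queries that are equivalent when column S.b contains no NULLs.
NOT_IN = "SELECT a FROM R WHERE a NOT IN (SELECT b FROM S) ORDER BY a"
NOT_EXISTS = (
    "SELECT a FROM R WHERE NOT EXISTS "
    "(SELECT * FROM S WHERE S.b = R.a) ORDER BY a"
)


def run(s_values):
    """Load R = {1, 2} and S = s_values (None becomes NULL) into an
    in-memory database and return the results of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (a INTEGER)")
    conn.execute("CREATE TABLE S (b INTEGER)")
    conn.executemany("INSERT INTO R VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO S VALUES (?)", [(v,) for v in s_values])
    results = tuple(
        [row[0] for row in conn.execute(query)] for query in (NOT_IN, NOT_EXISTS)
    )
    conn.close()
    return results


if __name__ == "__main__":
    print(run([1]))        # no NULLs: both queries agree  -> ([2], [2])
    print(run([1, None]))  # one NULL in S: results differ -> ([], [2])
```

A semantics that ignores nulls would validate rewriting one query into the other, which is exactly the kind of unsound equivalence the formalization is designed to rule out.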