The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join, and projection, and have desirable algorithmic properties. The core spanners, introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core functionality of the query language AQL used in IBM's SystemT, additionally need string-equality selections. Freydenberger and Holldack (ICDT 2016, Theory of Computing Systems 2018) have shown that this leads to high complexity, and even undecidability, of the typical problems in static analysis and query evaluation. We propose an alternative approach to core spanners: we incorporate the string-equality selections directly into the regular language that represents the underlying regular spanner, instead of treating them as an algebraic operation on the table extracted by the regular spanner. This yields a fragment of core spanners that has slightly weaker expressive power than the full class of core spanners, but arguably still covers the intuitive applications of string-equality selections for information extraction and has much better upper complexity bounds for the typical problems in static analysis and query evaluation.
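For readers unfamiliar with the terminology, the following minimal Python sketch illustrates the classical algebraic view of a string-equality selection: a spanner first extracts a table of spans, and the selection is then applied to that table as a filter. It is not the approach proposed here. All names (`word_spans`, `regular_spanner`, `select_equal`) are hypothetical, the "regular spanner" is a toy stand-in built from a regular expression, and spans are written as Python-style 0-based half-open index pairs, which is an assumption of this sketch.

```python
import re
from itertools import combinations


def word_spans(doc):
    """Return (start, end) spans of maximal runs of lowercase letters."""
    return [m.span() for m in re.finditer(r"[a-z]+", doc)]


def regular_spanner(doc):
    """Toy spanner: a table with columns x and y holding every pair of
    word spans in which x occurs before y."""
    return [{"x": x, "y": y} for x, y in combinations(word_spans(doc), 2)]


def select_equal(doc, table, a, b):
    """String-equality selection: keep the rows whose spans in columns
    a and b cover equal substrings of doc (possibly at different positions)."""
    return [
        row
        for row in table
        if doc[row[a][0]:row[a][1]] == doc[row[b][0]:row[b][1]]
    ]


doc = "ab cd ab"
table = regular_spanner(doc)                     # three rows, one per pair of words
equal_rows = select_equal(doc, table, "x", "y")  # only the pair of "ab" occurrences
print(equal_rows)
```

The selection here runs after extraction, over the whole extracted table; the alternative described above instead encodes the equality requirement in the regular language that represents the spanner.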