A Length-aware Regular Expression SMT Solver

Motivated by program analysis, security, and verification applications, we study various fragments of a rich first-order quantifier-free (QF) theory $T_{LRE,n,c}$ with a regular expression (regex) membership predicate, linear integer arithmetic over string length, a string-number conversion predicate, and string concatenation. Our contributions are the following. On the theoretical side, we prove a series of (un)decidability and complexity theorems for various fragments of $T_{LRE,n,c}$, some of which have been open for several years. On the practical side, we present a novel length-aware decision procedure for the QF first-order theory $T_{LRE}$ with a regex membership predicate and linear arithmetic over string length. The crucial insight that enables our algorithm to scale on instances obtained from practical applications is that these instances contain a wealth of information about upper and lower bounds on the lengths of strings, which can be used to simplify operations on the automata representing regexes. We showcase the power of our algorithm via an extensive empirical evaluation over a large and diverse benchmark of over 57000 regex-heavy instances, derived from a mix of industrial applications, instances contributed by other solver developers, and randomly generated ones. Specifically, our solver outperforms five other state-of-the-art string solvers, namely CVC4, Z3str3, Z3-Trau, OSTRICH, and Z3seq, over this benchmark.
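To make the length-bound insight concrete, the following is a minimal Python sketch. It is not the decision procedure of the paper. It assumes a regex has already been compiled to a deterministic automaton, and every name in it (`accepts_within_bounds`, `AB_PLUS`, the dictionary encoding of transitions) is hypothetical. It shows only how a known length interval $[lo, hi]$ caps the automaton exploration: no path longer than $hi$ is ever expanded.

```python
from collections import deque

def accepts_within_bounds(transitions, start, accepting, lo, hi):
    """Return a shortest accepted word with lo <= len(word) <= hi, or None.

    transitions: dict mapping state -> {symbol: next_state} (a DFA).
    This is an illustrative search, not the solver's actual algorithm.
    """
    if hi < 0 or lo > hi:
        return None  # the length constraints alone are unsatisfiable
    queue = deque([(start, "")])
    seen = {(start, 0)}  # visited (state, length) pairs
    while queue:
        state, word = queue.popleft()
        length = len(word)
        if length >= lo and state in accepting:
            return word
        if length == hi:
            continue  # the upper bound prunes all longer paths
        for symbol, nxt in sorted(transitions.get(state, {}).items()):
            key = (nxt, length + 1)
            if key not in seen:
                seen.add(key)
                queue.append((nxt, word + symbol))
    return None

# Hypothetical DFA for the regex (ab)+ : it accepts words of length 2, 4, 6, ...
AB_PLUS = {0: {"a": 1}, 1: {"b": 2}, 2: {"a": 1}}

print(accepts_within_bounds(AB_PLUS, 0, {2}, 3, 3))  # no word of length exactly 3
print(accepts_within_bounds(AB_PLUS, 0, {2}, 3, 4))  # a witness of length 4
```

The search space is bounded by the number of states times $hi + 1$, so tight upper bounds keep it small even when the unbounded language is infinite.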
