Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions in the PAC model and with membership queries. We show that PAC learning is hard even under the uniform distribution on the hypercube, and also prove hardness of distribution-free learning with membership queries. Furthermore, if regular expressions are extended with complement or intersection, we establish hardness of learning with membership queries even under the uniform distribution. We emphasize that these results do not follow from existing hardness results for learning DFAs or NFAs, since the descriptive complexity of regular languages can differ exponentially between DFAs, NFAs, and regular expressions.
翻译:尽管正则表达式具有重要的理论意义和广泛的实践应用,但其学习过程的计算复杂性在很大程度上尚未得到充分探索。本研究在PAC模型及成员查询框架下,系统探究了非精确学习正则表达式的计算困难性。我们证明:即使在超立方体上的均匀分布条件下,PAC学习正则表达式仍然是困难的;同时,我们也证明了在分布无关条件下结合成员查询的学习任务同样具有计算困难性。进一步地,当正则表达式扩展至包含补集或交集运算时,我们证明了即使在均匀分布条件下,结合成员查询的学习任务仍然具有计算困难性。需要特别指出的是,这些结论并不能从现有关于学习确定性有限自动机(DFA)或非确定性有限自动机(NFA)的困难性结果中直接推导得出,因为正则语言的描述复杂度在DFA、NFA与正则表达式之间可能存在指数级的差异。