Grammatical inference is the task of learning a formal grammar (e.g., a set of rewrite rules or a finite state machine) from data. We are concerned with learning a Nondeterministic Finite Automaton (NFA) of a given size from samples of positive and negative words. NFAs can naturally be modeled in SAT. Since the standard model [1] is enormous, we also try a model based on prefixes [2], which generates smaller instances. We further propose a new model based on suffixes and a hybrid model combining prefixes and suffixes. We then focus on optimizing the size of the SAT instances generated by the hybrid model. We present two techniques for optimizing this combination: one based on Iterated Local Search (ILS), the other on a Genetic Algorithm (GA). Optimizing the combination significantly reduces the SAT instances and their solving time, but at the cost of a longer generation time. We therefore study the trade-off between generation time and solving time through experimental comparisons, and we analyze our various model improvements.
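To make the modeling idea concrete, the following is a minimal illustrative sketch of a SAT encoding of NFA learning, not the actual model of [1]: it naively enumerates all k^|w| runs of each word, which is exactly the kind of blow-up that prefix- and suffix-based models mitigate by sharing common word parts. The function name `learn_nfa` and the use of the PySAT toolkit are illustrative assumptions.

```python
# Hedged sketch: learn an NFA with k states from positive words s_plus and
# negative words s_minus via SAT (PySAT). Variables: d[(p, a, q)] = transition
# p --a--> q exists, f[q] = state q is accepting. State 0 is initial.
from itertools import product
from pysat.solvers import Glucose3

def learn_nfa(k, alphabet, s_plus, s_minus):
    ids = {}                                    # propositional variable ids
    def var(name):
        return ids.setdefault(name, len(ids) + 1)
    d = {(p, a, q): var(('d', p, a, q))
         for p in range(k) for a in alphabet for q in range(k)}
    f = {q: var(('f', q)) for q in range(k)}

    solver = Glucose3()
    for w in s_plus:
        # At least one run of w from state 0 ends in an accepting state.
        run_vars = []
        for path in product(range(k), repeat=len(w)):
            states = (0,) + path
            y = var(('y', w, states))           # auxiliary: "this run is used"
            run_vars.append(y)
            for i, a in enumerate(w):           # y -> every transition of the run
                solver.add_clause([-y, d[(states[i], a, states[i + 1])]])
            solver.add_clause([-y, f[states[-1]]])  # y -> last state accepting
        solver.add_clause(run_vars)
    for w in s_minus:
        # No run of w from state 0 may end in an accepting state.
        for path in product(range(k), repeat=len(w)):
            states = (0,) + path
            clause = [-d[(states[i], a, states[i + 1])] for i, a in enumerate(w)]
            solver.add_clause(clause + [-f[states[-1]]])

    if solver.solve():
        model = set(solver.get_model())
        return ({t for t, v in d.items() if v in model},
                {q for q, v in f.items() if v in model})
    return None

# Toy usage: words over {a, b} that contain at least one 'a'.
print(learn_nfa(2, 'ab', ['a', 'ba', 'ab'], ['', 'b', 'bb']))
```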
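As for optimizing the prefix/suffix combination, the sketch below shows one plausible shape of the ILS step, under the assumption that each sample word is independently assigned to the prefix-based (0) or suffix-based (1) encoding; the `instance_size` cost function, which would return the size of the SAT instance generated for a given assignment, is hypothetical and stands in for the actual instance generator.

```python
# Hedged sketch of Iterated Local Search over a per-word prefix/suffix split.
import random

def ils(n_words, instance_size, iterations=100, perturb_bits=3):
    best = [random.randint(0, 1) for _ in range(n_words)]
    best_cost = instance_size(best)
    for _ in range(iterations):
        # Perturbation: flip a few random bits of the current best assignment.
        cand = best[:]
        for i in random.sample(range(n_words), min(perturb_bits, n_words)):
            cand[i] ^= 1
        # Local search: first-improvement single-bit flips to a local optimum.
        cost = instance_size(cand)
        improved = True
        while improved:
            improved = False
            for i in range(n_words):
                cand[i] ^= 1
                c = instance_size(cand)
                if c < cost:
                    cost, improved = c, True
                else:
                    cand[i] ^= 1                # undo the non-improving flip
        # Acceptance criterion: keep the new local optimum only if better.
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

A GA variant would replace the perturbation/local-search loop with crossover and mutation over a population of such bit-string assignments, with the same instance-size objective.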