The identification of deterministic finite automata (DFAs) from labeled examples is a cornerstone of automata learning, yet traditional methods focus on learning monolithic DFAs, which often yield a large DFA lacking simplicity and interoperability. Recent work addresses these limitations by exploring DFA decomposition identification problems (DFA-DIPs), which model system behavior as intersections of multiple DFAs, offering modularity for complex tasks. However, existing DFA-DIP approaches depend on SAT encodings derived from Augmented Prefix Tree Acceptors (APTAs), incurring scalability limitations due to their inherent redundancy. In this work, we advance DFA-DIP research through studying two variants: the traditional Pareto-optimal DIP and the novel states-optimal DIP, which prioritizes a minimal number of states. We propose a novel framework that bridges DFA decomposition with recent advancements in automata representation. One of our key innovations replaces APTA with 3-valued DFA (3DFA) derived directly from labeled examples. This compact representation eliminates redundancies of APTA, thus drastically reducing variables in the improved SAT encoding. Experimental results demonstrate that our 3DFA-based approach achieves significant efficiency gains for the Pareto-optimal DIP while enabling a scalable solution for the states-optimal DIP.
翻译:从标注示例中辨识确定性有限自动机是自动机学习的基石,然而传统方法侧重于学习单一的DFA,这通常会产生规模庞大且缺乏简洁性与互操作性的DFA。近期研究通过探索DFA分解辨识问题来应对这些局限,该方法将系统行为建模为多个DFA的交集,为复杂任务提供了模块化方案。然而,现有的DFA-DIP方法依赖于从增强前缀树接受器导出的SAT编码,其固有的冗余性导致了可扩展性限制。本研究通过分析两个变体推进DFA-DIP研究:传统的帕累托最优DIP与新颖的状态最优DIP(后者以最小化状态数为优先目标)。我们提出了一个创新框架,将DFA分解与自动机表示的最新进展相结合。核心创新之一是用直接从标注示例导出的三值DFA替代APTAs。这种紧凑表示消除了APTAs的冗余,从而显著改进SAT编码中的变量规模。实验结果表明,基于3DFA的方法在帕累托最优DIP中实现了显著的效率提升,同时为状态最优DIP提供了可扩展的解决方案。