The advent of Large Language Models (LLMs) promised to resolve the long-standing paradox in honeypot design: achieving high-fidelity deception with low operational risk. However, despite a flurry of research since late 2022, progress has been incremental, and the field lacks a cohesive understanding of the emerging architectural patterns, core challenges, and evaluation paradigms. To fill this gap, this Systematization of Knowledge (SoK) paper provides the first comprehensive overview of this new domain. We survey and systematize three critical, intersecting research areas: first, we provide a taxonomy of honeypot detection vectors, structuring the core problems that LLM-based realism must solve; second, we synthesize the emerging literature on LLM-honeypots, identifying a canonical architecture and key evaluation trends; and third, we chart the evolutionary path of honeypot log analysis, from simple data reduction to automated intelligence generation. We synthesize these findings into a forward-looking research roadmap, arguing that the true potential of this technology lies in creating autonomous, self-improving deception systems to counter the emerging threat of intelligent, automated attackers.