Cyber deception is emerging as a promising approach to defending networks and systems against attackers and data thieves. However, although deception technologies are relatively cheap to deploy, generating realistic content at scale is very costly, because rich, interactive deceptive technologies are largely hand-crafted. With recent improvements in machine learning, we now have the opportunity to bring scale and automation to the creation of realistic and enticing simulated content. In this work, we propose a framework to automate the generation of email and instant messaging-style group communications at scale. Such messaging platforms within organisations hold valuable information in private communications and document attachments, making them an enticing target for an adversary. We address two key aspects of simulating this type of system: modelling when and with whom participants communicate, and generating topical, multi-party text to populate simulated conversation threads. For the first, we present the LogNormMix-Net Temporal Point Process (TPP), which builds upon the intensity-free modelling approach of Shchur et al. to create a generative model for unicast and multicast communications. For the second, we demonstrate the use of fine-tuned, pre-trained language models to generate convincing multi-party conversation threads. We simulate a live email server by combining our LogNormMix-Net TPP, which generates the communication timestamp, sender and recipients, with the language model, which generates the contents of the multi-party email threads. We evaluate the generated content against a number of realism-based properties that encourage a model to learn to generate content that will engage the attention of an adversary and so achieve a deception outcome.
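To make the temporal side of the framework more concrete, the following is a minimal sketch of the underlying idea, not the authors' LogNormMix-Net. In the intensity-free approach of Shchur et al., the time until the next event is drawn from a mixture of log-normal distributions, so both sampling and the density are available in closed form. Everything specific here is an assumption for illustration: the mixture parameters are fixed constants (a trained model would predict them from the communication history), the user names are hypothetical, and senders and recipients are drawn from simple independent distributions rather than learned ones.

```python
import math
import random

# Hypothetical, fixed mixture parameters (assumption: a trained model would
# instead predict these from the history of previous events).
WEIGHTS = [0.6, 0.3, 0.1]   # mixture weights, sum to 1
MUS = [0.0, 2.0, 4.0]       # means of the log inter-event time
SIGMAS = [0.5, 0.7, 1.0]    # standard deviations of the log inter-event time

USERS = ["alice", "bob", "carol", "dave"]  # hypothetical participants


def density(tau):
    """Closed-form density of the log-normal mixture at inter-event time tau > 0."""
    total = 0.0
    for w, mu, sigma in zip(WEIGHTS, MUS, SIGMAS):
        z = (math.log(tau) - mu) / sigma
        total += w * math.exp(-0.5 * z * z) / (tau * sigma * math.sqrt(2 * math.pi))
    return total


def sample_gap(rng):
    """Pick a mixture component, then draw a log-normal inter-event time from it."""
    k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
    return rng.lognormvariate(MUS[k], SIGMAS[k])


def sample_event(rng, t_prev, p_recipient=0.4):
    """Draw one (timestamp, sender, recipients) event following time t_prev."""
    t = t_prev + sample_gap(rng)
    sender = rng.choice(USERS)
    others = [u for u in USERS if u != sender]
    # Multicast: each other user is included independently (illustrative only).
    recipients = [u for u in others if rng.random() < p_recipient]
    if not recipients:
        # Fall back to unicast so every message has at least one recipient.
        recipients = [rng.choice(others)]
    return t, sender, recipients


def simulate(n_events, seed=0):
    """Generate a sequence of communication events with increasing timestamps."""
    rng = random.Random(seed)
    t, events = 0.0, []
    for _ in range(n_events):
        t, sender, recipients = sample_event(rng, t)
        events.append((t, sender, recipients))
    return events


if __name__ == "__main__":
    for t, sender, recipients in simulate(5):
        print(f"t={t:8.2f}  {sender} -> {', '.join(recipients)}")
```

In a full pipeline of the kind the abstract describes, each sampled event would then be handed to the language model, which would write the message body for that sender and recipient set.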