Cyber deception is emerging as a promising approach to defending networks and systems against attackers and data thieves. However, although deception technologies are relatively cheap to deploy, generating realistic content at scale is very costly, because rich, interactive deceptive technologies are largely hand-crafted. With recent improvements in Machine Learning, we now have the opportunity to bring scale and automation to the creation of realistic and enticing simulated content. In this work, we propose a framework to automate the generation of email and instant messaging-style group communications at scale. Within organisations, such messaging platforms hold valuable information in private communications and document attachments, making them an enticing target for an adversary. We address two key aspects of simulating this type of system: modelling when and with whom participants communicate, and generating topical, multi-party text to populate simulated conversation threads. For the first of these, we present the LogNormMix-Net Temporal Point Process (TPP), which builds upon the intensity-free modelling approach of Shchur et al.~\cite{shchur2019intensity} to create a generative model for unicast and multicast communications. For the second, we demonstrate the use of fine-tuned, pre-trained language models to generate convincing multi-party conversation threads. We simulate a live email server by uniting our LogNormMix-Net TPP, which generates the timestamp, sender and recipients of each communication, with the language model, which generates the contents of the multi-party email threads. We evaluate the generated content with respect to a number of realism-based properties that encourage a model to learn to generate content that will engage the attention of an adversary to achieve a deception outcome.
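To make the timing component concrete, the following is a minimal sketch of the log-normal mixture idea behind intensity-free TPPs: the gap between consecutive events is drawn from a mixture of log-normal distributions, and each event is tagged with a sender and a set of recipients. It is an illustration under stated assumptions, not the LogNormMix-Net model itself. The function names, mixture parameters and user list are hypothetical, the parameters are fixed here rather than predicted by a network from the event history, and senders and recipients are drawn uniformly as a placeholder.

```python
import math
import random


def sample_inter_event_time(weights, mus, sigmas, rng):
    """Draw one inter-event time from a mixture of log-normals.

    A component k is chosen with probability weights[k]; the gap is then
    exp(z) with z ~ Normal(mus[k], sigmas[k]), so it is always positive.
    """
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return math.exp(rng.gauss(mus[k], sigmas[k]))


def simulate_events(n_events, users, rng,
                    weights=(0.7, 0.3), mus=(-1.0, 2.0), sigmas=(0.5, 1.0)):
    """Generate (timestamp, sender, recipients) tuples.

    Assumption: mixture parameters are constants and participants are
    sampled uniformly; a learned model would condition both on history.
    Requires at least two users so every message has a recipient.
    """
    t = 0.0
    events = []
    for _ in range(n_events):
        t += sample_inter_event_time(weights, mus, sigmas, rng)
        sender = rng.choice(users)
        others = [u for u in users if u != sender]
        # One recipient is a unicast message, several is a multicast one.
        n_recipients = rng.randint(1, len(others))
        recipients = rng.sample(others, n_recipients)
        events.append((t, sender, recipients))
    return events


if __name__ == "__main__":
    rng = random.Random(0)
    for t, sender, recipients in simulate_events(5, ["alice", "bob", "carol"], rng):
        print(f"{t:8.3f}  {sender} -> {', '.join(recipients)}")
```

In a full pipeline of the kind the abstract describes, each generated tuple would then be passed to a language model that writes the message body for that sender and recipient set.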