Linguistic steganography embeds secret information in seemingly innocent texts, safeguarding privacy in surveillance environments. Generative linguistic steganography leverages the probability distribution of language models (LMs) and applies steganographic algorithms to generate stego tokens, gaining attention with recent Large Language Model (LLM) advancements. To enhance security, researchers develop distribution-preserving stego algorithms to minimize the gap between stego sampling and LM sampling. However, the reliance on language model distributions, coupled with deviations from real-world cover texts, results in insufficient imperceptibility when facing steganalysis detectors in real-world scenarios. Moreover, LLM distributions tend to be more deterministic, resulting in reduced entropy and, consequently, lower embedding capacity. In this paper, we propose FreStega, a plug-and-play method to reconstruct the distribution of language models used for generative linguistic steganography. FreStega dynamically adjusts token probabilities from the language model at each step of stegotext auto-regressive generation, leveraging both sequential and spatial dimensions. In sequential adjustment, the temperature is dynamically adjusted based on instantaneous entropy, enhancing the diversity of stego texts and boosting embedding capacity. In the spatial dimension, the distribution is aligned with guidance from the target domain corpus, closely mimicking real cover text in the target domain. By reforming the distribution, FreStega enhances the imperceptibility of stego text in practical scenarios and improves steganographic capacity by 15.41\%, all without compromising the quality of the generated text. FreStega serves as a plug-and-play remedy to enhance the imperceptibility and embedding capacity of existing distribution-preserving steganography methods in real-world scenarios.
翻译:暂无翻译