In the face of escalating surveillance and censorship in cyberspace, personal privacy has come under siege. This motivates steganography, which hides messages securely within innocent-looking texts. Earlier methods alter existing texts to hide private messages, which is not secure. Large language models (LLMs) provide a high-quality, explicit distribution over text, a mathematical tool on which secure steganography methods can be built. However, existing attempts fail to achieve high capacity, time efficiency, and correctness simultaneously, and their tightly coupled designs leave little room for refinement. To provide a secure, high-capacity, and efficient steganography method, we introduce ShiMer. Specifically, ShiMer pseudorandomly shifts the probability interval of the LLM's distribution to obtain a private distribution, and samples a token according to the private bits. Texts produced by ShiMer are indistinguishable in quality from normal texts generated directly by the language model. To further enhance the capacity of ShiMer, we design a reordering algorithm that minimizes the occurrence of interval splitting during the decoding phase. Experimental results indicate that our method achieves the highest capacity and efficiency among existing secure steganography techniques.
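The shift-and-sample idea can be illustrated with a single-step toy sketch. This is not ShiMer's actual algorithm, only one simplified reading of the description above, under several assumptions: the token distribution is given as exact fractions, the pseudorandom shift is derived from a shared key with SHA-256 as a stand-in for a proper pseudorandom generator, and the function names (`prf_shift`, `embed_step`, `consistent_messages`) are hypothetical. The secret bits are read as a point in [0, 1), the point is shifted modulo 1, and the token whose probability interval contains the shifted point is emitted.

```python
from fractions import Fraction
import hashlib


def prf_shift(key: bytes, step: int, precision: int = 32) -> Fraction:
    """Derive a shift in [0, 1) from a shared key (SHA-256 is an assumed stand-in)."""
    digest = hashlib.sha256(key + step.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest[: precision // 8], "big")
    return Fraction(value, 2 ** precision)


def cumulative(probs):
    """Return interval edges [0, p0, p0+p1, ..., 1] for the token distribution."""
    edges = [Fraction(0)]
    for p in probs:
        edges.append(edges[-1] + p)
    return edges


def embed_step(probs, bits: str, shift: Fraction) -> int:
    """Pick the token whose interval contains the shifted message point."""
    point = (Fraction(int(bits, 2), 2 ** len(bits)) + shift) % 1
    edges = cumulative(probs)
    for token, (lo, hi) in enumerate(zip(edges, edges[1:])):
        if lo <= point < hi:
            return token
    raise ValueError("probabilities must sum to 1")


def consistent_messages(probs, token: int, shift: Fraction, n_bits: int):
    """List every n_bits message that would have produced this token."""
    edges = cumulative(probs)
    lo, hi = edges[token], edges[token + 1]
    matches = []
    for m in range(2 ** n_bits):
        point = (Fraction(m, 2 ** n_bits) + shift) % 1
        if lo <= point < hi:
            matches.append(format(m, f"0{n_bits}b"))
    return matches


# Toy three-token distribution and a fixed shift, chosen by hand for illustration.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
shift = Fraction(1, 4)
token = embed_step(probs, "101", shift)
candidates = consistent_messages(probs, token, shift, 3)
```

With these toy values the emitted token is index 2, and the receiver narrows the message down to `100` or `101`, so the shared prefix `10` is recovered. For token 0 the candidate set wraps around the end of the unit interval (`000`, `001`, `110`, `111`) and no common prefix survives. This wrap-around is one plausible picture of the interval splitting that the reordering algorithm is said to reduce. Because the shift is uniform when the key is secret, the shifted point is uniform as well, so in this sketch the token follows the model's own distribution.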