Steganography comprises the mechanics of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the proposed residual audio steganography setup allows the hidden image to be encoded independently of the host audio without compromising quality. Accordingly, while previous works require both the host and the hidden signal at encoding time, PixInWav can encode images offline, and the encoded image can later be hidden, in a residual fashion, in any audio signal. Finally, we test our scheme in a lab setting by transmitting images over the air from a loudspeaker to a microphone, verifying our theoretical insights and obtaining promising results.
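The residual idea can be illustrated with a minimal sketch. It assumes a simplified STDCT with non-overlapping frames, and it replaces the learned encoder with a hypothetical placeholder (`encode_image`) that merely scales the image. The abstract does not specify frame length, overlap, or network details, so `FRAME`, `strength`, and all function names here are illustrative assumptions, not the authors' implementation. The point is only that the residual is computed without access to the host and can then be added to any host spectrogram.

```python
import numpy as np
from scipy.fft import dct, idct

FRAME = 256  # hypothetical frame length; not taken from the paper


def stdct(audio, frame=FRAME):
    """Simplified STDCT: one orthonormal DCT-II per non-overlapping frame."""
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    return dct(frames, type=2, norm="ortho", axis=1)


def istdct(spec):
    """Inverse of stdct: inverse DCT per frame, then concatenate the frames."""
    return idct(spec, type=2, norm="ortho", axis=1).reshape(-1)


def encode_image(image, shape, strength=0.01):
    """Placeholder for a learned encoder (assumption): scale the image and
    place it in a spectrogram-shaped array. Note that no host is needed."""
    residual = np.zeros(shape)
    h = min(shape[0], image.shape[0])
    w = min(shape[1], image.shape[1])
    residual[:h, :w] = strength * image[:h, :w]
    return residual


def hide(host_audio, residual):
    """Add the precomputed residual to the host spectrogram and invert."""
    return istdct(stdct(host_audio) + residual)


rng = np.random.default_rng(0)
image = rng.random((32, FRAME))                       # stand-in grayscale image
residual = encode_image(image, shape=(32, FRAME))     # computed offline

host_a = rng.standard_normal(32 * FRAME)              # two unrelated hosts
host_b = rng.standard_normal(32 * FRAME)
container_a = hide(host_a, residual)
container_b = hide(host_b, residual)                  # same residual reused
```

Because the transform is linear and orthonormal, `stdct(container_a) - stdct(host_a)` returns the residual up to floating-point error. This subtraction is used here only to check the arithmetic; a practical receiver would have to reveal the image from the container alone, which this sketch does not attempt.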