We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous diffusion models. The main idea behind our approach is to first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers, which we call analog bits. To generate samples, the model first generates the analog bits, which are then thresholded to obtain the bits that represent the discrete variables. We further propose two simple techniques, namely Self-Conditioning and Asymmetric Time Intervals, which lead to a significant improvement in sample quality. Despite its simplicity, the proposed approach achieves strong performance in both discrete image generation and image captioning tasks. For discrete image generation, we significantly improve on the previous state of the art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best autoregressive model in both sample quality (measured by FID) and efficiency. For image captioning on the MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
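The encode-and-threshold round trip described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `int_to_analog_bits` and `analog_bits_to_int` are hypothetical, the mapping of bits to the values -1 and +1 and the threshold at 0 are assumptions made for illustration, and bounded random noise stands in for the output of a trained diffusion model.

```python
import numpy as np


def int_to_analog_bits(x, num_bits=8):
    """Encode non-negative integers as analog bits in {-1.0, +1.0}.

    Assumption: bits are shifted and scaled from {0, 1} to {-1, +1};
    the abstract only says bits are modeled as real numbers.
    """
    x = np.asarray(x, dtype=np.int64)
    shifts = np.arange(num_bits - 1, -1, -1)  # most significant bit first
    bits = (x[..., None] >> shifts) & 1       # shape (..., num_bits), values 0/1
    return bits.astype(np.float32) * 2.0 - 1.0


def analog_bits_to_int(analog_bits):
    """Threshold real-valued analog bits at 0 and decode to integers."""
    bits = (np.asarray(analog_bits) > 0).astype(np.int64)
    num_bits = bits.shape[-1]
    weights = 1 << np.arange(num_bits - 1, -1, -1)  # 128, 64, ..., 1 for 8 bits
    return (bits * weights).sum(axis=-1)


# Four 8-bit "pixel" values, each treated as one discrete token.
pixels = np.array([0, 5, 128, 255])
analog = int_to_analog_bits(pixels)

# Stand-in for a model's real-valued output: the clean analog bits plus
# bounded noise (magnitude < 1, so no value crosses the threshold).
rng = np.random.default_rng(0)
noisy = analog + rng.uniform(-0.4, 0.4, size=analog.shape)

recovered = analog_bits_to_int(noisy)
print(recovered)
```

Because the decoding step only looks at the sign of each value, a continuous model does not need to produce exact bit values; it only needs to land on the correct side of the threshold.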