We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous-state and continuous-time diffusion models. The main idea behind our approach is to first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers, which we call analog bits. To generate samples, the model first generates the analog bits, which are then thresholded to obtain the bits that represent the discrete variables. We further propose two simple techniques, namely Self-Conditioning and Asymmetric Time Intervals, which lead to a significant improvement in sample quality. Despite its simplicity, the proposed approach achieves strong performance in both discrete image generation and image captioning tasks. For discrete image generation, we significantly improve on the previous state of the art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best autoregressive model in both sample quality (measured by FID) and efficiency. For image captioning on the MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
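The bit encoding and thresholding steps described above can be illustrated with a minimal NumPy sketch. It covers only the data representation, not the diffusion model itself. The function names, the most-significant-bit-first ordering, the mapping of bits {0, 1} to real values {-1, +1}, and the threshold at 0 are assumptions made for illustration; the abstract does not specify them. A bounded random perturbation stands in for the imperfect real-valued output of a trained model.

```python
import numpy as np

def int_to_analog_bits(x, num_bits=8):
    """Encode non-negative integers as real-valued "analog bits".

    Assumption: bits are ordered most-significant first and mapped
    from {0, 1} to {-1.0, +1.0}.
    """
    x = np.asarray(x, dtype=np.int64)
    shifts = np.arange(num_bits - 1, -1, -1)
    bits = (x[..., None] >> shifts) & 1          # shape (..., num_bits)
    return bits.astype(np.float32) * 2.0 - 1.0

def analog_bits_to_int(analog):
    """Threshold analog bits at 0 (an assumed threshold) and decode to integers."""
    bits = (np.asarray(analog) > 0.0).astype(np.int64)
    num_bits = bits.shape[-1]
    weights = 1 << np.arange(num_bits - 1, -1, -1)
    return (bits * weights).sum(axis=-1)

# Four 8-bit tokens, e.g. pixel intensities.
pixels = np.array([0, 5, 128, 255])
analog = int_to_analog_bits(pixels)

# Stand-in for model output: perturb each analog bit by less than 1,
# so the sign (and therefore the thresholded bit) is preserved.
rng = np.random.default_rng(0)
noisy = analog + rng.uniform(-0.9, 0.9, size=analog.shape)

recovered = analog_bits_to_int(noisy)
```

Because thresholding keeps only the sign of each analog bit, any real-valued output on the correct side of the threshold decodes to the original token.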