Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

Paper Title

Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

Authors

Ting Chen, Ruixiang Zhang, Geoffrey Hinton

Abstract

We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous state and continuous time diffusion models. The main idea behind our approach is to first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers which we call analog bits. To generate samples, the model first generates the analog bits, which are then thresholded to obtain the bits that represent the discrete variables. We further propose two simple techniques, namely Self-Conditioning and Asymmetric Time Intervals, which lead to a significant improvement in sample quality. Despite its simplicity, the proposed approach can achieve strong performance in both discrete image generation and image captioning tasks. For discrete image generation, we significantly improve previous state-of-the-art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best autoregressive model in both sample quality (measured by FID) and efficiency. For image captioning on the MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
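The pipeline the abstract describes (encode discrete values as bits, treat those bits as real-valued analog bits for a continuous diffusion sampler, then threshold the result) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the {-1, +1} scaling in `to_analog`, the cosine schedule `gamma`, the DDIM-style update `ddim_step`, the look-ahead offset `td` used to model asymmetric time intervals, and the `denoiser` callable are all hypothetical choices made for this example. Self-conditioning is modeled by feeding the previous estimate of the clean bits back into the denoiser.

```python
import math
import random
from typing import Callable, List

# Hypothetical denoiser signature: (noisy analog bits, previous estimate, time) -> clean-bit estimate.
Denoiser = Callable[[List[float], List[float], float], List[float]]


def int2bit(values: List[int], n_bits: int = 8) -> List[List[int]]:
    """Encode each integer (e.g. an 8-bit pixel value) as n_bits binary digits, MSB first."""
    return [[(v >> (n_bits - 1 - i)) & 1 for i in range(n_bits)] for v in values]


def bit2int(bits: List[List[int]]) -> List[int]:
    """Decode rows of binary digits (MSB first) back to integers."""
    out = []
    for row in bits:
        v = 0
        for b in row:
            v = (v << 1) | b
        out.append(v)
    return out


def to_analog(bits: List[List[int]], scale: float = 1.0) -> List[List[float]]:
    """Map {0, 1} bits to real-valued analog bits in {-scale, +scale} (assumed scaling)."""
    return [[(2.0 * b - 1.0) * scale for b in row] for row in bits]


def threshold(analog: List[List[float]]) -> List[List[int]]:
    """Quantize analog bits back to {0, 1} by thresholding at zero."""
    return [[1 if x > 0 else 0 for x in row] for row in analog]


def gamma(t: float) -> float:
    """Hypothetical cosine schedule: signal level 1 at t = 0 (clean), about 0 at t = 1 (noise)."""
    return math.cos(0.5 * math.pi * t) ** 2


def ddim_step(x_t: List[float], x0_pred: List[float], t_now: float, t_next: float) -> List[float]:
    """Deterministic DDIM-style move from t_now to t_next given an estimate of the clean bits."""
    g_now, g_next = gamma(t_now), gamma(t_next)
    eps = [(x - math.sqrt(g_now) * x0) / math.sqrt(1.0 - g_now) for x, x0 in zip(x_t, x0_pred)]
    return [math.sqrt(g_next) * x0 + math.sqrt(1.0 - g_next) * e for x0, e in zip(x0_pred, eps)]


def sample(denoiser: Denoiser, n: int, steps: int = 50, td: float = 0.0,
           self_cond: bool = True, seed: int = 0) -> List[int]:
    """Generate n bits: start from Gaussian noise, denoise step by step, threshold the estimate."""
    rng = random.Random(seed)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_pred = [0.0] * n  # self-conditioning input starts as all zeros
    for step in range(steps):
        t_now = 1.0 - step / steps  # stays > 0, so 1 - gamma(t_now) is nonzero
        # Asymmetric time intervals (assumed form): target a time td steps further ahead.
        t_next = max(1.0 - (step + 1 + td) / steps, 0.0)
        if not self_cond:
            x_pred = [0.0] * n  # without self-conditioning, drop the previous estimate
        x_pred = denoiser(x_t, x_pred, t_now)
        x_t = ddim_step(x_t, x_pred, t_now, t_next)
    return [1 if v > 0 else 0 for v in x_pred]


values = [0, 37, 200, 255]
target = [x for row in to_analog(int2bit(values)) for x in row]


def oracle(x_t: List[float], x_sc: List[float], t: float) -> List[float]:
    """Hypothetical perfect denoiser that always returns the target analog bits."""
    return target


flat = sample(oracle, len(target), steps=20, td=1.0)
print(bit2int([flat[i:i + 8] for i in range(0, len(flat), 8)]))  # [0, 37, 200, 255]
```

With the hypothetical oracle the sampler trivially recovers the original 8-bit values, which checks the encode, sample, and threshold plumbing end to end. A trained model would instead learn to predict the clean analog bits from the noisy ones, and with `self_cond=True` it also sees its own previous estimate at each step.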