Diffusion models have demonstrated remarkable performance in generating high-dimensional samples across domains such as vision, language, and the sciences. Although continuous-state diffusion models have been extensively studied both empirically and theoretically, discrete-state diffusion models, essential for applications involving text, sequences, and combinatorial structures, they remain significantly less understood from a theoretical standpoint. In particular, all existing analyses of discrete-state models assume access to an empirical risk minimizer. In this work, we present a principled theoretical framework analyzing diffusion models, providing a state-of-the-art sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-4})$. Our structured decomposition of the score estimation error into statistical and optimization components offers critical insights into how diffusion models can be trained efficiently. This analysis addresses a fundamental gap in the literature and establishes the theoretical tractability and practical relevance of diffusion models.
翻译:暂无翻译