We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that ...
Abstract: Transformer, an attention-based encoder–decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering ...