A latent text-to-image diffusion model that generates photo-realistic images from any text input.