About Image GPT

Image GPT (iGPT) is an OpenAI project that explores whether large transformer models can generate coherent image completions and samples. Its premise is that a large transformer trained on pixel sequences can generate coherent images, just as a transformer trained on language can generate coherent text. By establishing a correlation between sample quality and image classification accuracy, the project also shows that the features the model learns are useful for classification.


  1. Generative Capabilities: Image GPT is trained on pixel sequences, enabling it to generate coherent image completions and samples. Training is unsupervised: the model learns from the pixels alone, without human-labeled data.
  2. Competitive Features: Image GPT's best generative model learns features that are competitive with top convolutional nets in the unsupervised setting. Here "features" refers to the model's internal representations, judged by how well they support image classification, rather than to the visual quality of the generated images.
  3. Domain Agnostic: Transformer models such as the one behind Image GPT are domain agnostic, meaning they can be applied directly to 1-D sequences of any form. When trained on images unrolled into long pixel sequences, the model appears to capture 2-D image characteristics such as object appearance and category, even without human-provided labels.
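
To make the idea of "pixel sequences" concrete, here is a minimal sketch of how a 2-D image could be unrolled into a 1-D sequence and turned into next-pixel prediction examples. It is an illustration only, not OpenAI's code: the raster (row-by-row) ordering, the tiny 2x3 grayscale image, and the helper names `unroll`, `next_pixel_pairs`, and `reroll` are all assumptions made for this example.

```python
def unroll(image):
    """Flatten a 2-D grid (list of rows) into a 1-D sequence in raster order."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(sequence):
    """Pair each prefix with the pixel that follows it.

    This mirrors how a language model is trained to predict the next token
    from the tokens before it (an assumed, simplified training setup).
    """
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


def reroll(sequence, width):
    """Rebuild the 2-D grid from a 1-D sequence, given the row width."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


# Hypothetical 2x3 grayscale image with intensities in 0..255.
image = [
    [0, 128, 255],
    [64, 192, 32],
]

sequence = unroll(image)            # 1-D view the transformer would see
pairs = next_pixel_pairs(sequence)  # (context, target) training examples

print(sequence)
print(pairs[0])
print(reroll(sequence, 3) == image)
```

Once the image is a flat sequence, nothing in the model needs to know it came from a 2-D grid, which is the sense in which the approach is domain agnostic. Any 2-D structure has to be inferred from the statistics of the sequence itself.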