About nanoGPT

nanoGPT is a project hosted on GitHub, described as “The simplest, fastest repository for training/finetuning medium-sized GPTs.” It is a rewrite of minGPT that prioritizes efficiency over education. Because the codebase is small and plain, it is easy to hack to specific needs. The two primary files are a training loop (train.py) and a GPT model definition (model.py), each roughly 300 lines. The repository lets users train new models from scratch or fine-tune pre-trained checkpoints, such as the GPT-2 1.5B model from OpenAI.
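
As a rough sketch of how those pieces fit together, the snippet below loads the OpenAI GPT-2 (124M) weights through the GPT class defined in model.py and samples a short continuation. It assumes model.py's from_pretrained and generate helpers and the tiktoken GPT-2 encoding; treat it as an illustration rather than a verbatim excerpt from the repository.

```python
import torch
import tiktoken
from model import GPT  # nanoGPT's ~300-line model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the pretrained GPT-2 (124M) weights released by OpenAI.
model = GPT.from_pretrained('gpt2')
model.eval()
model.to(device)

# Encode a prompt with OpenAI's GPT-2 BPE tokenizer.
enc = tiktoken.get_encoding('gpt2')
prompt = torch.tensor(enc.encode("What is the answer to life?"),
                      dtype=torch.long, device=device)[None, ...]

# Autoregressively sample a short continuation.
out = model.generate(prompt, max_new_tokens=50, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```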

Features of nanoGPT

  1. Simplicity and Speed: nanoGPT is designed to be the simplest and fastest repository for training and fine-tuning medium-sized GPT models.
  2. Readable Code: The main code files, train.py and model.py, are both around 300 lines, making them easy to understand and modify.
  3. Training Flexibility: Users can train new models from scratch or fine-tune pre-existing checkpoints.
  4. Compatibility with GPT-2: The repository can optionally load GPT-2 weights from OpenAI.
  5. Dependencies: The project requires PyTorch, numpy, transformers (to load the GPT-2 checkpoints), datasets (to download and preprocess OpenWebText), tiktoken (OpenAI's fast BPE tokenizer), wandb (optional logging), and tqdm (progress bars).
  6. Quick Start Guide: For beginners, the repository offers a quick start guide to train a character-level GPT on Shakespeare’s works.
  7. Reproducing GPT-2: Advanced users can reproduce the GPT-2 (124M) run on OpenWebText by following the provided multi-GPU training instructions.
  8. Fine-tuning: The repository provides guidance on how to fine-tune a GPT model on new text datasets.
  9. Sampling/Inference: The sample.py script lets users sample either from OpenAI's released GPT-2 models or from models they trained themselves.
  10. Efficiency Notes: The code uses PyTorch 2.0 by default, whose torch.compile noticeably speeds up training; a condensed sketch of the training step follows this list.
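
To convey why train.py stays readable at roughly 300 lines, here is a heavily condensed sketch of its core loop: batches of token IDs are sliced out of a memory-mapped binary file, the model returns the cross-entropy loss alongside the logits when targets are passed in, and torch.compile (PyTorch 2.0) is applied once so every subsequent step runs faster. The file path and hyperparameters are placeholders in the spirit of the Shakespeare character-level quick start; the real script adds gradient accumulation, mixed precision, learning-rate decay, evaluation, checkpointing, and DDP.

```python
import numpy as np
import torch
from model import GPT, GPTConfig  # nanoGPT's model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'
block_size, batch_size = 256, 64

# Training data is stored as a flat array of uint16 token IDs by the prepare.py scripts.
data = np.memmap('data/shakespeare_char/train.bin', dtype=np.uint16, mode='r')

def get_batch():
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x.to(device), y.to(device)

config = GPTConfig(block_size=block_size, vocab_size=65, n_layer=6, n_head=6, n_embd=384)
model = GPT(config).to(device)
model = torch.compile(model)  # PyTorch 2.0: one line, noticeably faster iterations
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(5000):
    X, Y = get_batch()
    logits, loss = model(X, Y)  # the model computes the loss when targets are given
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```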

Additional Features

  1. Baselines: The repository establishes baselines by evaluating the OpenAI GPT-2 checkpoints on OpenWebText.
  2. Efficiency Notes: There are notes on benchmarking and profiling the model, highlighting the speedups from PyTorch 2.0's torch.compile; a timing sketch follows this list.
  3. To-Do List: The repository lists tasks that are yet to be implemented, such as adding FSDP instead of DDP, evaluating zero-shot perplexities on standard benchmarks, and incorporating other positional embeddings (e.g. rotary, ALiBi).
  4. Troubleshooting: Guidance is provided for users who encounter issues related to PyTorch 2.0 or other aspects of the repository.
  5. Acknowledgements: The repository acknowledges Lambda Labs, whose GPUs powered the nanoGPT experiments.
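
As a rough illustration of the kind of measurement those efficiency notes discuss (the repository's own bench.py is the authoritative reference), the snippet below times forward-backward passes with and without torch.compile. The model size, batch shape, and iteration counts here are arbitrary choices for the example.

```python
import time
import torch
from model import GPT, GPTConfig  # nanoGPT's model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Roughly GPT-2 (124M)-sized configuration.
config = GPTConfig(block_size=1024, vocab_size=50304, n_layer=12, n_head=12, n_embd=768)

def ms_per_iter(model, iters=20):
    x = torch.randint(0, config.vocab_size, (8, config.block_size), device=device)
    y = torch.randint(0, config.vocab_size, (8, config.block_size), device=device)
    for _ in range(3):  # warmup; also triggers compilation for compiled models
        _, loss = model(x, y)
        model.zero_grad(set_to_none=True)
        loss.backward()
    if device == 'cuda':
        torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        _, loss = model(x, y)
        model.zero_grad(set_to_none=True)
        loss.backward()
    if device == 'cuda':
        torch.cuda.synchronize()
    return (time.time() - t0) / iters * 1000

eager = GPT(config).to(device)
print(f"eager:    {ms_per_iter(eager):.1f} ms/iter")
compiled = torch.compile(GPT(config).to(device))  # PyTorch 2.0
print(f"compiled: {ms_per_iter(compiled):.1f} ms/iter")
```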