nanoGPT is a project hosted on GitHub, described as “The simplest, fastest repository for training/finetuning medium-sized GPTs.” It is a rewrite of minGPT that prioritizes efficiency over education. The codebase is deliberately small and plain, making it easy to modify for specific needs. The primary files are a training loop (train.py) and a GPT model definition (model.py), each roughly 300 lines. The repository lets users train new models from scratch or fine-tune pre-trained checkpoints, such as the GPT-2 1.3B model from OpenAI.
Features of nanoGPT
- Simplicity and Speed: nanoGPT is designed to be the simplest and fastest repository for training and fine-tuning medium-sized GPT models.
- Readable Code: The main code files, train.py and model.py, are each around 300 lines, making them easy to understand and modify.
- Training Flexibility: Users can train new models from scratch or fine-tune pre-existing checkpoints.
- Compatibility with GPT-2: The repository can optionally load GPT-2 weights from OpenAI.
- Dependencies: The project requires several dependencies, including PyTorch, numpy, transformers, datasets, tiktoken, wandb, and tqdm.
- Quick Start Guide: For beginners, the repository offers a quick start guide to train a character-level GPT on Shakespeare’s works.
- Reproducing GPT-2: Advanced users can reproduce GPT-2 results using the provided instructions.
- Fine-tuning: The repository provides guidance on how to fine-tune a GPT model on new text datasets.
- Sampling/Inference: Users can sample from pre-trained GPT-2 models, or from models they trained themselves, using the sample.py script.
- Baselines: The repository provides baselines for OpenWebText using the OpenAI GPT-2 checkpoints.
- Efficiency Notes: The code uses PyTorch 2.0 by default, and the repository includes notes on benchmarking and profiling the model, highlighting the performance improvements that PyTorch 2.0 brings.
- To-Do List: The repository lists several tasks and improvements that are yet to be implemented, such as adding FSDP instead of DDP, evaluating zero-shot perplexities on standard evaluations, and incorporating other position embeddings (e.g. rotary, ALiBi).
- Troubleshooting: Guidance is provided for users who encounter issues related to PyTorch 2.0 or other aspects of the repository.
- Acknowledgements: The repository acknowledges the support of Lambda Labs for powering the nanoGPT experiments.
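The dependency list above can be sanity-checked before training. A minimal stdlib-only sketch (the package names are taken from the list above; nanoGPT itself does no such check):

```python
import importlib.util

# Dependencies listed for nanoGPT
DEPS = ["torch", "numpy", "transformers", "datasets", "tiktoken", "wandb", "tqdm"]

def missing_deps(deps=DEPS):
    """Return the subset of packages that cannot be imported."""
    return [d for d in deps if importlib.util.find_spec(d) is None]

print("missing:", missing_deps())
```

Running this before `train.py` surfaces missing packages as one actionable list rather than a sequence of ImportErrors.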
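The character-level quick start mentioned above rests on a very simple idea: build a vocabulary of the distinct characters in the Shakespeare text and map each to an integer id. A stdlib-only sketch of that encoding step (nanoGPT's actual `prepare.py` additionally writes the ids to disk as numpy arrays):

```python
def build_char_codec(text):
    """Map each distinct character to an integer id, as in a char-level GPT."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # string -> int
    itos = {i: ch for ch, i in stoi.items()}      # int -> string
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)
    return encode, decode, len(chars)

text = "First Citizen:\nBefore we proceed any further, hear me speak."
encode, decode, vocab_size = build_char_codec(text)

# Encoded ids round-trip back to the original string
ids = encode("hear me")
assert decode(ids) == "hear me"

# A 90/10 train/val split over the encoded ids, mirroring the quick start
data = encode(text)
n = int(0.9 * len(data))
train_ids, val_ids = data[:n], data[n:]
```

Because the vocabulary is just the character set of the corpus, `vocab_size` stays tiny compared with a BPE vocabulary, which is what makes the quick-start model cheap to train.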
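The GPT-2 (124M) reproduction target above can be sanity-checked by arithmetic. The sketch below uses the standard GPT-2 small configuration (12 layers, 12 heads, 768-dim embeddings, 50257-token vocabulary, 1024-token context) and the usual approximation of 12·n_embd² weights per transformer block (4·d² for attention, 8·d² for the MLP), ignoring biases and LayerNorm parameters:

```python
def approx_gpt_params(n_layer, n_embd, vocab_size, block_size):
    """Approximate parameter count of a GPT-2 style model, assuming the token
    embedding is tied to the output head (so it is counted once)."""
    per_block = 12 * n_embd * n_embd              # attention (4d^2) + MLP (8d^2)
    embeddings = vocab_size * n_embd + block_size * n_embd
    return n_layer * per_block + embeddings

p = approx_gpt_params(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)
print(f"{p/1e6:.1f}M parameters")  # lands close to the quoted 124M
```

The same function applied to the larger GPT-2 configurations (1024/1280/1600-dim) recovers their approximate 350M/774M/1.5B sizes.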
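Training and fine-tuning runs in nanoGPT are steered by small config files plus command-line overrides of the form `--key=value`. A simplified stdlib sketch of that override mechanism (the real logic lives in nanoGPT's `configurator.py` and differs in detail; the config keys below are illustrative):

```python
import ast

def apply_overrides(config, argv):
    """Apply --key=value overrides to a config dict, preserving value types."""
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue
        key, raw = arg[2:].split("=", 1)
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, quoted strings
        except (ValueError, SyntaxError):
            value = raw                    # fall back to a plain string
        config[key] = value
    return config

config = {"init_from": "gpt2", "learning_rate": 6e-4, "always_save_checkpoint": False}
apply_overrides(config, ["--init_from=gpt2-xl", "--learning_rate=3e-5"])
```

Parsing values with `ast.literal_eval` means `--learning_rate=3e-5` arrives as a float while `--init_from=gpt2-xl` stays a string, without any per-key type declarations.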
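Sampling, as described above, draws each next token from the softmax of the model's logits, optionally sharpened by a temperature and restricted to the top-k most likely tokens. A stdlib-only sketch of that decoding step (nanoGPT's `sample.py` performs the equivalent operations on PyTorch tensors):

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from logits with temperature and top-k filtering."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Mask everything below the k-th largest logit
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
token = sample_next([2.0, 1.0, 0.1, -1.0], temperature=0.8, top_k=2, rng=rng)
```

Lower temperatures concentrate probability mass on the highest logits, and `top_k=1` degenerates to greedy decoding, always returning the argmax.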