About nanoGPT
nanoGPT is a project hosted on GitHub, described as “The simplest, fastest repository for training/finetuning medium-sized GPTs.” It is a rewrite of minGPT that emphasizes efficiency over education. Because the codebase is deliberately plain, it is easy to modify for specific needs. The primary files are a training loop (train.py) and a GPT model definition (model.py), each approximately 300 lines long. The repository lets users train new models from scratch or fine-tune pre-trained checkpoints, such as the GPT-2 1.3B model from OpenAI.
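For orientation, the commands below sketch how the repository is typically obtained and how train.py is driven. This is a hedged illustration based on the public karpathy/nanoGPT repository: train.py reads its defaults, then an optional config file, then command-line overrides, so exact file names and flags may differ between versions.

```bash
# Clone the repository (hosted at github.com/karpathy/nanoGPT)
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT

# train.py takes an optional config file plus command-line overrides,
# so runs can be customized without editing the ~300-line source files, e.g.:
python train.py config/train_gpt2.py --batch_size=32 --compile=False
```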
Features of nanoGPT
- Simplicity and Speed: nanoGPT is designed to be the simplest and fastest repository for training and fine-tuning medium-sized GPT models.
- Readable Code: The main code files, train.py and model.py, are both around 300 lines, making them easy to understand and modify.
- Training Flexibility: Users can train new models from scratch or fine-tune pre-existing checkpoints.
- Compatibility with GPT-2: The repository can optionally load GPT-2 weights from OpenAI.
- Dependencies: The project requires several dependencies including PyTorch, numpy, transformers, datasets, tiktoken, wandb, and tqdm.
- Quick Start Guide: For beginners, the repository offers a quick start guide to train a character-level GPT on Shakespeare’s works (see the example commands after this list).
- Reproducing GPT-2: Advanced users can reproduce GPT-2 results using the provided instructions.
- Fine-tuning: The repository provides guidance on how to fine-tune a GPT model on new text datasets.
- Sampling/Inference: Users can sample from pre-trained GPT-2 models or from models they trained themselves using the sample.py script (an example invocation appears after this list).
- PyTorch 2.0: The code uses PyTorch 2.0 (torch.compile) by default, which offers noticeable performance improvements.
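To make the Dependencies, Quick Start, Reproducing GPT-2, Fine-tuning, and Sampling items above concrete, the following sketch lists the typical commands. Script names, config files, and flags follow the public nanoGPT repository and may change between versions, so treat them as illustrative rather than definitive.

```bash
# Dependencies (as listed by the repository)
pip install torch numpy transformers datasets tiktoken wandb tqdm

# Quick start: train a character-level GPT on Shakespeare and sample from it
python data/shakespeare_char/prepare.py           # download and tokenize the text
python train.py config/train_shakespeare_char.py  # train a small model
python sample.py --out_dir=out-shakespeare-char   # generate text from the result

# Reproducing GPT-2 (124M) on OpenWebText with multi-GPU DDP
python data/openwebtext/prepare.py
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py

# Fine-tuning a pre-trained GPT-2 checkpoint on a new text dataset
python train.py config/finetune_shakespeare.py

# Sampling from an OpenAI GPT-2 checkpoint instead of a locally trained model
python sample.py --init_from=gpt2-xl --start="What is the meaning of life?" \
    --num_samples=5 --max_new_tokens=100
```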
Additional Features
- Baselines: The repository provides baselines for OpenWebText using OpenAI GPT-2 checkpoints (example evaluation commands appear after this list).
- Efficiency Notes: There are notes on benchmarking and profiling the model for efficiency, highlighting the benefits of using PyTorch 2.0.
- To-Do List: The repository lists several tasks and improvements that are yet to be implemented, such as adding FSDP support instead of DDP, evaluating zero-shot perplexities on standard benchmarks, and incorporating other embeddings (such as rotary or ALiBi).
- Troubleshooting: Guidance is provided for users who encounter issues related to PyTorch 2.0 or other aspects of the repository.
- Acknowledgements: The repository acknowledges Lambda Labs for powering the nanoGPT experiments.
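As a rough illustration of the Baselines and Troubleshooting items above, the commands below follow the public repository’s conventions; the eval config file names are assumptions based on that repository and may differ between versions.

```bash
# Baselines: evaluate OpenAI GPT-2 checkpoints on OpenWebText
python train.py config/eval_gpt2.py         # 124M baseline
python train.py config/eval_gpt2_medium.py  # 350M baseline

# Troubleshooting: if PyTorch 2.0 compilation causes issues, disable it
# for any run by appending the compile override
python train.py config/train_shakespeare_char.py --compile=False
```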