Large language models, particularly those with over 100 billion parameters, have revolutionized natural language processing (NLP) in recent years. These models can generate creative text, solve math problems, answer comprehension questions, and more. However, access to them has been limited mainly to well-funded labs. To democratize access and promote open science, Meta AI has introduced the Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters. The release is notable in that it provides not only the pretrained models but also the code needed to train and use them. The model is released under a noncommercial license, emphasizing research use cases.
Features of OPT-175B
- Large-Scale Model: OPT-175B is a massive language model with 175 billion parameters, trained on publicly available datasets.
- Open Access: In a move towards open science, the pretrained models and the code required to train and utilize them are shared with the community.
- Responsible Publication: Adhering to guidelines from the Partnership on AI and NIST, the development process, including the day-to-day training process, is documented and shared. This transparency allows other researchers to build upon the work more easily.
- Efficient Training: The released codebase can train and deploy the model using only 16 NVIDIA V100 GPUs, making it far more accessible for research purposes. Additionally, smaller-scale baseline models have been released to study the effects of scale.
- Energy Efficiency: OPT-175B was developed with energy efficiency in mind; it was trained with a carbon footprint roughly one-seventh that of GPT-3, thanks to Meta's open-source Fully Sharded Data Parallel (FSDP) API and NVIDIA's tensor-parallel abstraction within Megatron-LM.
- Collaborative Research: Meta AI emphasizes the importance of collaboration across research organizations for the responsible development of AI technologies. With the release of OPT-175B, the aim is to bring more voices to the forefront of large language model creation and ensure transparency in the process.
- Diverse Model Sizes: Apart from OPT-175B, the GitHub repository provides details on various other models ranging from OPT-125M to OPT-66B, allowing researchers to choose based on their requirements.
- Model & Data Cards: For transparency and accountability in model development, both model and data cards are provided. These cards offer insights into the model’s design, training data, potential biases, and more.
- Licensing: The use of OPT model weights is governed by a specific Model License, ensuring that the models are used responsibly and ethically.
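As a rough sanity check on the headline parameter counts above, the size of a GPT-style decoder-only model can be approximated with the common 12·L·d² rule of thumb (attention plus MLP weights per layer) plus the token-embedding matrix. The dimensions below (96 layers, hidden size 12288, ~50k vocabulary) are the GPT-3-like values commonly reported for the 175B configuration, assumed here for illustration rather than taken from this article:

```python
def approx_param_count(num_layers: int, d_model: int, vocab_size: int) -> int:
    """Back-of-envelope parameter count for a GPT-style decoder-only transformer.

    Per layer: ~4*d^2 for attention (Q, K, V, and output projections)
    plus ~8*d^2 for the MLP (two d x 4d matrices), i.e. ~12*d^2 total.
    The token-embedding matrix adds vocab_size * d_model on top.
    """
    per_layer = 12 * d_model * d_model      # attention + MLP weights per layer
    embeddings = vocab_size * d_model       # token embedding matrix
    return num_layers * per_layer + embeddings

# GPT-3-like dimensions often quoted for the 175B configuration (assumed):
total = approx_param_count(num_layers=96, d_model=12288, vocab_size=50272)
print(f"~{total / 1e9:.0f}B parameters")  # prints "~175B parameters"
```

The same formula applied to smaller configurations (fewer layers, smaller hidden size) recovers counts in the OPT-125M-to-OPT-66B range, which is why releasing the full family of sizes is useful for studying how capabilities change with scale.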