MusicGen is part of Audiocraft, a PyTorch library for deep learning research on audio generation. Developed by Facebook Research, it is a state-of-the-art controllable text-to-music model that needs no self-supervised semantic representation and generates all 4 codebooks in a single pass.
Here are four key features of MusicGen:
- Controllable Text-to-Music Model: MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. It generates music from a text prompt.
- Efficient Generation: By introducing a small delay between the codebooks, MusicGen can predict them in parallel, so it needs only 50 auto-regressive steps per second of audio.
- Large Training Dataset: MusicGen was trained on 20K hours of licensed music, comprising an internal dataset of 10K high-quality music tracks plus the ShutterStock and Pond5 music data.
- Integration with Transformers Library: From version 4.31.0 onwards, MusicGen is available in the Transformers library, so text-conditional audio samples can be generated with minimal dependencies and few additional packages.
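
The "small delay between the codebooks" can be illustrated with a short, dependency-free sketch. It is not MusicGen's actual implementation: the function names are hypothetical, and the one-step offset per codebook is an assumption (the text above says only that the delay is small). The flattened count, where every codebook token gets its own step, is included purely as a point of comparison.

```python
from typing import List, Optional

FRAME_RATE_HZ = 50  # EnCodec frames per second, as stated above
NUM_CODEBOOKS = 4   # codebooks per frame, as stated above


def delay_interleave(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps (assumed offset: one step per codebook).

    `codes[k][t]` is the token of codebook k at frame t. In the result,
    column s holds the tokens the model would predict together at step s;
    None marks padding positions with no real token.
    """
    num_codebooks = len(codes)
    delayed = []
    for k, row in enumerate(codes):
        left_pad = [None] * k
        right_pad = [None] * (num_codebooks - 1 - k)
        delayed.append(left_pad + list(row) + right_pad)
    return delayed


def steps_with_delay(seconds: float) -> int:
    """Auto-regressive steps when all codebooks are predicted at each step."""
    frames = round(seconds * FRAME_RATE_HZ)
    return frames + NUM_CODEBOOKS - 1  # short tail caused by the offsets


def steps_with_flattening(seconds: float) -> int:
    """Steps if every codebook token were predicted one at a time instead."""
    return round(seconds * FRAME_RATE_HZ) * NUM_CODEBOOKS


if __name__ == "__main__":
    # Toy input: 4 codebooks, 3 frames; token value = 10 * codebook + frame.
    toy = [[10 * k + t for t in range(3)] for k in range(NUM_CODEBOOKS)]
    for row in delay_interleave(toy):
        print(row)
    print(steps_with_delay(1.0), steps_with_flattening(1.0))
```

Under these assumptions, one second of audio takes 53 steps (50 frames plus a 3-step tail) instead of 200, and the tail stays constant as clips get longer, which is where the figure of roughly 50 steps per second comes from.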