BLOOM, short for BigScience Large Open-science Open-access Multilingual Language Model, is a 176-billion-parameter language model that the BigScience project is currently training. The training, which started on March 11, 2022, is expected to last 3-4 months and is being conducted on the Jean Zay public supercomputer, where 416 A100 GPUs are allocated to the project.

Here are four key features of the BLOOM project:

  1. Massive Model Architecture: The model uses a decoder-only architecture similar to GPT, with 176 billion parameters, 70 layers, 112 attention heads per layer, a hidden dimensionality of 14336, and a sequence length of 2048 tokens. It uses ALiBi positional encodings and the GeLU activation function.
  2. Multilingual Dataset: The model is trained on a multilingual dataset encompassing 46 languages and 341.6 billion tokens, equivalent to 1.5 TB of text data. The tokenizer vocabulary consists of 250,680 tokens.
  3. Advanced Engineering: The training uses 384 of the 416 allocated A100 GPUs, each with 80 GB of memory, and the remaining GPUs are held in reserve. One copy of the model takes 48 GPUs, using 60 GB of memory on each GPU. The checkpoint size is 329 GB for bf16 weights, and the full checkpoint with optimizer states is 2.3 TB.
  4. Environmental Considerations: The model is being trained on the Jean Zay supercomputer, which is primarily powered by low-carbon nuclear energy. The heat generated by the hardware is even used for heating buildings on campus, reflecting a commitment to efficiency and environmental sustainability.
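
The architecture and engineering figures in items 1 and 3 can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is only an approximation under stated assumptions, not BigScience's own accounting: it assumes the common estimate of 12 x hidden² weights per transformer block (attention plus an MLP with 4x expansion), ignores biases and layer norms, and assumes 14 bytes per parameter for a full checkpoint (bf16 weights, fp32 master weights, and two fp32 Adam moments). The function and variable names are illustrative.

```python
# Rough sanity checks on the BLOOM figures quoted in the list above.
# The formulas are standard approximations, not BigScience's exact accounting.

LAYERS = 70
HIDDEN = 14336
HEADS = 112
VOCAB = 250_680
GPUS_IN_USE = 384
GPUS_PER_REPLICA = 48


def approx_params(layers: int, hidden: int, vocab: int) -> int:
    """Approximate parameter count of a decoder-only transformer.

    Assumes ~4*h^2 attention weights and ~8*h^2 MLP weights per block
    (4x MLP expansion); biases and layer norms are ignored.
    """
    per_block = 12 * hidden * hidden
    embeddings = vocab * hidden
    return layers * per_block + embeddings


params = approx_params(LAYERS, HIDDEN, VOCAB)
head_dim = HIDDEN // HEADS                  # width of each attention head
replicas = GPUS_IN_USE // GPUS_PER_REPLICA  # full model copies trained in parallel

GIB = 2**30
TIB = 2**40

# bf16 stores each weight in 2 bytes.
bf16_weights_gib = params * 2 / GIB

# Assumed layout: bf16 weights (2) + fp32 master weights (4)
# + Adam first and second moments in fp32 (4 + 4) = 14 bytes per parameter.
full_checkpoint_tib = params * 14 / TIB

print(f"parameters      ~ {params / 1e9:.1f} B")
print(f"head dimension  = {head_dim}")
print(f"model replicas  = {replicas}")
print(f"bf16 weights    ~ {bf16_weights_gib:.0f} GiB")
print(f"full checkpoint ~ {full_checkpoint_tib:.2f} TiB")
```

Under these assumptions the estimate comes to roughly 176.2 billion parameters, 128 dimensions per attention head, and 8 model replicas across the 384 GPUs. The size estimates, about 328 GiB for bf16 weights and about 2.24 TiB for a full checkpoint, are close to the quoted 329 GB and 2.3 TB if those figures are read as binary units.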