About Constitutional AI

Constitutional AI is a novel approach to training large language models (LLMs) that governs their behavior through a predefined set of principles, often called a constitution. This method, as discussed in the article on Medium, is used to train Claude, a model developed by Anthropic. It is designed to address some of the hardest problems in natural language processing, such as generating accurate information, avoiding biased or harmful content, and providing citations for generated information.

Here are four key features of Constitutional AI:

  1. Explanation of Refusals: Constitutional AI allows a model to explain why it declines to answer a request. This offers insight into the model’s reasoning and helps users understand the refusal.
  2. Reinforcement Learning from AI Feedback (RLAIF): To train Claude, the Anthropic team used preference labels generated by an AI model instead of relying only on human labelers, which reduced the amount of human effort required.
  3. Self-Critique and Revision: The model critiques its own output against a set of provided principles, then uses that critique to revise its response so that it aligns with those principles.
  4. Scaling Supervision: Constitutional AI makes it possible for one AI system to supervise an LLM on every response it generates, a volume of review that would be impractical for human reviewers alone. This is referred to as scaling supervision.
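The self-critique loop and the AI-generated preference labels described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `Model`, `Principle`, `critique_and_revise`, `ai_preference`, `PRINCIPLES`, and `toy_model` are all hypothetical names, the prompt formats are invented, and `toy_model` is a deterministic fake so the sketch runs without calling any real LLM.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for any LLM: takes a prompt string, returns a completion string.
Model = Callable[[str], str]


@dataclass
class Principle:
    """One principle, phrased as a critique request plus a revision request (hypothetical format)."""
    critique_request: str
    revision_request: str


def critique_and_revise(model: Model, prompt: str, principles: List[Principle]) -> str:
    """Self-critique and revision: draft a response, then critique and rewrite it once per principle."""
    response = model(prompt)
    for p in principles:
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique request: {p.critique_request}"
        )
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            f"Revision request: {p.revision_request}"
        )
    return response


def ai_preference(model: Model, prompt: str, a: str, b: str, principle: str) -> int:
    """RLAIF-style labeling: a feedback model picks the response that better follows a principle.

    Returns 0 if the model prefers `a`, 1 if it prefers `b`.
    """
    verdict = model(f"Prompt: {prompt}\n{principle}\n(A) {a}\n(B) {b}\nAnswer A or B:")
    return 0 if verdict.strip().upper().startswith("A") else 1


# Hypothetical example principle; not quoted from any real constitution.
PRINCIPLES = [
    Principle(
        critique_request="Identify ways the response is harmful or unethical.",
        revision_request="Rewrite the response to remove harmful content and explain any refusal.",
    )
]


def toy_model(prompt: str) -> str:
    """Deterministic fake model so the sketch runs without any real LLM."""
    if "Revision request" in prompt:
        return "I can't help with that, because it could cause harm."
    if "Critique request" in prompt:
        return "The response helps with a potentially harmful request."
    if "Answer A or B" in prompt:
        # Toy rule: prefer (B) only if the text after "(B)" mentions harm.
        return "B" if "harm" in prompt.split("(B)")[1] else "A"
    return "Sure, here is how to do it."


if __name__ == "__main__":
    question = "How do I pick a lock?"
    draft = toy_model(question)
    revised = critique_and_revise(toy_model, question, PRINCIPLES)
    print(revised)  # I can't help with that, because it could cause harm.
    print(ai_preference(toy_model, question, draft, revised, "Which response is less harmful?"))  # 1
```

Replacing `toy_model` with a real model call is left as an assumption. In this sketch, the labels returned by `ai_preference` play the role that human preference labels would otherwise play, which is the idea behind RLAIF, and the revised response ends with an explained refusal rather than a bare one.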