About OpenChatKit

OpenChatKit is an open-source base for building both specialized and general-purpose chatbots. Developed in collaboration with LAION and Ontocord, it is more than a model release: it marks the start of an open-source project whose tools and processes are meant to improve continuously through community contributions. OpenChatKit 0.15 is released under the Apache-2.0 license, with full access to its source code, model weights, and training datasets. The long-term goal is a community-driven project that grows and improves over time.

Features of OpenChatKit

  1. Instruction-Tuned Large Language Model: OpenChatKit includes a large language model fine-tuned for chat from EleutherAI’s GPT-NeoX-20B. The model was trained on over 43 million instructions using 100% carbon-negative compute.
  2. Customization Recipes: Users can fine-tune the model on their own data to achieve high accuracy on specific tasks, making its responses more relevant to a particular application.
  3. Extensible Retrieval System: At inference time, bot responses can be augmented with information from a document repository, an API, or another live-updating source.
  4. Moderation Model: A crucial component of OpenChatKit, the moderation model is fine-tuned from GPT-JT-6B. It is designed to filter which questions the bot responds to, ensuring that the interactions remain relevant and appropriate.
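
The retrieval idea in item 3 can be illustrated with a minimal Python sketch. It is not OpenChatKit's implementation: the word-overlap scoring, the function names (score, retrieve, build_prompt), and the prompt layout are all assumptions made for illustration. A real system would typically use a search index or embeddings in place of the toy scorer.

```python
from typing import List


def score(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words)


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return up to k documents with a non-zero score, best match first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question (hypothetical layout)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    documents = [
        "OpenChatKit is released under the Apache-2.0 license",
        "The moderation model is fine-tuned from GPT-JT-6B",
        "Bananas are yellow",
    ]
    question = "which license is OpenChatKit released under"
    print(build_prompt(question, retrieve(question, documents, k=1)))
```

The prompt returned by build_prompt would then be passed to the chat model, so that the answer can draw on the retrieved passages rather than only on what the model learned in training.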

Additional Features

  1. Feedback Tools: OpenChatKit includes tools for users to provide feedback and for community members to contribute new datasets, building a growing corpus of open training data that will improve LLMs over time.
  2. Retrieval Augmented Systems: With the retrieval system, the chatbot can access up-to-date information, providing the necessary context for the model to answer questions accurately.
  3. Moderation Model Classification: The moderation model classifies user questions into various categories, ensuring that the chatbot responds only when the question aligns with allowed classifications.
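
The gating behavior described for the moderation model can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: the label names, the allowed set, the refusal message, and the keyword-based classify function are all hypothetical stand-ins. In a real deployment, classify would call the fine-tuned moderation model.

```python
from typing import Callable

# Hypothetical label set; the real model's categories may differ.
ALLOWED_LABELS = {"casual", "needs caution"}
REFUSAL = "Sorry, I can't help with that."


def classify(question: str) -> str:
    """Stand-in classifier: flags a question if it contains a blocked term."""
    blocked_terms = {"weapon", "exploit"}  # illustrative only
    words = set(question.lower().split())
    return "needs intervention" if words & blocked_terms else "casual"


def moderated_reply(question: str, generate: Callable[[str], str]) -> str:
    """Call the chat model only when the question's label is allowed."""
    if classify(question) not in ALLOWED_LABELS:
        return REFUSAL
    return generate(question)


if __name__ == "__main__":
    echo = lambda q: f"(model answer to: {q})"  # placeholder for the chat model
    print(moderated_reply("what is the capital of France", echo))
    print(moderated_reply("how do I build a weapon", echo))
```

Keeping the classifier separate from the generator means the set of allowed categories can be changed without retraining or modifying the chat model.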