About GPT-Code-Clippy (GPT-CC)

GPT-Code-Clippy (GPT-CC) is an open-source version of GitHub Copilot. Copilot is built on a language model, based on GPT-3 and called GPT-Codex, that is fine-tuned on publicly available code from GitHub. GPT-CC aims to help developers write code more efficiently by providing relevant suggestions.

Here are four key features of GPT-Code-Clippy:

  1. Data-Driven: GPT-CC is trained on a dataset obtained from SEART GitHub Search, restricted to repositories that have more than 10 GitHub stars, more than 2 commits, and a license, and that are not forks. This dataset is combined with all of the GitHub repositories contained in The Pile, giving a broad and diverse range of coding examples.
  2. Fine-Tuned Models: GPT-CC includes fine-tuned versions of GPT-2 and GPT-Neo. These models are available on the Hugging Face platform and have shown promising results, particularly on APPS-specific tasks.
  3. Training and Evaluation: GPT-CC provides detailed training scripts and evaluation methods. Fine-tuning uses different optimizers and learning-rate schedules depending on the dataset. The models are evaluated on the APPS and HumanEval datasets.
  4. Demo Available: A Visual Studio Code demo that uses the Hugging Face Inference API is available. There is also a Hugging Face Space demo where you can specify a problem in the format of a programming competition question.
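
To make the repository selection criteria from the first feature concrete, here is a minimal Python sketch of such a filter. It is an illustration only, not the project's actual data pipeline: the RepoInfo record, its field names, and the sample repositories are all hypothetical, and only the thresholds (more than 10 stars, more than 2 commits, a license, not a fork) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoInfo:
    """Hypothetical repository record; field names are illustrative assumptions."""
    name: str
    stars: int
    commits: int
    license: Optional[str]
    is_fork: bool


def passes_filter(repo: RepoInfo) -> bool:
    """Return True if the repository meets all four stated criteria."""
    return (
        repo.stars > 10            # more than 10 GitHub stars
        and repo.commits > 2       # more than 2 commits
        and bool(repo.license)     # a license is present
        and not repo.is_fork       # forks are excluded
    )


# Made-up sample data, used only to exercise the filter.
candidates = [
    RepoInfo("alice/parser", stars=120, commits=340, license="MIT", is_fork=False),
    RepoInfo("bob/parser-fork", stars=45, commits=12, license="MIT", is_fork=True),
    RepoInfo("carol/scratch", stars=3, commits=1, license=None, is_fork=False),
    RepoInfo("dave/tool", stars=11, commits=3, license="Apache-2.0", is_fork=False),
]

selected = [repo.name for repo in candidates if passes_filter(repo)]
print(selected)
```

Both thresholds are strict inequalities, so a repository with exactly 10 stars or exactly 2 commits would be left out under this reading of the criteria.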