GPT-4chan is a language model fine-tuned from GPT-J 6B on data from 4chan’s Politically Incorrect (/pol/) board. The training data covers three and a half years of /pol/ activity, more than 134.5 million posts, with the board’s thread structure retained in the training text. The result is a model capable of posting to /pol/ in a manner similar to a human user.
Here are four key features of GPT-4chan:
- Data Source: The model was trained on a dataset called “Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board.”
- Training Procedure: The model was trained for one epoch following GPT-J’s fine-tuning guide.
- Intended Use: GPT-4chan is designed to reproduce text according to the distribution of its input data, making it a useful tool for investigating discourse in anonymous online communities.
- Potential Applications: Beyond its primary function, GPT-4chan may also be useful for tasks such as toxicity detection: initial experiments show promising zero-shot results when comparing a string’s likelihood under GPT-4chan with its likelihood under GPT-J 6B.
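
The likelihood comparison mentioned in the last point can be illustrated with a minimal sketch. It assumes each model is available as a function that returns the total log-probability of a string. The names `make_unigram_scorer`, `log_likelihood_ratio`, and `looks_toxic`, the toy word probabilities, and the threshold are all hypothetical and chosen only for illustration. The unigram scorers are stand-ins for GPT-4chan and GPT-J 6B, not the real models.

```python
import math
from typing import Callable, Dict

# A scorer maps a string to its total log-probability under some model.
LogProbFn = Callable[[str], float]


def make_unigram_scorer(probs: Dict[str, float], unk: float = 1e-3) -> LogProbFn:
    """Hypothetical stand-in for a language model.

    Sums the log-probabilities of whitespace-separated tokens and
    gives unknown tokens a small fixed probability `unk`.
    """
    def score(text: str) -> float:
        return sum(math.log(probs.get(tok, unk)) for tok in text.lower().split())
    return score


def log_likelihood_ratio(text: str, tuned: LogProbFn, base: LogProbFn) -> float:
    """Return log p_tuned(text) - log p_base(text).

    A positive value means the fine-tuned model finds the text
    more likely than the base model does.
    """
    return tuned(text) - base(text)


def looks_toxic(text: str, tuned: LogProbFn, base: LogProbFn,
                threshold: float = 0.0) -> bool:
    """Flag text whose log-likelihood ratio exceeds an assumed threshold."""
    return log_likelihood_ratio(text, tuned, base) > threshold


# Toy probabilities (invented): the "tuned" model favors an insult,
# the "base" model favors a neutral word.
tuned_model = make_unigram_scorer({"you": 0.1, "are": 0.1, "idiot": 0.05, "kind": 0.01})
base_model = make_unigram_scorer({"you": 0.1, "are": 0.1, "idiot": 0.005, "kind": 0.05})

for sentence in ["you are an idiot", "you are kind"]:
    ratio = log_likelihood_ratio(sentence, tuned_model, base_model)
    print(f"{sentence!r}: ratio={ratio:.3f}, flagged={looks_toxic(sentence, tuned_model, base_model)}")
```

With real models, the two scorers would be replaced by functions that sum per-token log-probabilities from each network, and the threshold would need to be chosen on labeled data.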