About PLATO-XL by Baidu
Baidu has unveiled PLATO-XL, a dialogue generation model with 11 billion parameters. The model is designed to improve the quality of open-domain dialogue systems, making AI bots more coherent, informative, and engaging in conversation. With PLATO-XL, Baidu reports significant advances in both Chinese and English dialogue.
Features of PLATO-XL
- Unified Transformer Architecture: PLATO-XL employs a unified transformer that models dialogue understanding and response generation in a single network, making it more parameter-efficient than a separate encoder–decoder design. A flexible self-attention mask enables bidirectional encoding of the dialogue history and unidirectional (left-to-right) decoding of the response. This design also improves training efficiency, which matters given the highly variable lengths of conversation samples.
- Multi-party Aware Pre-training: To address inconsistency in multi-turn conversations, PLATO-XL introduces multi-party aware pre-training. Most of the pre-training data comes from social media, where many users exchange ideas in the same thread, so the model must learn to distinguish the contributions of different participants in order to keep its own responses consistent.
- Dual Dialogue Models: The 11-billion-parameter PLATO-XL comprises two dialogue models, one for Chinese and one for English. They are pre-trained on 100 billion tokens of data and implemented on Baidu's deep learning platform, PaddlePaddle.
- Advanced Training Techniques: To cope with the model's scale, PLATO-XL employs gradient checkpointing and sharded data parallelism via FleetX, PaddlePaddle's distributed training library, and is trained on a high-performance GPU cluster.
- Superior Performance: PLATO-XL has demonstrated superior performance in various conversational tasks compared to other open-source dialogue models. It excels in open-domain conversation, knowledge-grounded dialogue, and task-oriented conversation.
- Broad Range of Model Sizes: The PLATO series offers dialogue models ranging from 93M to 11B parameters, with dialogue quality improving consistently as model size grows.
- Logical and Informative Conversations: PLATO-XL can engage in logical, informative, and intriguing multi-turn conversations in both English and Chinese.
- Pushing Boundaries in Open-domain Conversations: Open-domain conversation is among the most challenging tasks in natural language processing; PLATO-XL sets new benchmarks in conversation consistency and factuality, moving closer to human-like learning and chatting capabilities.
- Addressing Limitations: Baidu acknowledges the existing limitations of dialogue generation models, such as biases and misleading information. Efforts are ongoing to enhance conversation quality, focusing on fairness and factuality.
- Open Source Release: Baidu plans to release the source code along with the English model on GitHub, aiming to advance research in dialogue generation.
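The flexible self-attention mask from the Unified Transformer Architecture bullet can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the general UniLM-style masking scheme, not Baidu's actual code: context positions attend bidirectionally among themselves, while response positions attend to the full context plus only earlier response positions.

```python
import numpy as np

def unified_attention_mask(context_len: int, response_len: int) -> np.ndarray:
    """Build a unified-transformer attention mask.

    Context tokens see all context tokens (bidirectional encoding);
    response tokens see the full context plus themselves and earlier
    response tokens (unidirectional decoding). True = may attend.
    """
    total = context_len + response_len
    mask = np.zeros((total, total), dtype=bool)
    # Every position may attend to the whole dialogue context.
    mask[:, :context_len] = True
    # Response positions additionally see earlier response positions.
    for i in range(context_len, total):
        mask[i, context_len:i + 1] = True
    return mask

# A 3-token context with a 2-token response:
m = unified_attention_mask(3, 2)
```

Because one matrix covers both encoding and decoding, a single set of transformer parameters serves both roles, which is where the parameter efficiency comes from.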
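A common way to make a model multi-party aware, in the spirit of the Multi-party Aware Pre-training bullet, is to add a per-speaker role embedding to each token embedding so the model can tell participants apart. The sizes, variable names, and role assignment below are purely illustrative, not PLATO-XL's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_roles, d_model = 100, 3, 8

# Learned lookup tables (randomly initialized here for the sketch).
tok_emb = rng.normal(size=(vocab_size, d_model))
role_emb = rng.normal(size=(n_roles, d_model))  # one vector per speaker role

token_ids = np.array([5, 7, 9, 2, 4])
# Hypothetical convention: role 0 = the bot's own turns,
# roles 1, 2, ... = the other participants in the thread.
role_ids = np.array([1, 1, 2, 0, 0])

# Each input vector is the sum of its token and speaker-role embeddings,
# so identical words said by different speakers get distinct representations.
x = tok_emb[token_ids] + role_emb[role_ids]
```

Tagging every token with its speaker is what lets the model attribute statements in a noisy multi-user social-media thread to the right participant and stay self-consistent.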
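Gradient checkpointing, mentioned under Advanced Training Techniques, trades compute for memory: the forward pass stores only a few activations, and the rest are recomputed when gradients are needed. A minimal NumPy sketch of the idea (not the FleetX implementation, whose API is not shown in the source):

```python
import numpy as np

def layer(x, w):
    """A stand-in transformer layer for the sketch."""
    return np.tanh(x @ w)

rng = np.random.default_rng(1)
weights = [rng.normal(size=(4, 4)) for _ in range(6)]
x0 = rng.normal(size=(2, 4))

# Plain forward pass: keeps every activation (memory grows with depth).
acts = [x0]
for w in weights:
    acts.append(layer(acts[-1], w))

# Checkpointed forward pass: keep an activation only every 3 layers.
ckpts = {0: x0}
x = x0
for i, w in enumerate(weights):
    x = layer(x, w)
    if (i + 1) % 3 == 0:
        ckpts[i + 1] = x

# At backward time, an intermediate activation (e.g. after layer 5) is
# recomputed from the nearest checkpoint instead of being stored.
y = ckpts[3]
for w in weights[3:5]:
    y = layer(y, w)
```

The recomputed `y` matches the activation the plain forward pass stored, while the checkpointed pass held only 3 tensors instead of 7; for an 11B-parameter model this kind of saving is what makes training feasible at all.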