About Microsoft JARVIS

Microsoft’s JARVIS is a system that connects Large Language Models (LLMs) with the wider Machine Learning community. It uses language as an interface, so that an LLM can call on many AI models to solve complex AI tasks. An LLM acts as the controller, and a set of expert models act as collaborative executors. The workflow has four stages: Task Planning, Model Selection, Task Execution, and Response Generation. The design is described in detail in the paper “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace”.
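
The four stages can be illustrated with a minimal Python sketch. This is not JARVIS code: the expert registry, the model names (toy-captioner, toy-translator), and every function name are hypothetical, and a keyword-matching stub stands in for the LLM controller so the example runs without any API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    kind: str     # task type, e.g. "image-to-text"
    payload: str  # input passed to the expert model


@dataclass
class Expert:
    description: str           # text the controller matches against
    run: Callable[[str], str]  # stand-in for invoking a hosted model


# Hypothetical expert registry; names and behavior are invented for illustration.
EXPERTS: Dict[str, Expert] = {
    "toy-captioner": Expert("image-to-text captioning", lambda p: f"caption of {p}"),
    "toy-translator": Expert("translation to French", lambda p: f"French for '{p}'"),
}


def plan_tasks(request: str) -> List[Task]:
    """Stage 1: keyword stub where a real controller would prompt an LLM.

    The payloads are hard-coded to keep the sketch short.
    """
    tasks = []
    lowered = request.lower()
    if "describe" in lowered:
        tasks.append(Task("image-to-text", "photo.jpg"))
    if "translate" in lowered:
        tasks.append(Task("translation", "the description"))
    return tasks


def select_model(task: Task) -> str:
    """Stage 2: choose the first expert whose description mentions the task kind."""
    for name, expert in EXPERTS.items():
        if task.kind in expert.description:
            return name
    raise LookupError(f"no expert model for task kind {task.kind!r}")


def execute(task: Task, model_name: str) -> str:
    """Stage 3: run the chosen expert on the task input."""
    return EXPERTS[model_name].run(task.payload)


def generate_response(results: List[str]) -> str:
    """Stage 4: merge all predictions into one reply."""
    return "; ".join(results)


def handle_request(request: str) -> str:
    """Run the four stages in order for a single user request."""
    results = []
    for task in plan_tasks(request):
        model_name = select_model(task)
        results.append(execute(task, model_name))
    return generate_response(results)


if __name__ == "__main__":
    print(handle_request("Describe photo.jpg and translate the description"))
```

In the real system the planner, selector, and response generator are all prompts to the controller LLM, and execution calls actual models. The sketch only shows how the stages hand data to one another.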

Features of Microsoft JARVIS

  1. Task Planning: JARVIS employs ChatGPT to analyze user requests, comprehend their intentions, and break them down into potentially solvable tasks.
  2. Model Selection: Based on the planned tasks, ChatGPT selects expert models from Hugging Face, relying on their descriptions.
  3. Task Execution: The system invokes and runs each chosen model, subsequently returning the results to ChatGPT.
  4. Response Generation: ChatGPT integrates the predictions from all models to generate comprehensive responses.
  5. System Requirements: JARVIS has specific system requirements, including Ubuntu 16.04 LTS, VRAM >= 24GB, and varying RAM requirements based on the configuration (minimal, standard, full).
  6. Quick Start: Users can set up JARVIS by replacing the placeholder OpenAI key and Hugging Face token with their own, then running a series of commands to start the server and access JARVIS services through the Web API.
  7. Configuration: JARVIS offers flexibility in configuration, allowing users to choose between local, HuggingFace, or hybrid inference modes. It also provides options for the scale of locally deployed models.
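
The configuration choices in item 7 can be modeled as a small validated settings object. The sketch below is illustrative only: the class, field, and method names are hypothetical rather than taken from the JARVIS code base, and it assumes the scale options for locally deployed models are the minimal, standard, and full configurations named in the system requirements.

```python
from dataclasses import dataclass

# Inference modes named in the text above.
INFERENCE_MODES = ("local", "huggingface", "hybrid")
# Assumption: the local deployment scales match the minimal/standard/full
# configurations listed under the system requirements.
DEPLOYMENT_SCALES = ("minimal", "standard", "full")


@dataclass(frozen=True)
class JarvisSettings:
    """Hypothetical settings holder; not the project's own config schema."""

    inference_mode: str
    local_deployment: str

    def __post_init__(self) -> None:
        # Reject values outside the documented options early.
        if self.inference_mode not in INFERENCE_MODES:
            raise ValueError(f"unknown inference mode: {self.inference_mode!r}")
        if self.local_deployment not in DEPLOYMENT_SCALES:
            raise ValueError(f"unknown deployment scale: {self.local_deployment!r}")

    def uses_local_models(self) -> bool:
        """Local and hybrid modes both run some models on the local machine."""
        return self.inference_mode in ("local", "hybrid")

    def uses_remote_models(self) -> bool:
        """HuggingFace and hybrid modes both call remotely hosted models."""
        return self.inference_mode in ("huggingface", "hybrid")


if __name__ == "__main__":
    settings = JarvisSettings(inference_mode="hybrid", local_deployment="standard")
    print(settings.uses_local_models(), settings.uses_remote_models())
```

Validating the two options together at construction time means a typo in either value fails immediately instead of surfacing later, when a model is first requested.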

Additional Features

  • NVIDIA Jetson Embedded Device Support: JARVIS includes a Dockerfile that offers experimental support for NVIDIA Jetson embedded devices. The resulting image provides accelerated builds of the ffmpeg, pytorch, torchaudio, and torchvision dependencies.
  • Citation: For those who find JARVIS beneficial for their research or projects, a citation format is provided to credit the original authors.
  • Acknowledgment: JARVIS acknowledges the contributions of ChatGPT, Hugging Face, ControlNet, and ChatGPT-vue.