
Model Support

With the rapid iteration of Intelligent Engine, a variety of model inference services are now supported. This page describes the models that are currently supported.

  • Intelligent Engine v0.3.0 introduced model inference services, allowing users to consume inference for traditional deep learning models directly, without worrying about model deployment and maintenance.
  • Intelligent Engine v0.6.0 supports the full vLLM inference capabilities, covering many large language models such as Llama, Qwen, ChatGLM, and more.

Note

Which inference capabilities are available depends on the Intelligent Engine version. Refer to the Release Notes for the latest version and upgrade in time.

In Intelligent Engine, you can use the GPU types that have been verified by DCE 5.0. For details, refer to the GPU Support Matrix.


Triton Inference Server

Triton Inference Server provides solid support for traditional deep learning models. Intelligent Engine currently supports the following mainstream inference backends:

| Backend | Supported Model Formats | Description |
| --- | --- | --- |
| pytorch | TorchScript, PyTorch 2.0 formats | triton-inference-server/pytorch_backend |
| tensorflow | TensorFlow 2.x | triton-inference-server/tensorflow_backend |
| vLLM (Deprecated) | Models supported by vLLM | triton-inference-server/vllm_backend |

Danger

Deploying vLLM as a Triton backend has been deprecated. It is recommended to use the native vLLM support described below to deploy your large language models.
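
For reference, here is a minimal sketch of querying a model served by Triton with the official tritonclient Python package. The endpoint address, model name, and tensor names are assumptions (INPUT__0/OUTPUT__0 follow the pytorch backend's default naming convention); adjust them to your own deployment.

```python
# Minimal sketch: query a TorchScript model served by Triton's pytorch backend.
# Assumed values: service reachable at localhost:8000, model named "resnet50",
# tensors named INPUT__0 / OUTPUT__0 (pytorch backend defaults).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single FP32 input tensor of shape [1, 3, 224, 224].
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("INPUT__0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Run inference and read the requested output tensor back as a NumPy array.
result = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
)
print(result.as_numpy("OUTPUT__0").shape)
```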

vLLM

With vLLM, you can quickly serve large language models. The list of models supported here generally aligns with the vLLM Supported Models.

  • HuggingFace models: most models on HuggingFace are supported. You can browse them on the HuggingFace Model Hub; a minimal usage sketch follows this list.
  • The vLLM Supported Models list, which includes supported large language models and vision-language models.
  • Models fine-tuned with frameworks supported by vLLM.
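
As an illustration, a HuggingFace model from the supported list can be tried with vLLM's offline API before deploying it as an inference service. This is a minimal sketch using vLLM directly rather than the Intelligent Engine deployment flow, and the model identifier Qwen/Qwen2-7B-Instruct is only an example.

```python
# Minimal sketch: run a HuggingFace model locally with vLLM's offline API.
# The model id below is only an example; any model from the vLLM supported
# list (or a compatible fine-tuned model) can be used instead.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Briefly explain what vLLM is."], params)
for out in outputs:
    print(out.outputs[0].text)
```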

New Features of vLLM

When vLLM is used as the inference engine, Intelligent Engine also supports several new features:

  • Enable LoRA adapters to optimize model inference services.
  • Provide an OpenAI-compatible API, making it easy for users to switch to local inference services at low cost; see the sketch after this list.
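
Because the service exposes an OpenAI-compatible API, existing code written against the OpenAI SDK only needs its base URL (and, if required, API key) changed. Below is a minimal sketch using the official openai Python package; the endpoint URL, API key, and model name are placeholders for your own deployment.

```python
# Minimal sketch: call a vLLM inference service through its OpenAI-compatible
# endpoint. base_url, api_key, and model are placeholders; replace them with
# the values of your own deployment (the model can also be a LoRA adapter
# name if LoRA is enabled for the service).
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-inference-endpoint>/v1",
    api_key="your-api-key-or-empty",
)

response = client.chat.completions.create(
    model="qwen2-7b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```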
