Version: 3.7.0

Assistant XiaoRui Deployment Manual

Prerequisites

  1. The customer must provide the API endpoints for calling the LLM. The customer may either deploy their own LLM within their own environment or use an LLM provided by a cloud vendor (e.g., Volcano Engine, Baidu Cloud, Alibaba Cloud), as long as they can provide the necessary API details such as the endpoint URL and key. These APIs must comply with the OpenAI specification, which is widely adopted by common LLMs both domestically and internationally. Below is an example using Volcano Engine's DeepSeek-R1 API, showing the endpoint URL and key (a connectivity-check sketch follows this list).
    model.platform=volcengine
    model.name=deepseek-r1-250528
    model.api.key=abc123(use your own key)
    model.base.url=https://ark.cn-beijing.volces.com/api/v3
  2. The customer must also provide API endpoints for both the Embedding model and the ReRanker model. The Embedding model converts text into embedding vectors, while the ReRanker model reorders search results during knowledge base retrieval, thereby improving accuracy. As with the LLM, the customer can either deploy these two models in their own environment or use model APIs provided by cloud vendors (e.g., Volcano Engine, Baidu Cloud, Alibaba Cloud). Internally, we use the Qwen3-Embedding-4B model for embeddings and the bge-reranker-v2-m3 model for reranking. Customers may choose their own Embedding and ReRanker models, but note that using different models may affect the final accuracy (endpoint checks for both are included in the sketch after this list).
  3. Two services need to be deployed: the chat-service and the milvus-service. Both support deployment in Docker and Kubernetes environments. Together they require two servers, each with 8 vCPUs and 16 GB of RAM. The milvus-service is deployed as a single node only. SSD storage is recommended, as HDDs may degrade query performance.
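
A quick way to validate these endpoints before deployment is to call each one with curl. The sketch below reuses the example Volcano Engine values from item 1 and hypothetical hosts, keys, and model names for the Embedding and ReRanker services; the chat and embedding paths follow the OpenAI specification, but the rerank path is only a common convention and must be confirmed against your provider's documentation.

    # LLM chat endpoint (OpenAI-compatible /chat/completions)
    curl -s https://ark.cn-beijing.volces.com/api/v3/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $MODEL_API_KEY" \
      -d '{"model": "deepseek-r1-250528", "messages": [{"role": "user", "content": "ping"}]}'

    # Embedding endpoint (OpenAI-compatible /v1/embeddings; host and key are placeholders)
    curl -s http://EMBEDDING_HOST:PORT/v1/embeddings \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $EMBEDDING_KEY" \
      -d '{"model": "br-embedding", "input": ["hello world"]}'

    # ReRanker endpoint (the /rerank path is an assumption; adjust to your deployment)
    curl -s http://RERANK_HOST:PORT/rerank \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $RERANK_KEY" \
      -d '{"model": "br-rerank", "query": "test", "documents": ["doc one", "doc two"]}'

A successful call returns an HTTP 200 response with a JSON body; an authentication or routing error is far cheaper to debug at this stage than after the services are deployed.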

Deployment Steps

  • The chat-service and milvus-service are deployed in the same way as other company components and can be deployed directly using Ansible.
  • Ansible Deployment Steps:
  1. Prepare the Ansible environment and obtain the Ansible package: Contact the Architecture Department or download the Ansible package containing the chat-service and milvus-service components from the resource download platform (e.g., one_chat_service_X.X.X.X_increment.tar.gz).
  2. Upload and extract the package: e.g., tar -xvf one_chat_service_3.3.1.0_increment.tar.gz -C /data/ansible
  3. Modify deployment parameters: Edit hosts.ini to specify the machine IPs for deploying chat-service and milvus-service (an example sketch follows these steps).
  4. Modify the all.yml file: Set the chat_service version number in /data/ansible/gaea/group_vars/all.yml (it must match the version packaged by the DevOps platform). Sometimes this version is pre-determined and requires no modification.
  5. Execute the deployment: Navigate to the Ansible bin directory and run the installation command: sh br.sh --install -t chat_service -S -vvv
  6. Verify service status after deployment: docker ps -a | grep chat_service # Confirm the container is running normally.
  • The specific steps may vary depending on the chat-service version. It is necessary to confirm the version and detailed steps with the AI team before deployment.
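
To illustrate steps 3 and 4, the sketch below shows what the edited files might look like for a two-server layout. The group names, IPs, and variable name are hypothetical placeholders; the actual inventory layout and variable names ship with the package and should be confirmed with the AI team.

    # /data/ansible/hosts.ini (hypothetical layout)
    [chat_service]
    192.168.1.10

    [milvus_service]
    192.168.1.11

    # /data/ansible/gaea/group_vars/all.yml (hypothetical variable name;
    # the version must match the one packaged by the DevOps platform)
    chat_service_version: 3.3.1.0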

  • Since LLM root cause analysis relies on data such as call chains, logs, and alerts, it has dependencies on components like APM, RUM, Log, and Alert. Different chat-service versions may depend on different versions of these other components. Dependencies on other components must be confirmed with the AI team.

  • It is recommended to deploy the milvus-service first, followed by the chat-service.

  • Post-Deployment Service Health Check:

    • After deploying milvus-service, access the container using docker exec -it br-milvus-service bash. Inside the container, run the command bash milvus_service.sh 6. If all displayed processes show a status of RUNNING, the service is functioning normally, as shown in the example below:

      (Screenshot: example output showing all processes with a status of RUNNING.)

    • After deploying chat-service, access the container using docker exec -it br-chat-service bash. Inside the container, run the command bash chat_service.sh 6. If all displayed processes show a status of RUNNING, the service is functioning normally. (The output is similar to the example above).
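
Both checks can also be run in one pass from the host. The sketch below assumes the container names given above and that the status scripts are reachable from the containers' default working directory; the exact output format may vary by version, so treat the grep as a rough filter rather than a definitive check.

    # Run both health checks from the host and surface any non-RUNNING process.
    docker exec br-milvus-service bash milvus_service.sh 6 | grep -v RUNNING
    docker exec br-chat-service bash chat_service.sh 6 | grep -v RUNNING
    # Any process lines printed here indicate processes that are not RUNNING.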

Configure LLM Parameters Post-Deployment

  • After deploying the chat-service and milvus-service, the first step is to configure the API endpoints for the LLM, Embedding model, and ReRanker model. Configuration method: modify the chat-service's private configuration in NACOS (default: CHAT_SERVICE, which may be subject to change).
  1. Configure the primary LLM API parameters.

    model.platform=volcengine
    model.name=deepseek-r1-250528
    model.api.key=abc123(use your own key)
    model.base.url=https://ark.cn-beijing.volces.com/api/v3
  2. Configure the secondary and third-party LLM API parameters. (The secondary LLM is typically a faster alternative model; if a separate model is unavailable, it can be set to the same values as the primary LLM above.)

    # Secondary LLM Configuration
    sub.model.platform=volcengine
    sub.model.name=doubao-seed-1-6-flash
    sub.model.api.key=(use your own key)
    sub.model.base.url=https://ark.cn-beijing.volces.com/api/v3

    # Third-party LLM Configuration (If unavailable, it can be configured to be the same as the primary LLM above.)
    third.model.platform=volcengine
    third.model.name=doubao-seed-1-6-flash
    third.model.api.key=(use your own key)
    third.model.base.url=https://ark.cn-beijing.volces.com/api/v3
  3. Configure the API endpoints for the Embedding model and ReRanker model.

    # Embedding Model Configuration
    embedding.model.name=br-embedding(use your own name)
    embedding.model.key=(use your own key)
    embedding.model.url=http://ip:port/v1(use your own url)

    # Rerank Model Configuration
    rerank.model.name=br-rerank(use your own name)
    rerank.model.key=(use your own key)
    rerank.model.url=http://ip:port(use your own url)
  • After the configuration is complete, restart the service with: docker restart br-chat-service (a log-check sketch follows).
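
To confirm the restarted service picked up the new configuration, one simple check is to scan the startup logs for failures. The grep keywords below are assumptions, since the actual log format depends on the chat-service version; confirm the expected startup messages with the AI team if in doubt.

    docker restart br-chat-service
    # Scan the most recent log lines for failure keywords (keywords are assumptions).
    docker logs --tail 200 br-chat-service 2>&1 | grep -iE "error|exception" || echo "no error keywords in recent logs"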