Version: 3.7.0

Swift AI

Swift AI is Bonree's AI algorithm engine for operational data. It supports anomaly detection, trend prediction, alert convergence, root cause analysis, and LLM-based agents for multiple operations scenarios.

Anomaly Detection

Swift AI trains on historical time-series data in real time to automatically generate dynamic baselines that meet specific conditions. Metric anomaly detection is a common operations use case. It covers basic machine performance indicators such as CPU utilization, memory usage, and disk usage, as well as application-level golden signals such as transaction volume, response time, and success rate. Detection methods fall into three broad categories:

  • Statistical methods

  • Machine learning-based approaches (e.g., clustering, classification, outlier detection)

  • Deep learning algorithms (e.g., autoencoders, variational autoencoders)

These algorithms help detect anomalous behavior in systems and applications, enabling operations teams to quickly identify and resolve issues.
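As an illustration of the statistical category, the following is a minimal sketch of a dynamic baseline built from a rolling mean and standard deviation. It is not Swift AI's actual algorithm; the function name `detect_anomalies`, the `window` and `k` parameters, and the sample CPU values are all assumptions made for this example.

```python
from statistics import mean, stdev

def detect_anomalies(values, window=5, k=3.0):
    """Flag points that fall outside mean +/- k * std of the preceding window.

    Hypothetical helper for illustration only. The band is recomputed for
    every point, so the baseline follows the recent history of the metric.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        lower, upper = mu - k * sigma, mu + k * sigma
        if not (lower <= values[i] <= upper):
            anomalies.append((i, values[i], lower, upper))
    return anomalies

# Made-up CPU utilization samples (percent) with one spike at index 7.
cpu = [41.0, 43.0, 42.0, 40.0, 44.0, 42.0, 43.0, 95.0, 42.0, 41.0]
flagged = [index for index, _value, _lower, _upper in detect_anomalies(cpu)]
print(flagged)  # [7]
```

Note that in this simple form the spike itself enters the history of the following points and widens their band. A production baseline would typically exclude or down-weight flagged points and account for seasonality.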

Trend Prediction

Trend prediction is a method for forecasting the future trajectory of data metrics, applicable across various scenarios in operations. For example:

  • Predicting potential failure trends by analyzing operational metrics of devices or systems allows for proactive maintenance, reducing downtime and costs.
  • Forecasting future resource requirements based on usage trends helps in capacity planning and optimization, avoiding resource bottlenecks or overinvestment.
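The capacity planning case can be sketched with an ordinary least-squares line fitted to recent usage samples. This is a simplified illustration under stated assumptions, not the forecasting model Swift AI uses; the helper names `fit_linear_trend` and `steps_until_threshold`, the 90% threshold, and the disk usage samples are hypothetical.

```python
def fit_linear_trend(values):
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def steps_until_threshold(values, threshold):
    """Estimate how many sampling steps remain until the trend hits threshold.

    Returns None when the fitted trend is flat or decreasing.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    last_x = len(values) - 1
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - last_x)

# Made-up daily disk usage samples (percent).
disk_usage = [60.0, 62.0, 64.0, 66.0, 68.0]
print(steps_until_threshold(disk_usage, threshold=90.0))  # 11.0
```

A straight line only suits metrics that grow roughly linearly. Metrics with seasonality or saturation effects need a model that captures those patterns.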

Alert Convergence

Alert convergence is a technique for managing and reducing alert noise. In complex IT environments, issues often trigger a large number of alerts from different components, systems, or sensors. The goal of alert convergence is to identify and aggregate related alerts, enabling operations teams to handle and respond to problems more effectively. By merging similar or related alerts into a consolidated notification, redundancy and repetition are minimized, providing a clearer and more concise view for faster issue resolution.
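One simple way to aggregate related alerts is to group them by a shared key within a time window. The sketch below assumes that alerts from the same service arriving within 300 seconds of each other belong to one incident. The `Alert` and `Incident` types, the `converge` function, and the grouping rule are assumptions for illustration and do not describe how Swift AI correlates alerts.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    timestamp: int  # seconds since an arbitrary start
    service: str
    message: str

@dataclass
class Incident:
    service: str
    first_seen: int
    last_seen: int
    alerts: list = field(default_factory=list)

def converge(alerts, window_seconds=300):
    """Merge alerts of the same service that arrive within window_seconds
    of the previous alert for that service into a single incident."""
    incidents = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = open_by_service.get(alert.service)
        if incident is None or alert.timestamp - incident.last_seen > window_seconds:
            # No recent incident for this service: open a new one.
            incident = Incident(alert.service, alert.timestamp, alert.timestamp)
            incidents.append(incident)
            open_by_service[alert.service] = incident
        incident.alerts.append(alert)
        incident.last_seen = alert.timestamp
    return incidents

# Made-up alerts: two close together for "checkout", one for "db",
# and a later "checkout" alert outside the window.
sample = [
    Alert(0, "checkout", "p99 latency high"),
    Alert(60, "checkout", "error rate high"),
    Alert(90, "db", "connection pool saturated"),
    Alert(1000, "checkout", "p99 latency high"),
]
for incident in converge(sample):
    print(incident.service, len(incident.alerts))
```

With these four alerts the sketch produces three incidents, so the operator sees one consolidated notification for the two early checkout alerts instead of two separate ones.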

Root Cause Analysis

Root cause analysis involves investigating and accurately determining the fundamental cause of system failures. The primary goals are to restore normal system operation quickly and minimize the impact on business and users. Algorithms for root cause localization can be categorized as follows:

  • Rule-based methods: Use predefined rules and constraints to monitor system states; violations indicate potential root causes.
  • Statistical analysis-based methods: Collect, store, and analyze system metrics to identify distributions, trends, and periodic patterns, pinpointing the root cause.
  • Machine learning-based methods: Leverage AI techniques such as neural networks, decision trees, and support vector machines to model and analyze historical failure data, inferring root causes.
  • Knowledge graph-based methods: Construct fault knowledge graphs that record various failure manifestations and causes; analysis and querying of these graphs enable rapid root cause identification.
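The knowledge graph-based approach can be illustrated with a toy lookup in which each failure manifestation points to its known causes, and candidate causes are ranked by how many observed symptoms they explain. The graph contents, the `FAULT_GRAPH` name, and the `rank_root_causes` function are invented for this sketch. A real fault knowledge graph would be far richer and is not described in this document.

```python
from collections import Counter

# Hypothetical fault knowledge graph: symptom -> possible causes.
FAULT_GRAPH = {
    "high_response_time": ["db_connection_pool_exhausted", "cpu_saturation"],
    "error_rate_spike": ["db_connection_pool_exhausted", "bad_deployment"],
    "high_cpu": ["cpu_saturation"],
}

def rank_root_causes(observed_symptoms, graph=FAULT_GRAPH):
    """Rank causes by the number of observed symptoms each one explains.

    Ties are broken alphabetically so the output is deterministic.
    """
    votes = Counter()
    for symptom in observed_symptoms:
        for cause in graph.get(symptom, []):
            votes[cause] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_root_causes(["high_response_time", "error_rate_spike"])
print(ranking[0])  # ('db_connection_pool_exhausted', 2)
```

Counting matches is the simplest possible scoring rule. It treats every symptom as equally informative, which a practical system would likely refine with weights or probabilities.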

LLM Agents

Large Language Models (LLMs) excel at natural language understanding and reasoning. Integrating LLMs into operations significantly enhances efficiency. Bonree provides two LLM-based agents:

  • Knowledge Q&A Agent "Xiaorui Assistant": Explains Bonree ONE features, is aware of the current environment, and helps users quickly generate PromQL expressions.
  • Root Cause Analysis Agent: Rapidly analyzes the observability signals related to an alert, draws on knowledge bases for analysis and localization, and provides possible root cause conclusions within minutes.