Features list
This page lists the features supported by Observability Insight.
Community Release - Observability
The DCE 5.0 Community Release provides the following observability features.
Category | Subcategory | Description |
---|---|---|
Resource monitoring | Multi-cluster monitoring | Provides centralized observability for workloads across multiple clusters. Administrators manage multi-cluster alerts in one place, with data isolated between cluster administrators and tenant administrators. Supports persistent storage of cluster metrics and log data. |
Resource monitoring | Cluster monitoring | Provides a monitoring overview of a single cluster, showing its running status, its resource usage, and the alerts currently firing in it. |
Resource monitoring | Node monitoring | View the running status of a node and track changes in its CPU, memory, network, and other resources. |
Resource monitoring | Container monitoring | Monitors resources such as stateless workloads, DaemonSets, and Pods. Shows the running status of each workload, the number of alerts, and trend charts of resource consumption such as CPU and memory. |
Dashboard | Platform component monitoring | Provides a curated selection of open-source dashboards through native Grafana, plus built-in dashboards for monitoring etcd, APIServer, and other components. |
Dashboard | Cluster resource monitoring | Provides multi-dimensional monitoring of clusters, nodes, and namespaces. The data source used by Grafana supports viewing data from multiple clusters. |
Data query | Metric query | Common queries come with basic metrics preset: after selecting conditions such as cluster, type, node, and metric name, you can view how resource usage changes over time. Also supports querying metric charts and data details with native PromQL statements. |
Data query | Log query | Query the logs of nodes, Pods, Deployments, StatefulSets, and so on. Supports keyword search. Results are sorted by time by default, and a histogram shows the number of logs. Supports viewing the details and context of a single log. |
Data query | Log download | Download the logs from a given time range that match the search criteria. Supports exporting the context of a single log. |
Alert center | Active alerts | Provides a histogram showing how alerts change over time. Lists all rules that are currently alerting, with their details. |
Alert center | Historical alerts | Query all alerts that have recovered automatically or been resolved manually. |
Alert center | Alert rules | Includes more than 100 built-in alert rules, with predefined rules for cluster components, container resources, and so on. Administrators can create global alert rules that apply uniformly to all clusters with insight-agent installed. Rules can be created from predefined metrics or by writing PromQL statements. Thresholds, durations, and notification methods are customizable. Alert severity is customizable, with three levels: emergency, warning, and prompt. |
Alert center | Notification configuration | On the notification configuration page, configure messages to be sent to users through email groups, corporate WeChat, DingTalk, Webhook, and so on. Multiple alert recipients can be notified at the same time. |
Alert center | Message templates | Customize the content of message templates and notify specified recipients by email, corporate WeChat, DingTalk, or Webhook. |
Log collection and query | Unified log collection | Collects log data from nodes, containers, and Kubernetes events in one place. Also collects audit operations of the global management platform. Collection of Kubernetes audit logs is disabled by default. |
Log collection and query | Persistent log storage | Logs can be tagged and sent to middleware such as Elasticsearch for persistent storage. |
Metric collection | Metric data collection | Use ServiceMonitor to define the namespace scope for Pod discovery and to select the Services to monitor through matchLabels. |
System configuration | System configuration | Displays the default retention period for metrics, logs, and traces, and the default Apdex threshold. The retention periods for metrics, logs, and trace data can be customized. |
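
The "Metric data collection" row refers to two ServiceMonitor settings: the namespace scope for discovery and the label selector for Services. As a rough illustration only, the Python sketch below builds such a manifest as a plain dictionary. The field names (`namespaceSelector.matchNames`, `selector.matchLabels`, `endpoints`) come from the Prometheus Operator `ServiceMonitor` resource; the helper `build_service_monitor`, the names `demo-app` and `demo`, and the port name `metrics` are hypothetical, and any extra labels or namespace that Insight may require for the resource to be picked up are not covered here.

```python
import json


def build_service_monitor(name, namespaces, match_labels, port="metrics"):
    """Return a ServiceMonitor manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Namespace scope: only Services in these namespaces are discovered.
            "namespaceSelector": {"matchNames": list(namespaces)},
            # Label selector: only Services carrying these labels are scraped.
            "selector": {"matchLabels": dict(match_labels)},
            # Named Service port that exposes the metrics endpoint (assumed name).
            "endpoints": [{"port": port, "interval": "30s"}],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor(
        name="demo-app",               # hypothetical resource name
        namespaces=["demo"],           # hypothetical namespace
        match_labels={"app": "demo-app"},
    )
    # JSON is a valid input format for `kubectl apply -f`.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` is one way to try it out, assuming the ServiceMonitor CRD is installed in the cluster.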
Commercial Release - Observability
Building on the Community Release, the DCE 5.0 Commercial Release provides richer and more customizable observability features.
Category | Subcategory | Description |
---|---|---|
Resource monitoring | Multi-cluster monitoring | Provides centralized observability for workloads across multiple clusters. Administrators manage multi-cluster alerts in one place, with data isolated between cluster administrators and tenant administrators. Supports persistent storage of cluster metrics and log data. |
Resource monitoring | Cluster monitoring | Provides a monitoring overview of a single cluster, showing its running status, its resource usage, and the alerts currently firing in it. |
Resource monitoring | Node monitoring | View the running status of a node and track changes in its CPU, memory, network, and other resources. |
Resource monitoring | Container monitoring | Monitors resources such as stateless workloads, DaemonSets, and Pods. Shows the running status of each workload, the number of alerts, and trend charts of resource consumption such as CPU and memory. |
Scenario monitoring | Service monitoring¹ | View key service metrics such as real-time throughput, request count, request latency, and error rate, along with their trends over a period of time. You can also view a service's requests over a period of time, together with the trends in throughput, request count, request latency, and error rate for a single request. |
Scenario monitoring | Topology map¹ | Administrators can view the call relationships and health status of services that are connected to the observability platform and have trace collection enabled, and quickly locate faults. View the traffic direction and key request metrics between services. Quickly view the real-time throughput, request count, request latency, and error rate of a single service. |
Dashboard | Platform component monitoring | Provides a curated selection of open-source dashboards through native Grafana, plus built-in dashboards for monitoring etcd, APIServer, and other components. |
Dashboard | Cluster resource monitoring | Provides multi-dimensional monitoring of clusters, nodes, and namespaces. The data source used by Grafana supports viewing data from multiple clusters. |
Data query | Metric query | Common queries come with basic metrics preset: after selecting conditions such as cluster, type, node, and metric name, you can view how resource usage changes over time. Also supports querying metric charts and data details with native PromQL statements. |
Data query | Log query | Query the logs of nodes, Pods, Deployments, StatefulSets, and so on. Supports keyword search. Results are sorted by time by default, and a histogram shows the number of logs. Supports viewing the details and context of a single log. |
Data query | Log download | Download the logs from a given time range that match the search criteria. Supports exporting the context of a single log. |
Data query | Trace query¹ | View all requests of a service within a given period. Filter by cluster, namespace, service, operation, and tags, then click Search for a precise search. Supports viewing the aggregated trace graph of a single request for fast fault location. |
Alert center | Active alerts | Provides a histogram showing how alerts change over time. Lists all rules that are currently alerting, with their details. |
Alert center | Historical alerts | Query all alerts that have recovered automatically or been resolved manually. |
Alert center | Alert rules | Includes more than 100 built-in alert rules, with predefined rules for cluster components, container resources, and so on. Administrators can create global alert rules that apply uniformly to all clusters with insight-agent installed. Rules can be created from predefined metrics or by writing PromQL statements. Thresholds, durations, and notification methods are customizable. Alert severity is customizable, with three levels: emergency, warning, and prompt. |
Alert center | Notification configuration | On the notification configuration page, configure messages to be sent to users through email groups, corporate WeChat, DingTalk, Webhook, and so on. Multiple alert recipients can be notified at the same time. |
Alert center | Message templates | Customize the content of message templates and notify specified recipients by email, corporate WeChat, DingTalk, or Webhook. |
Log collection and query | Unified log collection | Collects log data from nodes, containers, and Kubernetes events in one place. Also collects audit operations of the global management platform. Collection of Kubernetes audit logs is disabled by default. |
Log collection and query | Persistent log storage | Logs can be tagged and sent to middleware such as Elasticsearch for persistent storage. |
Metric collection | Metric data collection | Use ServiceMonitor to define the namespace scope for Pod discovery and to select the Services to monitor through matchLabels. |
Metric collection | Component status¹ | View the status of the collection component's Pods and jump to the corresponding Pod details. |
Trace collection¹ | Trace data collection | Collects trace data with the OTEL SDK in a non-intrusive or minimally intrusive way. Also supports trace collection for mesh applications by injecting a sidecar. |
System configuration | System configuration | Displays the default retention period for metrics, logs, and traces, and the default Apdex threshold. The retention periods for metrics, logs, and trace data can be customized. |
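
The "System configuration" row mentions an Apdex threshold without saying how the score is derived. The sketch below follows the standard Apdex definition, in which requests at or below the threshold T count as satisfied and requests between T and 4T count as tolerating. Whether Insight computes its score in exactly this way is an assumption, and the function name `apdex` and the 500 ms threshold are illustrative values, not the product default.

```python
def apdex(latencies_ms, threshold_ms):
    """Compute an Apdex score in [0, 1] from request latencies.

    satisfied:  latency <= T
    tolerating: T < latency <= 4T
    frustrated: latency > 4T (contributes nothing to the score)
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    satisfied = sum(1 for x in latencies_ms if x <= threshold_ms)
    tolerating = sum(
        1 for x in latencies_ms if threshold_ms < x <= 4 * threshold_ms
    )
    # Tolerating requests count for half a satisfied request.
    return (satisfied + tolerating / 2) / len(latencies_ms)


if __name__ == "__main__":
    samples = [100, 200, 600, 2500]  # hypothetical latencies in milliseconds
    # 2 satisfied, 1 tolerating, 1 frustrated -> (2 + 0.5) / 4
    print(apdex(samples, threshold_ms=500))  # prints 0.625
```

With this definition, lowering the threshold makes the score stricter, because more requests fall into the tolerating or frustrated buckets.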