Features list
This page lists the features supported by Observability Insight.
Community Edition - Observability
DCE 5.0 Community Edition provides the following observability features.
Category | Subcategory | Description |
---|---|---|
Resource monitoring | Multi-cluster monitoring | Provides centralized observability for services across multiple clusters. Administrators manage multi-cluster alerts in a unified manner, with data isolated between cluster administrators and tenant administrators. Supports persistent storage of cluster metrics and log data. |
Resource monitoring | Cluster monitoring | Provides a monitoring overview of a single cluster, showing the cluster's running status, its resource usage, and the alerts currently firing in it. |
Resource monitoring | Node monitoring | View the running status of a node and track changes in its CPU, memory, network, and other resources. |
Resource monitoring | Container Monitoring | Monitors workloads such as Deployments, DaemonSets, and Pods. You can view the running status of a workload, the number of alerts, and trend charts of resource consumption such as CPU and memory. |
Dashboard | Platform Component Monitoring | Provides curated open-source dashboards through native Grafana, plus built-in dashboards for monitoring etcd, APIServer, and other components. |
Dashboard | Cluster Resource Monitoring | Provides multi-dimensional monitoring of clusters, nodes, and namespaces. The Grafana data source supports viewing data from multiple clusters. |
Data Query | Metric Query | Common queries predefine basic metrics: after selecting conditions such as cluster, type, node, and metric name, you can query resource trends. Supports querying metric charts and data details with native PromQL statements. |
Data Query | Log query | Query the logs of Nodes, Pods, Deployments, StatefulSets, and so on. Supports keyword search. Results are sorted by time by default, and a histogram shows the number of logs. Supports viewing the details and context of a single log. |
Data Query | Log Download | Download logs within a time range according to search criteria. Supports exporting the context of a single log. |
Alert Center | Active Alerts | Provides a histogram showing the trend of alerts over time. Supports viewing all rules that are currently firing, along with their details. |
Alert Center | Historical alerts | Query all alerts that recovered automatically or were resolved manually. |
Alert Center | Alert rules | Includes 100+ built-in alert rules, with predefined rules for cluster components, container resources, and so on. Administrators can create global alert rules that apply uniformly to clusters with insight-agent installed. Supports creating alert rules from predefined metrics or by writing PromQL statements. Supports custom thresholds, durations, and notification methods. Alert levels are customizable, with three levels supported: emergency, warning, and prompt. |
Alert Center | Notification configuration | On the notification configuration page, you can configure messages to be sent to users through email groups, corporate WeChat, DingTalk, Webhook, and so on. Supports notifying multiple alert recipients at the same time. |
Alert Center | Message template | Customize the content of message templates and notify specified recipients by email, corporate WeChat, DingTalk, or Webhook. |
Log collection and query | Unified log collection | Collects log data from nodes, containers, and Kubernetes events in a unified manner. Collects audit logs of the global management platform; collection of Kubernetes audit logs is disabled by default. |
Log collection and query | Persistent log storage | Logs can be tagged and output to middleware such as Elasticsearch for persistence. |
Metric collection | Metric data collection | Use ServiceMonitor to define the namespace scope for Pod discovery and select the monitored Service through matchLabels. |
System configuration | System configuration | Displays the default retention period for metrics, logs, and traces, and the default Apdex threshold. Supports customizing the retention period for metric, log, and trace data. |
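
The metric collection row above relies on ServiceMonitor, the Prometheus Operator resource that scopes discovery by namespace and selects Services by label. The following is a minimal sketch of such a manifest built in Python, not a configuration taken from Insight. The resource name, namespace, label, and port name (`demo-app`, `demo`, `app: demo-app`, `metrics`) are hypothetical placeholders.

```python
import json


def build_service_monitor(name, namespaces, match_labels, port, interval="30s"):
    """Return a ServiceMonitor manifest as a plain dict.

    All argument values are illustrative; adapt them to your own workloads.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Limits discovery to the listed namespaces.
            "namespaceSelector": {"matchNames": list(namespaces)},
            # Selects the Services whose labels match.
            "selector": {"matchLabels": dict(match_labels)},
            # Named Service port to scrape, and how often.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


# Hypothetical example values, not defaults of any product.
manifest = build_service_monitor(
    name="demo-app",
    namespaces=["demo"],
    match_labels={"app": "demo-app"},
    port="metrics",
)

if __name__ == "__main__":
    # JSON output can be saved to a file and applied with kubectl.
    print(json.dumps(manifest, indent=2))
```

The sketch emits JSON rather than YAML only to stay within the Python standard library.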
Commercial Edition - Observability
Building on the Community Edition, the DCE 5.0 Commercial Edition provides richer and more customizable observability features.
Category | Subcategory | Description |
---|---|---|
Resource monitoring | Multi-cluster monitoring | Provides centralized observability for services across multiple clusters. Administrators manage multi-cluster alerts in a unified manner, with data isolated between cluster administrators and tenant administrators. Supports persistent storage of cluster metrics and log data. |
Resource monitoring | Cluster Monitoring | Provides a monitoring overview of a single cluster, showing the cluster's running status, its resource usage, and the alerts currently firing in it. |
Resource monitoring | Node monitoring | View the running status of a node and track changes in its CPU, memory, network, and other resources. |
Resource monitoring | Container Monitoring | Monitors workloads such as Deployments, DaemonSets, and Pods. You can view the running status of a workload, the number of alerts, and trend charts of resource consumption such as CPU and memory. |
Scenario Monitoring | Service Monitoring¹ | View key metrics of a service, such as real-time throughput, number of requests, request latency, and error rate, together with their trends over a period of time. View the service's requests over a period of time, as well as the trends in throughput, number of requests, request latency, and error rate for a single request. |
Scenario Monitoring | Topology map¹ | Administrators can view the call relationships and health status of services that are connected to the observability platform and have trace collection enabled, and quickly locate faults. View the traffic direction and key metrics of requests between services. Quickly view the real-time throughput, number of requests, request latency, and error rate of a single service. |
Dashboard | Platform Component Monitoring | Provides curated open-source dashboards through native Grafana, plus built-in dashboards for monitoring etcd, APIServer, and other components. |
Dashboard | Cluster Resource Monitoring | Provides multi-dimensional monitoring of clusters, nodes, and namespaces. The Grafana data source supports viewing data from multiple clusters. |
Data Query | Metric Query | Common queries predefine basic metrics: after selecting conditions such as cluster, type, node, and metric name, you can query resource trends. Supports querying metric charts and data details with native PromQL statements. |
Data Query | Log query | Query the logs of Nodes, Pods, Deployments, StatefulSets, and so on. Supports keyword search. Results are sorted by time by default, and a histogram shows the number of logs. Supports viewing the details and context of a single log. |
Data Query | Log Download | Download logs within a time range according to search criteria. Supports exporting the context of a single log. |
Data Query | Trace query¹ | View all requests of a service within a given time range: configure the cluster, namespace, service, operation, and tags, then click Search for a precise search. Supports viewing the aggregated trace graph of a single request to locate faults quickly. |
Alert Center | Active Alerts | Provides a histogram showing the trend of alerts over time. Supports viewing all rules that are currently firing, along with their details. |
Alert Center | Historical alerts | Query all alerts that recovered automatically or were resolved manually. |
Alert Center | Alert rules | Includes 100+ built-in alert rules, with predefined rules for cluster components, container resources, and so on. Administrators can create global alert rules that apply uniformly to clusters with insight-agent installed. Supports creating alert rules from predefined metrics or by writing PromQL statements. Supports custom thresholds, durations, and notification methods. Alert levels are customizable, with three levels supported: emergency, warning, and prompt. |
Alert Center | Notification configuration | On the notification configuration page, you can configure messages to be sent to users through email groups, corporate WeChat, DingTalk, Webhook, and so on. Supports notifying multiple alert recipients at the same time. |
Alert Center | Message template | Customize the content of message templates and notify specified recipients by email, corporate WeChat, DingTalk, or Webhook. |
Log collection and query | Unified log collection | Collects log data from nodes, containers, and Kubernetes events in a unified manner. Collects audit logs of the global management platform; collection of Kubernetes audit logs is disabled by default. |
Log collection and query | Persistent log storage | Logs can be tagged and output to middleware such as Elasticsearch for persistence. |
Metric collection | Metric data collection | Use ServiceMonitor to define the namespace scope for Pod discovery and select the monitored Service through matchLabels. |
Metric collection | Component status¹ | View the status of the collection component's Pods and jump to the corresponding Pod details. |
Trace Collection¹ | Trace Data Collection | Supports collecting trace data with the OTEL SDK in a non-intrusive or minimally intrusive way. Supports collecting trace data by injecting a sidecar into mesh applications. |
System configuration | System configuration | Displays the default retention period for metrics, logs, and traces, and the default Apdex threshold. Supports customizing the retention period for metric, log, and trace data. |
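
The system configuration row above mentions an Apdex threshold. As a rough illustration of what that threshold controls, the sketch below computes a score using the standard Apdex definition: requests at or below the threshold T count as satisfied, those up to 4T count as tolerating at half weight, and the rest count as frustrated. This is an assumption about the general formula, not a description of how Insight computes it; the function name, the sample latencies, and the 0.5 s threshold are hypothetical.

```python
def apdex(latencies_s, threshold_s):
    """Compute an Apdex score in [0, 1] from request latencies in seconds."""
    if threshold_s <= 0:
        raise ValueError("threshold_s must be positive")
    if not latencies_s:
        raise ValueError("latencies_s must not be empty")

    # Satisfied: latency <= T.
    satisfied = sum(1 for t in latencies_s if t <= threshold_s)
    # Tolerating: T < latency <= 4T, counted at half weight.
    tolerating = sum(
        1 for t in latencies_s if threshold_s < t <= 4 * threshold_s
    )
    # Frustrated requests (latency > 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(latencies_s)


# Hypothetical sample: 3 satisfied, 2 tolerating, 1 frustrated at T = 0.5 s.
sample = [0.1, 0.3, 0.4, 0.6, 1.2, 2.5]
score = apdex(sample, threshold_s=0.5)

if __name__ == "__main__":
    print(f"Apdex: {score:.2f}")
```

Lowering the threshold moves requests from satisfied toward tolerating or frustrated, so the same traffic yields a lower score.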