Application Introduction
Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. (Source: Wikipedia)
The main strengths of this software are that it stores metrics very efficiently and that it is very easy to run and maintain, even in large deployments. With its PromQL query language, it is also as powerful as InfluxDB.
Prometheus is not meant for long-term storage, though. Other projects can collect the Prometheus data through its remote write functionality.
Prometheus Deployment Options and Trade-offs
By default, OpenShift deploys two Prometheus instances that work independently of each other and store their data on EmptyDir volumes. In theory, data is therefore not lost when one Prometheus instance is down, since it can still be queried from the other instance.
An EmptyDir volume stores its data on the local disk of the node for as long as the Pod exists. If the Pod is deleted or the node is lost, the collected metrics are lost with it and Prometheus starts with an empty data set. It is possible to change the backend volume to a memory-based EmptyDir, which in theory improves performance, but it costs a lot more and the data is lost on every OpenShift node reboot. What we saw in this test is that the performance of the memory-based EmptyDir is the same as that of the regular EmptyDir. The other alternative is to base the Prometheus Time Series Database (TSDB) volume on OCS-backed storage, which has performance characteristics similar to the regular EmptyDir while improving resilience.
| | Simple Query | Query with one PromQL Function | Multiple PromQL Functions | Summary |
|---|---|---|---|---|
| Test configuration | 1000 queries, 100 in parallel | 100 queries, 10 in parallel | 600 queries, 100 in parallel | |
| OpenShift Container Storage | Requests/sec: 65.22; Mean time per request: 15.33 ms | Requests/sec: 1.65; Mean time per request: 604.56 ms | Requests/sec: 0.77; Mean time per request: 1,294.13 ms | Performance: 👍 Resilience: 👍👍👍 Cost: 👍 |
| EmptyDir | Requests/sec: 71.41; Mean time per request: 14 ms | Requests/sec: 1.76; Mean time per request: 569.56 ms | Requests/sec: 0.86; Mean time per request: 1,165.30 ms | Performance: 👍👍 Resilience: 👍 Cost: 👍👍 |
| EmptyDir based on ramdisk | Requests/sec: 70.68; Mean time per request: 14.15 ms | Requests/sec: 1.69; Mean time per request: 590.02 ms | Requests/sec: 0.83; Mean time per request: 1,209.35 ms | Performance: 👍👍 Resilience: 👎 Cost: 👎 |
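To make the gap between the backends easier to judge, the requests/sec figures above can be expressed relative to the regular EmptyDir. The following is a minimal sketch of that arithmetic. The dictionary keys and the function name are illustrative choices, and only the numbers come from the table. Because the slower queries are reported with few significant digits, the resulting percentages are coarse.

```python
# Requests/sec copied from the summary table, per storage backend and query type.
RESULTS = {
    "OpenShift Container Storage": {
        "simple": 65.22, "one_function": 1.65, "multiple_functions": 0.77,
    },
    "EmptyDir": {
        "simple": 71.41, "one_function": 1.76, "multiple_functions": 0.86,
    },
    "EmptyDir based on ramdisk": {
        "simple": 70.68, "one_function": 1.69, "multiple_functions": 0.83,
    },
}


def percent_vs_baseline(results, baseline="EmptyDir"):
    """Return the throughput difference to the baseline backend in percent.

    Negative values mean fewer requests/sec than the baseline.
    """
    base = results[baseline]
    return {
        backend: {
            query: (rps / base[query] - 1.0) * 100.0
            for query, rps in by_query.items()
        }
        for backend, by_query in results.items()
        if backend != baseline
    }


if __name__ == "__main__":
    for backend, by_query in percent_vs_baseline(RESULTS).items():
        for query, diff in by_query.items():
            print(f"{backend:30s} {query:20s} {diff:+.1f} %")
```

With these inputs, every difference stays within roughly 1 to 11 percent of the regular EmptyDir, which matches the reading that the three backends perform similarly.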
Key Measures of Perf and Resilience for Prometheus
We captured the following key measures of performance and resilience to inform this brief:
- Query performance with a simple query of node_load1
- Query performance with one PromQL function
- Query performance with a complex interaction of multiple PromQL functions
Workload Benchmarking Results Summary
Key observations of Prometheus performance.
Appendix
Benchmark Overview
For the automatic provisioning of Prometheus, we used three different deployments that each deploy a single Prometheus instance with a different storage backend.
To measure the performance of the Prometheus TSDB, we used ApacheBench. This software was developed to measure the performance of websites and has many features that are useful for us. Since Prometheus has an HTTP API, we point ApacheBench at prepared URLs that trigger a TSDB lookup. ApacheBench then tells us how long each lookup took. To minimize networking effects, we ran ApacheBench on Pods in the same OpenShift cluster and connected to Prometheus via the OpenShift service address.
For every database, some queries are simpler to run and some are harder. For our test run we prepared three different queries and asked Prometheus to give us all metrics in a 9-day time window. We had to increase the sample interval to 10 minutes to stay below the limit on the number of data points that a single Prometheus query can return. To still stress Prometheus and the underlying storage enough, we instead increased the total number of queries and the number of parallel queries.
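ApacheBench reports a throughput figure and two different mean times per request. As a rough sketch of how these values relate to each other, the functions below restate the arithmetic. The function names are illustrative and not part of ApacheBench, and the 14-second run duration is a made-up input rather than a measured value.

```python
def requests_per_second(completed: int, total_seconds: float) -> float:
    """Throughput: completed requests divided by the wall-clock test duration."""
    return completed / total_seconds


def mean_ms_across_all_requests(completed: int, total_seconds: float) -> float:
    """Mean time per request across all concurrent requests, in milliseconds."""
    return total_seconds * 1000 / completed


def mean_ms_per_client(completed: int, total_seconds: float, concurrency: int) -> float:
    """Mean time a single client waits for one request, in milliseconds."""
    return concurrency * total_seconds * 1000 / completed


if __name__ == "__main__":
    # Hypothetical run: 1000 requests, 100 in parallel, finished in 14 seconds.
    completed, concurrency, total_seconds = 1000, 100, 14.0
    print(f"{requests_per_second(completed, total_seconds):.2f} requests/sec")
    print(f"{mean_ms_across_all_requests(completed, total_seconds):.2f} ms across all requests")
    print(f"{mean_ms_per_client(completed, total_seconds, concurrency):.2f} ms per client")
```

The "Mean time per request" values in the summary table appear to be the across-all-requests figure, since each is close to 1000 divided by the reported requests/sec (for example, 1000 / 65.22 ≈ 15.33). The time a single client waits for its answer would then be higher by about the factor of the concurrency level.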
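The time window and the 10-minute interval described above map onto the `start`, `end`, and `step` parameters of the Prometheus HTTP API endpoint `/api/v1/query_range`. The sketch below shows what such a prepared URL could look like for the simple `node_load1` query and how many data points per series it requests. The service address and the end timestamp are assumptions for illustration, and the exact URLs used in the test are not reproduced here. Prometheus rejects range queries that would return more than 11,000 points per series. The text does not say whether this or another limit was the one encountered.

```python
from urllib.parse import urlencode

# Hypothetical in-cluster service address; the real one is not given in the text.
PROMETHEUS_URL = "http://prometheus.example.svc:9090"

WINDOW_SECONDS = 9 * 24 * 60 * 60  # 9-day time window from the text
STEP_SECONDS = 10 * 60             # 10-minute interval from the text


def points_per_series(window_seconds: int, step_seconds: int) -> int:
    """Number of evaluation timestamps in a range query, both ends included."""
    return window_seconds // step_seconds + 1


def range_query_url(base_url: str, query: str, end: int,
                    window_seconds: int, step_seconds: int) -> str:
    """Build a /api/v1/query_range URL for the window that ends at `end`."""
    params = {
        "query": query,
        "start": end - window_seconds,
        "end": end,
        "step": step_seconds,
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Fixed example end timestamp (Unix seconds), chosen only for reproducibility.
    end = 1_580_000_000
    print(points_per_series(WINDOW_SECONDS, STEP_SECONDS))
    print(range_query_url(PROMETHEUS_URL, "node_load1", end,
                          WINDOW_SECONDS, STEP_SECONDS))
```

A URL built this way can be handed to ApacheBench as the target, with the request count and concurrency taken from the test configuration.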
Benchmark Environment Summary
Software
| Component | Configuration |
|---|---|
| OCP Version | v4.2 |
| OCP Infra | VMware |
| Master Nodes | 3 x |
| Compute nodes | 3 x 16 vCPU & 64 GB RAM |
| OCS Storage Nodes | 3 x 16 vCPU & 64 GB RAM |
| OCS Storage Devices | 3 x 1 TB vSAN based PVCs on NVMes |
| OCS Version | v4.2 |
Table 1: OCP and OCS Infra Details
| Software | Version |
|---|---|
| Prometheus | 2.14.0 (Container image prom/prometheus:latest) |
| ApacheBench | 2.3 (Container image jordi/ab) |
Table 2: Deployed version details
Measurements:
The raw data is available here: https://gist.github.com/mulbc/33d25cfd3b31fff307c7ce23352f1efd
Additional Resources
- OpenShift Container Storage: openshift.com/storage
- OpenShift | Storage YouTube Playlist
- OpenShift Commons ‘All Things Data’ YouTube Playlist
Feedback
To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.
If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.3 features, take this brief 3-minute survey.