Prometheus in Production


Jun 23, 2026

Running Prometheus in production is fundamentally different from running it in development. At scale, you face challenges around high availability, cardinality management, long-term storage, and operational resilience that require deliberate architectural decisions.

Prometheus was designed with a philosophy of locality and simplicity: each Prometheus server scrapes metrics from targets it can reach directly, stores data locally in a TSDB, and evaluates alerts and recording rules independently. This design trades distributed consensus for operational simplicity — but it also means that scaling beyond a single instance requires you to extend the architecture thoughtfully.

A typical production topology shards Prometheus by function, sends samples to long-term storage via remote write, and optionally federates for cross-cluster visibility. Each decision in this architecture (sharding strategy, storage backend, HA approach) comes with trade-offs that the sections below explore in depth.

How to Build a Scalable Prometheus Architecture - Logz.io guide on scaling Prometheus with federation and remote write
Prometheus Federation Explained: Architecture & Pitfalls - Groundcover deep dive on federation architecture and challenges

Scaling Stages: Where Are You?

Not every organization needs the same Prometheus architecture. Your approach should match your scale:

Active Series	Typical Architecture	Key Challenges
< 100K	Single Prometheus instance	Minimal — focus on alerting quality
100K – 1M	Single instance + vertical scaling	Memory pressure, storage growth
1M – 5M	Functional sharding + remote write	Cardinality explosions, query performance
5M – 50M	Multiple shards + LTS backend (Thanos/Mimir)	Cross-shard queries, deduplication
50M+	Full distributed stack (Mimir/Thanos/Cortex)	Multi-tenancy, cost management, ingestion reliability

At the 1–5 million active series mark, a single instance remains viable only with aggressive optimization. The usual next step is functional sharding (one instance for infrastructure, one for application metrics, one for business metrics) combined with remote write to a long-term storage backend for retention.

Prometheus Scalability: High Cardinality And How To Fix It - Scaling stages from 100K to 50M+ active series

Setting Up a Production-Grade Prometheus Architecture

Step 1
Decompose your monitoring workload into independent Prometheus instances. Common sharding strategies include:

Infrastructure metrics: node_exporter, kube-state-metrics, cadvisor

Application metrics: custom application metrics from instrumented services

Business metrics: SLI/SLO calculations, business KPIs

This provides horizontal scaling via functional decomposition and limits the blast radius of any single instance failing.
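As a rough illustration of functional decomposition, the configuration for an infrastructure-only shard might look like the sketch below. The file name, the `shard` label, and the target hostnames are assumptions made for this example, not values from a real deployment.

```yaml
# prometheus-infra.yml: hypothetical config for the infrastructure shard
global:
  scrape_interval: 30s
  external_labels:
    shard: infrastructure   # assumed label name, used to tell shards apart downstream
scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ['node-1.example.internal:9100']   # placeholder target
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.example.internal:8080']   # placeholder target
```

The application and business shards would follow the same shape with their own scrape jobs, so a failure or cardinality spike in one domain does not affect the others.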
Step 2
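Run two identical Prometheus replicas scraping the same targets. Use external_labels to identify each replica. This ensures that if one replica fails, the other continues collecting and evaluating alerts. The key challenge is deduplicating the two copies of every series downstream; handle this via:

Thanos: query-time deduplication in the Querier via --query.replica-label (the Compactor's --deduplication.replica-label performs offline deduplication)

Mimir: the distributor's HA tracker, once enabled, accepts samples from one elected replica per cluster and drops the rest, based on cluster and replica labels

Cortex: the same HA tracker mechanism, configured in the distributor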

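Note: both replicas will send alerts. Alertmanager deduplicates alerts whose label sets are identical, so drop the replica label with alert_relabel_configs before alerts leave Prometheus; otherwise each replica's alerts look distinct and produce duplicate notifications. Use group_by and inhibition rules to further reduce notification noise.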
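The replica setup described in this step could be sketched as follows for the first replica. The cluster name, replica names, and Alertmanager address are placeholders; the second replica would carry the same config with a different `replica` value.

```yaml
global:
  external_labels:
    cluster: prod-a        # assumed cluster name
    replica: prom-a        # the second replica would use e.g. prom-b
alerting:
  alert_relabel_configs:
    # Remove the replica label from outgoing alerts so both replicas
    # send identical label sets and Alertmanager can deduplicate them
    - regex: replica
      action: labeldrop
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.internal:9093']   # placeholder address
```

The replica label stays on the stored and remote-written series, where the long-term storage layer needs it, and is removed only from alerts.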
Step 3
Configure remote_write to forward metrics from each Prometheus shard to a long-term storage backend. This decouples local retention (short, fast) from long-term retention (cheap, deep).

remote_write:
  - url: "https://thanos-receive.example.com/api/v1/receive"
    queue_config:
      capacity: 100000
      max_samples_per_send: 10000
      batch_send_deadline: 5s
      min_shards: 1
      max_shards: 10

Tune queue parameters based on your ingestion rate. Monitor prometheus_remote_storage_queue_highest_sent_timestamp_seconds to detect backpressure: if it lags further and further behind the current time, the queue is not keeping up.
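Besides queue tuning, one way to relieve a struggling queue is to send fewer series in the first place. The sketch below uses `write_relabel_configs` to keep one metric family local; the choice of `go_.*` is purely illustrative and would need to match whatever you do not need in long-term storage.

```yaml
remote_write:
  - url: "https://thanos-receive.example.com/api/v1/receive"
    write_relabel_configs:
      # Do not forward Go runtime metrics to long-term storage
      # (illustrative choice; they remain queryable locally)
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop
```

Series dropped here are still scraped and stored locally, so this only reduces remote write volume, not local cardinality.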

Footnotes

Optimizing Prometheus Remote Write Performance - Last9 guide on queue tuning, cardinality management, and relabeling
Step 4
Use the /federate endpoint to pull pre-aggregated metrics from leaf Prometheus instances into a global view. This is pull-based and selective — only federate recording rule outputs, not raw high-cardinality metrics.

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        # Only pull recording rule outputs (names of the form job:...).
        # A catch-all matcher such as '{job=~".+"}' would federate every raw series.
        - '{__name__=~"job:.+"}'
    static_configs:
      - targets:
          - 'prometheus-leaf-1:9090'
          - 'prometheus-leaf-2:9090'

Federation is best suited for hierarchical aggregation, not as a replacement for remote write.

Footnotes

Prometheus Federation Explained: Architecture & Pitfalls - Groundcover deep dive on federation architecture and challenges
Step 5
Configure retention based on environment needs. Production typically uses 30–60 days locally, with long-term storage handling anything beyond that.

Development: 3–7 days (~10GB)

Staging: ~14 days (~25GB)

Production: 30–60 days (100GB+)

Compliance: 1+ years (external storage required)

Use the --storage.tsdb.retention.time and --storage.tsdb.retention.size flags; when both are set, whichever limit is reached first applies. Changing retention requires a restart, and reducing it cannot be undone: data older than the new period is purged once the new setting takes effect.
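For illustration, the production retention values above could be passed to Prometheus in a Docker Compose file along these lines. The service layout, volume names, and the unpinned image tag are assumptions for this sketch.

```yaml
services:
  prometheus:
    image: prom/prometheus   # pin a specific version in practice
    command:
      # Overriding the command replaces the image defaults, so the
      # config file and data path are restated here
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=30d
      - --storage.tsdb.retention.size=100GB
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
volumes:
  prometheus-data:
```

Setting both flags gives a time-based policy with a size cap as a safety net against disk exhaustion.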

Footnotes

How to Configure and Optimize Prometheus Data Retention - Retention settings per environment with storage guidance

Cardinality: The Silent Killer

Cardinality is the single most impactful factor on Prometheus performance in production. Each unique combination of label values creates a separate time series, requiring its own chunk in memory and on disk.

Consider a metric http_requests_total with labels instance (3 values), method (5 values), and endpoint (1,000 values):

$\text{Cardinality} = 3 \times 5 \times 1000 = 15{,}000 \text{ series}$

That's 15,000 separate series in the TSDB, each needing memory for in-memory head chunks, index entries in postings and symbols, and disk I/O for compaction and querying. At a scrape interval of 15s, this generates 3.6 million samples per hour across all series (15,000 series × 240 scrapes per hour). A real-world optimization at one company reduced their active series from 10M to 877K, a 92% memory reduction (from ~60GB to <5GB), simply by removing unnecessary labels and metrics.
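The hourly sample volume follows directly from the series count and the scrape interval, assuming every series is scraped on every interval:

$\text{Samples per hour} = 15{,}000 \times \frac{3600\ \text{s}}{15\ \text{s}} = 15{,}000 \times 240 = 3{,}600{,}000$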

Detecting High Cardinality

Use Prometheus's built-in introspection tools:

Metric / Tool	Purpose
`prometheus_tsdb_head_series`	Current number of active series in the head block
`prometheus_tsdb_head_chunks_created_total`	Rate of new chunk creation
`topk(10, count by (__name__)({...}))`	Top 10 metric names by series count
`topk(10, count by (job)({...}))`	Top 10 jobs by series count
`count(count by (<label>)(<metric>))`	Number of distinct values a label takes on a metric, which shows which labels contribute most to cardinality

Understanding and Optimizing Resource Consumption in Prometheus - Palark case study: 92% memory reduction through cardinality optimization
How to Manage High Cardinality Metrics in Prometheus and Kubernetes - Grafana Labs guide on cardinality management strategies

[Figure: Prometheus memory usage by cardinality level (approximate memory consumption at different active series counts)]

# Drop high-cardinality per-user series at scrape time
scrape_configs:
  - job_name: 'my-app'
    metrics_path: /metrics
    static_configs:
      - targets: ['app:8080']
    metric_relabel_configs:
      # Drop every series whose endpoint label is a per-user path.
      # Note: action: drop discards the whole series, not just the label.
      - source_labels: [endpoint]
        regex: '/api/v[0-9]+/users/\d+'
        action: drop
      # Drop unused metrics entirely
      - source_labels: [__name__]
        regex: 'go_gc_duration_seconds.*'
        action: drop

Cardinality Explosion Warning

Never add labels with unbounded values (user IDs, request IDs, IP addresses) to Prometheus metrics. A single metric with a user_id label and 100,000 users creates 100,000 time series. Instead, aggregate at the application level and expose pre-computed metrics, or use exemplars to attach trace IDs without increasing series count.

Long-Term Storage Solutions: Deep Comparison

Recording Rules: Pre-Computing for Performance

Recording rules are the most underutilized tool in the Prometheus toolbox at scale. They allow you to:

Reduce query latency: Dashboards query a pre-computed recording rule instead of running expensive PromQL at render time
Reduce cardinality: Aggregate away high-cardinality dimensions (e.g., drop endpoint while keeping job and method)
Compose complex expressions: Break multi-stage calculations into named, debuggable intermediates

Naming convention is critical for maintainability. The Prometheus community recommends the pattern:

$\text{level:metric:operations}$

For example: job:http_requests:rate5m, service:availability:ratio_rate5m, cluster:node_cpu:avg_5m

This convention encodes the aggregation level, the source metric, and the operation applied, making it immediately clear what a recording rule produces and how it can be consumed by dashboards and alerts.
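A minimal rule file following this naming convention might look like the sketch below. The group name and evaluation interval are assumptions; the metric and labels reuse the http_requests_total example from the cardinality section.

```yaml
groups:
  - name: http_recording_rules   # hypothetical group name
    interval: 1m
    rules:
      # Per-job request rate; instance, method and endpoint are aggregated away
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Same rate, keeping method as an extra aggregation level
      - record: job_method:http_requests:rate5m
        expr: sum by (job, method) (rate(http_requests_total[5m]))
```

Dashboards can then query `job:http_requests:rate5m` directly, which touches one series per job instead of every raw series behind it.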

Rule Organization Best Practices

Structure your rule files logically to support team ownership and independent deployment:

/etc/prometheus/rules/
├── recording_rules/
│   ├── infrastructure.yml
│   ├── application.yml
│   └── sli_slo.yml
└── alerting_rules/
    ├── critical_alerts.yml
    └── warning_alerts.yml

Always validate rules before deploying using promtool check rules <file>. In Kubernetes environments, use the PrometheusRule custom resource provided by the Prometheus Operator for declarative rule management.
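With the Prometheus Operator, a rule group is wrapped in a PrometheusRule object. The sketch below assumes a `monitoring` namespace and a `release: prometheus` label; the label in particular must match whatever `ruleSelector` your Prometheus resource defines, so treat both as placeholders.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules      # hypothetical name
  namespace: monitoring           # assumed namespace
  labels:
    release: prometheus           # assumed; must match the Prometheus ruleSelector
spec:
  groups:
    - name: http_recording_rules
      rules:
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
```

The content under `spec.groups` uses the same syntax as a plain rule file, so it can still be extracted and checked with promtool in CI.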

Prometheus Recording Rules Documentation - Official Prometheus docs on recording and alerting rule syntax

Prometheus Production Maturity Lifecycle

Stage 1: Single Instance

Deploy a single Prometheus server with default config. Focus on scrape coverage, basic alerting, and Grafana dashboards. Suitable for <100K active series.

Stage 2: Vertical Scaling & Optimization

Increase resources, tune scrape intervals, add recording rules, manage cardinality. Introduce metric_relabel_configs to drop unneeded metrics. Suitable for 100K–1M active series.

Stage 3: Functional Sharding

Split into multiple Prometheus instances by domain (infra, app, business). Add remote_write for long-term storage. Suitable for 1M–5M active series.

Stage 4: HA Replicas & Deduplication

Deploy replica Prometheus instances for high availability. Configure external_labels for replica identification. Set up deduplication at the LTS layer (Thanos/Mimir).

Stage 5: Distributed Observability Stack

Full Thanos/Mimir/Cortex deployment with query frontends, caching, multi-tenancy. Federation for global views. 5M–50M+ active series. Dedicated SRE effort for observability infrastructure.

Remote Write Queue Tuning

Monitor prometheus_remote_storage_queue_highest_sent_timestamp_seconds: if it falls behind time(), your queues are backing up. Increase max_shards or max_samples_per_send incrementally. Never set max_shards too high without monitoring network utilization, as this can overwhelm the receiver and cause cascading failures.

Optimizing Prometheus Remote Write Performance - Last9 guide on queue tuning, cardinality management, and relabeling

High Availability: Making Prometheus Resilient

Prometheus has no built-in clustering — HA is achieved by running redundant, independent instances. The key design principle is: each replica scrapes the same targets independently and evaluates the same rules independently. This means:

No shared state: Each replica maintains its own TSDB
No split-brain risk: There's no consensus protocol to worry about
Duplicate data is expected: Deduplication happens downstream, at query time in Thanos or at ingestion in Mimir/Cortex

External labels are essential for HA deduplication. Configure unique replica labels:

global:
  external_labels:
    cluster: 'us-east-1'
    replica: 'prom-1'

The LTS backend uses the replica label to identify and deduplicate data from both instances.

Prometheus High Availability (HA) - New Relic documentation on HA configuration with external labels

Monitoring Prometheus Itself

A production Prometheus that isn't monitored is a ticking time bomb. Key self-monitoring metrics:

Metric	What It Tells You	Action Threshold
`prometheus_tsdb_head_series`	Active series count	> 80% of historical peak
`process_resident_memory_bytes`	Memory consumption	> 80% of available RAM
`prometheus_remote_storage_queue_highest_sent_timestamp_seconds`	Remote write lag	> 60s behind `time()`
`prometheus_target_sync_length_seconds_sum`	SD processing overhead	Increasing trend
`prometheus_tsdb_compactions_total`	Compaction workload	Sustained high rate
`prometheus_rule_evaluation_duration_seconds`	Rule evaluation time	p99 > 1s
`prometheus_sd_refresh_duration_seconds`	Service discovery latency	Increasing trend

Set alerts on:

Memory pressure: process_resident_memory_bytes > 0.8 * available_memory
Remote write backlog: time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds > 300
Scrape failures: up == 0 for critical targets for > 5 minutes
Rule evaluation delays: prometheus_rule_evaluation_duration_seconds{quantile="0.99"} > 10
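The alert conditions listed above could be written as a rule file roughly like this. Alert names, severities, `for` durations, the `job` selectors, and the 64 GiB memory figure are all assumptions chosen for the sketch; `available_memory` is not a real metric, so the memory rule substitutes an explicit constant.

```yaml
groups:
  - name: prometheus_self_monitoring   # hypothetical group name
    rules:
      - alert: PrometheusHighMemory
        # Assumes the server has 64 GiB of RAM; replace with your own limit
        expr: process_resident_memory_bytes{job="prometheus"} > 0.8 * 64 * 1024 * 1024 * 1024
        for: 15m
        labels:
          severity: warning
      - alert: PrometheusRemoteWriteBehind
        expr: time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds > 300
        for: 5m
        labels:
          severity: warning
      - alert: CriticalTargetDown
        # job value is a placeholder for whatever identifies your critical targets
        expr: up{job="node-exporter"} == 0
        for: 5m
        labels:
          severity: critical
      - alert: PrometheusRuleEvaluationSlow
        expr: prometheus_rule_evaluation_duration_seconds{quantile="0.99"} > 10
        for: 10m
        labels:
          severity: warning
```

Ideally these rules are evaluated by a different Prometheus instance than the one being watched, since a server that is down or out of memory cannot alert on itself.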

Prometheus in Production — Key Concepts

Question: What is the primary mechanism for scaling Prometheus beyond a single instance?

Answer: Functional sharding, that is, splitting metrics by domain (infra, app, business) across separate Prometheus instances, combined with remote_write for long-term storage and federation for cross-cluster views.

[Figure: Long-term storage solutions comparison, evaluated across 6 dimensions for production deployment]

Knowledge Check


You notice that a single Prometheus instance is consuming 50GB of memory with 10 million active series. What is the most effective first step to reduce resource consumption?

Add more CPU cores to the Prometheus server

Identify and drop high-cardinality labels using metric_relabel_configs

Increase the scrape interval from 15s to 60s for all targets

Add a second Prometheus replica for load balancing
