Latency Monitoring: Why Average Response Time Lies

The problem with averages, why p99 matters more, and how to set up latency alerts that actually catch problems

Published: March 20, 2026 • Reading time: 10 minutes

"Our average response time is 200ms." Sounds great, right? But what if 5% of your users are experiencing 10-second delays? The average would still look fine.

Here's why average response time is a dangerous metric and what you should track instead.

The Problem with Averages

Example: The 200ms Average That Hides a Problem

Consider these 100 requests: 95 complete in 100ms, while 5 take 2,100ms.

Average: (95 × 100ms + 5 × 2,100ms) / 100 = 200ms — looks acceptable!

Reality: 5% of users waited over 2 seconds for a response.

Averages hide outliers. In latency monitoring, the outliers are often what matter most — they represent real user pain and often signal deeper problems.
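To make this concrete, here is a minimal Python sketch using a hypothetical distribution consistent with the numbers above: 95 requests at 100ms and 5 at 2,100ms.

```python
import statistics

# Hypothetical sample: 95 fast requests and 5 slow outliers (milliseconds).
latencies_ms = [100] * 95 + [2100] * 5

mean = statistics.mean(latencies_ms)   # 200.0 -- looks healthy
p99 = sorted(latencies_ms)[98]         # 99th of 100 sorted values -> 2100
print(f"mean={mean:.0f}ms p99={p99}ms")
```

The mean reports a comfortable 200ms while the p99 exposes the 2.1-second tail the average buried.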

Percentiles: The Metrics That Actually Matter

Instead of averages, track percentiles:

| Percentile | What It Means | Use Case |
| --- | --- | --- |
| p50 (median) | 50% of requests are faster than this | Typical user experience |
| p90 | 90% of requests are faster than this | Most users' experience |
| p95 | 95% of requests are faster than this | Edge cases starting to show |
| p99 | 99% of requests are faster than this | Worst-case for most users |
| p99.9 | 99.9% of requests are faster than this | True outliers |

Rule of thumb: If you only track one latency metric, make it p99. It catches the worst experiences your users are having while ignoring extreme outliers that might be noise.
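Percentiles are straightforward to compute from raw samples. A minimal nearest-rank sketch (the helper and sample data here are illustrative, not a production implementation):

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples_ms)
    # Index of the sample at the p-th percentile (nearest-rank method).
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical latencies: mostly fast, a few slow, one extreme outlier.
data = [100] * 90 + [300] * 9 + [4000]

p50 = percentile(data, 50)   # typical experience
p99 = percentile(data, 99)   # worst-case for most users
```

Note how p99 (300ms) reflects the slow tail while staying insensitive to the single 4-second outlier, which only p100 would surface.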

Why p99 Matters More Than p50

Consider two scenarios:

Scenario A: Stable System

Everyone has a good experience. The worst 1% are only 50% slower than the median.

Scenario B: Problem Brewing

The median looks fine, but 1% of users are experiencing 3-second delays. This could be a database issue, a slow dependency, or memory pressure — problems that will likely spread if not addressed.

Key insight: A rising p99 often precedes a full outage. Catch it early and you prevent the cascade.

Latency Targets by Application Type

What's "good" latency depends on what you're building:

| Application Type | p50 Target | p99 Target |
| --- | --- | --- |
| Static website / CDN content | <50ms | <100ms |
| API endpoint (simple) | <100ms | <300ms |
| API endpoint (complex) | <200ms | <500ms |
| Database query (simple) | <10ms | <50ms |
| Database query (complex) | <100ms | <500ms |
| Full page load (backend) | <300ms | <1s |

These are guidelines. The right targets depend on your users, your competition, and your SLAs.

Setting Up Latency Alerts

1. Use Percentile-Based Alerts

Alert on p99, not average:

```python
# Bad: alert on the average (outliers vanish into the mean)
if avg_response_time_ms > 500:
    alert()

# Good: alert on p99 (catches the slowest 1% of requests)
if p99_response_time_ms > 1000:
    alert()
```

2. Set Multiple Thresholds

| Severity | p99 Threshold | Response |
| --- | --- | --- |
| Warning | >500ms | Investigate during business hours |
| Critical | >1s | Investigate immediately |
| Emergency | >3s | Page on-call, consider rollback |
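The thresholds above can be sketched as a simple severity classifier. The threshold values come from the table; the function and constant names are illustrative:

```python
# Ordered from most to least severe so the worst matching tier wins.
THRESHOLDS_MS = [(3000, "emergency"), (1000, "critical"), (500, "warning")]

def classify(p99_ms):
    """Return the alert severity for a measured p99, or None if within target."""
    for limit, severity in THRESHOLDS_MS:
        if p99_ms > limit:
            return severity
    return None  # within target, no alert
```

Checking from most severe downward means a 3.5-second p99 pages on-call rather than merely logging a warning.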

3. Track Latency by Endpoint

Overall latency metrics hide endpoint-specific problems: a single slow route can disappear entirely into the aggregate. Track latency separately for each endpoint, and ideally for each HTTP method as well.
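A minimal sketch of per-endpoint tracking, assuming a simple in-memory store keyed by endpoint path (all names, including the `/api/search` endpoint, are illustrative):

```python
from collections import defaultdict

# Hypothetical in-memory store: endpoint path -> recorded latencies in ms.
latencies = defaultdict(list)

def record(endpoint, ms):
    latencies[endpoint].append(ms)

def endpoint_p99(endpoint):
    """Approximate p99 for one endpoint from its recorded samples."""
    ordered = sorted(latencies[endpoint])
    return ordered[int(0.99 * (len(ordered) - 1))]

# 98 fast requests and 2 slow ones on a single endpoint:
for _ in range(98):
    record("/api/search", 80)
record("/api/search", 3500)
record("/api/search", 4000)
```

Here the slow tail of `/api/search` is visible immediately, even though a service-wide p99 across many healthy endpoints might still look fine.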

4. Include Context in Alerts

When latency alerts fire, include enough context to start debugging immediately: the affected endpoint, the time window, the current and baseline p99, and any recent deployments.

Common Latency Patterns and What They Mean

Pattern 1: Gradual p99 Increase

Symptoms: p99 slowly rising over days or weeks

Likely causes: Database growth, memory leak, accumulating data

Action: Check query performance, index usage, memory trends
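One simple way to catch this pattern is to compare each day's p99 against a baseline from a week earlier. The 30% ratio and sample history below are illustrative, not recommended defaults:

```python
def creeping(daily_p99_ms, ratio=1.3):
    """True if the latest daily p99 is more than 30% above the value 7 days ago."""
    if len(daily_p99_ms) < 8:
        return False  # not enough history for a week-over-week comparison
    return daily_p99_ms[-1] > ratio * daily_p99_ms[-8]

# Hypothetical one-value-per-day p99 history showing a slow upward creep.
history = [300, 305, 310, 320, 330, 345, 360, 400]
```

Comparing against a week-old baseline rather than yesterday's value is what makes gradual creep visible: day-over-day deltas stay small even as the trend compounds.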

Pattern 2: Sudden p99 Spike

Symptoms: p99 jumps suddenly, often after deployment

Likely causes: Bad deployment, infrastructure change, dependency issue

Action: Check recent changes, consider rollback

Pattern 3: p99 and p50 Diverging

Symptoms: p50 stable, p99 climbing

Likely causes: Intermittent slow operations, tail latency from dependencies

Action: Investigate slow traces, check downstream services

Pattern 4: Periodic Latency Spikes

Symptoms: Regular latency increases at certain times

Likely causes: Scheduled jobs, backup processes, traffic patterns

Action: Reschedule heavy operations, add capacity during peak times

Latency Monitoring Best Practices

1. Measure at the Right Layer

Server-side timings miss network and client overhead, while client-side timings include them. Know which layer each number describes, and measure as close to the user as you can.

2. Use Histograms, Not Just Summaries

Summary metrics (p50, p99) are great for dashboards, but histograms let you explore the full distribution. You can always derive percentiles from histograms, but not vice versa.
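A minimal bucketed-histogram sketch showing how percentiles can be approximated from bucket counts. The bucket bounds here are illustrative; in practice you would use your monitoring library's histogram type:

```python
import bisect

# Cumulative bucket upper bounds in ms; the final bucket catches everything else.
BOUNDS_MS = [50, 100, 250, 500, 1000, float("inf")]
counts = [0] * len(BOUNDS_MS)

def observe(ms):
    # bisect_left finds the first bucket whose upper bound covers this sample.
    counts[bisect.bisect_left(BOUNDS_MS, ms)] += 1

def approx_percentile(p):
    """Upper bound of the bucket containing the p-th percentile."""
    target = p / 100 * sum(counts)
    running = 0
    for bound, count in zip(BOUNDS_MS, counts):
        running += count
        if running >= target:
            return bound
    return BOUNDS_MS[-1]

# Hypothetical traffic: mostly fast, with a 5% slow tail.
for ms in [40] * 50 + [90] * 45 + [800] * 5:
    observe(ms)
```

The trade-off is resolution: the histogram stores only one counter per bucket regardless of traffic volume, but percentiles derived from it are only as precise as the bucket bounds.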

3. Track Historical Trends

Latency often degrades gradually. Compare current percentiles to last week's and last month's values to spot slow creep.

4. Correlate with Other Metrics

High latency often correlates with other signals: CPU saturation, memory pressure, elevated error rates, or traffic spikes. Checking them together narrows down root causes faster.

5. Set SLOs for Latency

Define service level objectives (SLOs) such as "p99 below 500ms over a rolling 30-day window". This makes latency a measurable commitment, not just a metric.
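A compliance check for such an SLO can be sketched as follows, assuming one measured p99 value per evaluation window (the target and sample values are illustrative):

```python
def slo_compliance(window_p99s_ms, target_ms=500):
    """Fraction of windows whose measured p99 met the latency target."""
    met = sum(1 for p in window_p99s_ms if p < target_ms)
    return met / len(window_p99s_ms)

# Hypothetical per-window p99 measurements: one window breached the target.
windows = [420, 480, 610, 450, 430]
```

Expressing compliance as a fraction turns latency into something you can budget against, the same way error budgets work for availability.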

Common Latency Monitoring Mistakes

Mistake 1: Only Tracking Average

Averages hide the worst user experiences. Always track at least p50 and p99.

Mistake 2: Alerting Too Aggressively on p99

p99 naturally fluctuates. Set thresholds that catch real problems, not normal variance.
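One common way to absorb normal variance is to require several consecutive breaches before firing. The threshold and window size here are illustrative:

```python
def should_alert(recent_p99s_ms, threshold_ms=1000, consecutive=3):
    """Fire only if the last N p99 samples all breached the threshold."""
    if len(recent_p99s_ms) < consecutive:
        return False
    return all(p > threshold_ms for p in recent_p99s_ms[-consecutive:])
```

A single noisy p99 sample no longer pages anyone, while a sustained breach still fires within a few evaluation intervals.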

Mistake 3: Ignoring p99.9

While p99.9 can be noisy, tracking it helps identify true outliers and potential edge cases.

Mistake 4: Not Breaking Down by Endpoint

Overall latency metrics are useless for debugging. Know which endpoints are slow.

Mistake 5: Not Tracking Latency Over Time

A snapshot tells you nothing about trends. Keep historical data to spot gradual degradation.

Latency Monitoring Checklist

Monitor Your API Latency with OpsPulse

Track response times for your endpoints and get alerted when latency degrades. Catch slow responses before users complain.

Start Free Monitoring →

Summary

Latency monitoring done right:

  1. Use percentiles — p99 matters more than average
  2. Set multiple thresholds — Warning, critical, emergency
  3. Break down by endpoint — Know what's slow
  4. Track trends — Catch gradual degradation
  5. Correlate with resources — Find root causes faster

The goal isn't zero latency — it's predictable, acceptable latency for all users.
