Latency Monitoring: Why Average Response Time Lies

The problem with averages, why p99 matters more, and how to set up latency alerts that actually catch problems

Published: March 20, 2026 • Reading time: 10 minutes

"Our average response time is 200ms." Sounds great, right? But what if 5% of your users are experiencing 10-second delays? The average would still look fine.

Here's why average response time is a dangerous metric and what you should track instead.

The Problem with Averages

Example: The 200ms Average That Hides a Problem

Consider these 100 requests: 95 complete in 100ms, while 5 take 2,100ms.

Average: (95 × 100ms + 5 × 2,100ms) / 100 = 200ms — looks acceptable!

Reality: 5% of users waited over 2 seconds for a response.

Averages hide outliers. In latency monitoring, the outliers are often what matter most — they represent real user pain and often signal deeper problems.
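To make this concrete, here is a minimal Python sketch using a hypothetical distribution consistent with the numbers above: 95 requests at 100ms and 5 at 2,100ms.

```python
import statistics

# Hypothetical sample: 95 fast requests and 5 slow outliers (milliseconds).
latencies_ms = [100] * 95 + [2100] * 5

mean = statistics.mean(latencies_ms)   # 200.0 -- looks healthy
p99 = sorted(latencies_ms)[98]         # 99th of 100 sorted values -> 2100
print(f"mean={mean:.0f}ms p99={p99}ms")
```

The mean reports a comfortable 200ms while the p99 exposes the 2.1-second tail the average buried.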

Percentiles: The Metrics That Actually Matter

Instead of averages, track percentiles:

| Percentile | What It Means | Use Case |
| --- | --- | --- |
| p50 (median) | 50% of requests are faster than this | Typical user experience |
| p90 | 90% of requests are faster than this | Most users' experience |
| p95 | 95% of requests are faster than this | Edge cases starting to show |
| p99 | 99% of requests are faster than this | Worst-case for most users |
| p99.9 | 99.9% of requests are faster than this | True outliers |

Rule of thumb: If you only track one latency metric, make it p99. It catches the worst experiences your users are having while ignoring extreme outliers that might be noise.
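Percentiles are straightforward to compute from raw samples. A minimal nearest-rank sketch (the helper and sample data here are illustrative, not a production implementation):

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples_ms)
    # Index of the sample at the p-th percentile (nearest-rank method).
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical latencies: mostly fast, a few slow, one extreme outlier.
data = [100] * 90 + [300] * 9 + [4000]

p50 = percentile(data, 50)   # typical experience
p99 = percentile(data, 99)   # worst-case for most users
```

Note how p99 (300ms) reflects the slow tail while staying insensitive to the single 4-second outlier, which only p100 would surface.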

Why p99 Matters More Than p50

Consider two scenarios:

Scenario A: Stable System

Everyone has a good experience. The worst 1% are only 50% slower than the median.

Scenario B: Problem Brewing

The median looks fine, but 1% of users are experiencing 3-second delays. This could be a database issue, a slow dependency, or memory pressure — problems that will likely spread if not addressed.

Key insight: A rising p99 often precedes a full outage. Catch it early and you prevent the cascade.

Latency Targets by Application Type

What's "good" latency depends on what you're building:

| Application Type | p50 Target | p99 Target |
| --- | --- | --- |
| Static website / CDN content | <50ms | <100ms |
| API endpoint (simple) | <100ms | <300ms |
| API endpoint (complex) | <200ms | <500ms |
| Database query (simple) | <10ms | <50ms |
| Database query (complex) | <100ms | <500ms |
| Full page load (backend) | <300ms | <1s |

These are guidelines. The right targets depend on your users, your competition, and your SLAs.

Setting Up Latency Alerts

1. Use Percentile-Based Alerts

Alert on p99, not average:

```python
# Bad: alert on the average (outliers vanish into the mean)
if avg_response_time_ms > 500:
    alert()

# Good: alert on p99 (catches the slowest 1% of requests)
if p99_response_time_ms > 1000:
    alert()
```

2. Set Multiple Thresholds

| Severity | p99 Threshold | Response |
| --- | --- | --- |
| Warning | >500ms | Investigate during business hours |
| Critical | >1s | Investigate immediately |
| Emergency | >3s | Page on-call, consider rollback |
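The thresholds above can be sketched as a simple severity classifier. The threshold values come from the table; the function and constant names are illustrative:

```python
# Ordered from most to least severe so the worst matching tier wins.
THRESHOLDS_MS = [(3000, "emergency"), (1000, "critical"), (500, "warning")]

def classify(p99_ms):
    """Return the alert severity for a measured p99, or None if within target."""
    for limit, severity in THRESHOLDS_MS:
        if p99_ms > limit:
            return severity
    return None  # within target, no alert
```

Checking from most severe downward means a 3.5-second p99 pages on-call rather than merely logging a warning.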

3. Track Latency by Endpoint

Overall latency metrics hide endpoint-specific problems: a single slow route can disappear entirely into the aggregate. Track latency separately for each endpoint, and ideally for each HTTP method as well.
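A minimal sketch of per-endpoint tracking, assuming a simple in-memory store keyed by endpoint path (all names, including the `/api/search` endpoint, are illustrative):

```python
from collections import defaultdict

# Hypothetical in-memory store: endpoint path -> recorded latencies in ms.
latencies = defaultdict(list)

def record(endpoint, ms):
    latencies[endpoint].append(ms)

def endpoint_p99(endpoint):
    """Approximate p99 for one endpoint from its recorded samples."""
    ordered = sorted(latencies[endpoint])
    return ordered[int(0.99 * (len(ordered) - 1))]

# 98 fast requests and 2 slow ones on a single endpoint:
for _ in range(98):
    record("/api/search", 80)
record("/api/search", 3500)
record("/api/search", 4000)
```

Here the slow tail of `/api/search` is visible immediately, even though a service-wide p99 across many healthy endpoints might still look fine.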

4. Include Context in Alerts

When latency alerts fire, include enough context to start debugging immediately: the affected endpoint, the time window, the current and baseline p99, and any recent deployments.

Common Latency Patterns and What They Mean

Pattern 1: Gradual p99 Increase

Symptoms: p99 slowly rising over days or weeks

Likely causes: Database growth, memory leak, accumulating data

Action: Check query performance, index usage, memory trends
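One simple way to catch this pattern is to compare each day's p99 against a baseline from a week earlier. The 30% ratio and sample history below are illustrative, not recommended defaults:

```python
def creeping(daily_p99_ms, ratio=1.3):
    """True if the latest daily p99 is more than 30% above the value 7 days ago."""
    if len(daily_p99_ms) < 8:
        return False  # not enough history for a week-over-week comparison
    return daily_p99_ms[-1] > ratio * daily_p99_ms[-8]

# Hypothetical one-value-per-day p99 history showing a slow upward creep.
history = [300, 305, 310, 320, 330, 345, 360, 400]
```

Comparing against a week-old baseline rather than yesterday's value is what makes gradual creep visible: day-over-day deltas stay small even as the trend compounds.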

Pattern 2: Sudden p99 Spike

Symptoms: p99 jumps suddenly, often after deployment

Likely causes: Bad deployment, infrastructure change, dependency issue

Action: Check recent changes, consider rollback

Pattern 3: p99 and p50 Diverging

Symptoms: p50 stable, p99 climbing

Likely causes: Intermittent slow operations, tail latency from dependencies

Action: Investigate slow traces, check downstream services

Pattern 4: Periodic Latency Spikes

Symptoms: Regular latency increases at certain times

Likely causes: Scheduled jobs, backup processes, traffic patterns

Action: Reschedule heavy operations, add capacity during peak times

Latency Monitoring Best Practices

1. Measure at the Right Layer

Server-side timings miss network and client overhead, while client-side timings include them. Know which layer each number describes, and measure as close to the user as you can.

2. Use Histograms, Not Just Summaries

Summary metrics (p50, p99) are great for dashboards, but histograms let you explore the full distribution. You can always derive percentiles from histograms, but not vice versa.
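A minimal bucketed-histogram sketch showing how percentiles can be approximated from bucket counts. The bucket bounds here are illustrative; in practice you would use your monitoring library's histogram type:

```python
import bisect

# Cumulative bucket upper bounds in ms; the final bucket catches everything else.
BOUNDS_MS = [50, 100, 250, 500, 1000, float("inf")]
counts = [0] * len(BOUNDS_MS)

def observe(ms):
    # bisect_left finds the first bucket whose upper bound covers this sample.
    counts[bisect.bisect_left(BOUNDS_MS, ms)] += 1

def approx_percentile(p):
    """Upper bound of the bucket containing the p-th percentile."""
    target = p / 100 * sum(counts)
    running = 0
    for bound, count in zip(BOUNDS_MS, counts):
        running += count
        if running >= target:
            return bound
    return BOUNDS_MS[-1]

# Hypothetical traffic: mostly fast, with a 5% slow tail.
for ms in [40] * 50 + [90] * 45 + [800] * 5:
    observe(ms)
```

The trade-off is resolution: the histogram stores only one counter per bucket regardless of traffic volume, but percentiles derived from it are only as precise as the bucket bounds.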

3. Track Historical Trends

Latency often degrades gradually. Compare current percentiles to last week's and last month's values to spot slow creep.

4. Correlate with Other Metrics

High latency often correlates with other signals: CPU saturation, memory pressure, elevated error rates, or traffic spikes. Checking them together narrows down root causes faster.

5. Set SLOs for Latency

Define service level objectives (SLOs) such as "p99 below 500ms over a rolling 30-day window". This makes latency a measurable commitment, not just a metric.
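A compliance check for such an SLO can be sketched as follows, assuming one measured p99 value per evaluation window (the target and sample values are illustrative):

```python
def slo_compliance(window_p99s_ms, target_ms=500):
    """Fraction of windows whose measured p99 met the latency target."""
    met = sum(1 for p in window_p99s_ms if p < target_ms)
    return met / len(window_p99s_ms)

# Hypothetical per-window p99 measurements: one window breached the target.
windows = [420, 480, 610, 450, 430]
```

Expressing compliance as a fraction turns latency into something you can budget against, the same way error budgets work for availability.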

Common Latency Monitoring Mistakes

Mistake 1: Only Tracking Average

Averages hide the worst user experiences. Always track at least p50 and p99.

Mistake 2: Alerting Too Aggressively on p99

p99 naturally fluctuates. Set thresholds that catch real problems, not normal variance.
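One common way to absorb normal variance is to require several consecutive breaches before firing. The threshold and window size here are illustrative:

```python
def should_alert(recent_p99s_ms, threshold_ms=1000, consecutive=3):
    """Fire only if the last N p99 samples all breached the threshold."""
    if len(recent_p99s_ms) < consecutive:
        return False
    return all(p > threshold_ms for p in recent_p99s_ms[-consecutive:])
```

A single noisy p99 sample no longer pages anyone, while a sustained breach still fires within a few evaluation intervals.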

Mistake 3: Ignoring p99.9

While p99.9 can be noisy, tracking it helps identify true outliers and potential edge cases.

Mistake 4: Not Breaking Down by Endpoint

Overall latency metrics are useless for debugging. Know which endpoints are slow.

Mistake 5: Not Tracking Latency Over Time

A snapshot tells you nothing about trends. Keep historical data to spot gradual degradation.

Latency Monitoring Checklist

Monitor Your API Latency with OpsPulse

Track response times for your endpoints and get alerted when latency degrades. Catch slow responses before users complain.

Start Free Monitoring →

Summary

Latency monitoring done right:

  1. Use percentiles — p99 matters more than average
  2. Set multiple thresholds — Warning, critical, emergency
  3. Break down by endpoint — Know what's slow
  4. Track trends — Catch gradual degradation
  5. Correlate with resources — Find root causes faster

The goal isn't zero latency — it's predictable, acceptable latency for all users.
