# How to Monitor Your API Without False Alarms
You set up monitoring to catch problems. But now you're getting alerts at 3am for issues that resolve themselves in 30 seconds. Here's how to fix it.
## The Problem with "Ping-Based" Monitoring
Most uptime monitoring works like this:
- Send a request to your API every 5 minutes
- If the response isn't 200 OK, send an alert
- Repeat forever
Simple, right? The problem is that transient failures are normal.
- Your hosting provider has a 2-second network blip
- A DNS lookup times out once
- Your app is restarting after a deployment
- Cloudflare has a momentary edge issue
None of these require waking someone up. But with basic ping monitoring, every single one triggers an alert.
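The noise problem is easy to see in a short simulation. Here's a minimal sketch (the check sequence is invented for illustration) of what the naive "alert on every failure" rule does to a single transient blip:

```python
# Simulated health checks over an hour, one every 5 minutes.
# True = 200 OK. The single False is a 2-second network blip that
# had already self-resolved by the time the next check ran.
checks = [True, True, False, True, True, True, True, True, True, True, True, True]

# Naive rule: every failed check fires an alert.
alerts = sum(1 for ok in checks if not ok)

print(alerts)  # → 1 -- someone got paged for nothing
```

One blip, one page. Scale that up to a month of normal transient failures and you get the 3am alerts this post is about.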
## The Three Rules of No-Noise Monitoring
### Rule 1: Consecutive Failures Before Alert
Never alert on a single failed check. Require 2-3 consecutive failures before sending a notification.
Here's why this matters:
```
Check #1: FAILED (timeout)
Check #2: SUCCESS (200 OK)   ← issue resolved in under 5 minutes
Check #3: SUCCESS (200 OK)
```
With single-failure alerting, you'd have been paged after check #1. With a 2-consecutive-failure rule, nothing fires, because the issue self-resolved before the second check.
Impact: This single change eliminates 90%+ of false positives.
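The rule itself is a few lines of state: track a failure streak, and fire exactly once when it hits the threshold. A minimal sketch (function name and inputs are illustrative):

```python
def alerts_fired(checks: list[bool], threshold: int = 2) -> int:
    """Count alerts under an N-consecutive-failures rule.

    checks: sequence of results, True = check passed.
    """
    alerts = 0
    streak = 0
    for ok in checks:
        if ok:
            streak = 0  # any success resets the streak
        else:
            streak += 1
            if streak == threshold:  # fire exactly once per streak
                alerts += 1
    return alerts

# The sequence from the example above: one timeout, then two successes.
print(alerts_fired([False, True, True]))   # → 0 -- blip, no alert
print(alerts_fired([False, False, True])) # → 1 -- looks like a real outage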
### Rule 2: Alert Deduplication
If your API is down, you don't need 12 alerts telling you it's still down.
Without deduplication, a 30-minute outage generates:
- 6 checks (every 5 minutes)
- 6 alerts (one per failure)
- 6 Slack notifications
- 6 SMS messages
With deduplication, you get one alert: "API is down" at 3:00am, and one recovery notification: "API is back up" at 3:30am.
Impact: Reduces alert volume by 80%+ without losing any signal.
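Deduplication is a two-state machine: you only notify on the *transitions* (up→down and down→up), never on repeated observations of the same state. A sketch under that assumption:

```python
def dedup_events(check_results: list[bool]) -> list[str]:
    """Collapse a stream of up/down checks into incident open/close events."""
    events = []
    down = False  # current incident state
    for ok in check_results:
        if not ok and not down:
            down = True
            events.append("ALERT: API is down")      # fires once per incident
        elif ok and down:
            down = False
            events.append("RECOVERY: API is back up")
        # same state as before -> no notification
    return events

# A 30-minute outage observed by 6 consecutive failed checks:
print(dedup_events([True, False, False, False, False, False, False, True]))
# → ['ALERT: API is down', 'RECOVERY: API is back up']
```

Six failed checks, two notifications: the 3:00am alert and the 3:30am recovery, exactly as described above.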
### Rule 3: Severity-Based Routing
Not all failures are created equal. Route alerts based on severity:
| Severity | Example | Notification |
|---|---|---|
| Critical | Payment API down | SMS + Push + Slack (immediate) |
| Warning | Response time > 2s | Email digest (morning) |
| Info | Single check failed | Dashboard only |
Impact: Only wake people up for problems that actually require immediate action.
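The routing table above maps directly onto a lookup. A minimal sketch (the channel names are placeholders, not a real integration):

```python
# Hypothetical channel names; a real setup would wire these to actual
# SMS/push/Slack/email integrations.
ROUTES = {
    "critical": ["sms", "push", "slack"],  # wake someone up immediately
    "warning":  ["email_digest"],          # batch for the morning
    "info":     ["dashboard"],             # record only, no notification
}

def route(severity: str) -> list[str]:
    """Return the notification channels for a given severity."""
    return ROUTES.get(severity, ["dashboard"])  # unknown severities stay quiet

print(route("critical"))  # → ['sms', 'push', 'slack']
print(route("info"))      # → ['dashboard']
```

The key design choice is the default: anything unclassified lands on the dashboard rather than in someone's pocket at 3am.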
## Implementation: What to Configure
When setting up monitoring for your API, configure these settings:
- Check interval: 5 minutes (don't go shorter unless you need it)
- Failure threshold: 2-3 consecutive failures
- Timeout: 10-30 seconds (give slow responses time to complete)
- Expected status: 200 OK (or 201, 204, depending on the endpoint)
- Expected response: Optionally validate JSON structure or specific fields
## The OpsPulse Approach
We built OpsPulse around these principles because we experienced alert fatigue ourselves. Here's how it works by default:
- 2 consecutive failures before alerting
- Automatic deduplication (same incident = one alert)
- Telegram alerts (no email spam, instant delivery)
- 5-minute checks with 30-second timeout
The result: indie developers go from 47 alerts per week to 3.
## Check Your Current Setup
Ask yourself these questions about your current monitoring:
- Do you get alerts for issues that resolve in < 5 minutes?
- Do you get multiple alerts for the same incident?
- Do you ignore alerts because "it's probably nothing"?
- Have you ever missed a real outage because you were desensitized to alerts?
If you answered "yes" to any of these, your monitoring configuration is working against you.
