Out of memory (OOM) errors are one of the most common causes of application crashes. Unlike CPU spikes that slow things down, memory exhaustion kills your process instantly — often with little warning.
Here's how to monitor memory properly and catch issues before they become incidents.
Why Memory Monitoring Matters
Memory problems are sneaky. They often build slowly, then hit suddenly:
- Memory leaks grow over days or weeks until you hit limits
- Sudden traffic spikes can exhaust memory in minutes
- OOM kills happen instantly with no graceful degradation
- Swap thrashing destroys performance before the actual crash
Memory Metrics You Should Track
1. System-Level Metrics
| Metric | What It Means | Alert Threshold |
|---|---|---|
| Total memory used % | System-wide RAM utilization | >85% (critical: >95%) |
| Available memory | RAM free for new allocations | <100MB available |
| Swap usage | Memory spilled to disk | Any sustained swap usage |
| Swap in/out rate | How actively pages are being swapped | Sustained activity |
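On Linux, every number in this table can be derived from `/proc/meminfo`; a minimal sketch (using `MemAvailable`, the kernel's estimate of RAM usable without swapping):

```bash
# System memory used %, straight from /proc/meminfo
awk '/MemTotal/     {t=$2}
     /MemAvailable/ {a=$2}
     END {printf "mem_used_pct=%.1f\n", (t-a)/t*100}' /proc/meminfo

# Swap usage and activity (procps tools):
#   free -m     # "Swap:" row shows used swap in MB
#   vmstat 5    # sustained non-zero si/so columns = swap thrashing
```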
2. Process-Level Metrics
| Metric | What It Means | Alert Threshold |
|---|---|---|
| RSS (Resident Set Size) | Actual physical memory used | Approaching container limit |
| Heap used / heap max | Language runtime memory | >80% of heap |
| Memory growth rate | MB/hour increase | Sustained growth over hours |
| GC pause time | Time spent in garbage collection | >100ms pauses |
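Growth rate is the one metric in this table you usually have to derive yourself. A minimal sketch (function name and sample values are illustrative):

```python
def growth_rate_mb_per_hour(samples):
    """Estimate memory growth from (timestamp_sec, rss_mb) samples
    using the first and last points."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    hours = (t1 - t0) / 3600
    return (m1 - m0) / hours if hours > 0 else 0.0

# e.g. RSS went from 220MB to 250MB over two hours
samples = [(0, 220.0), (3600, 236.0), (7200, 250.0)]
print(f"{growth_rate_mb_per_hour(samples):.1f} MB/hour")  # → 15.0 MB/hour
```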
3. Container/Environment Metrics
| Metric | What It Means | Alert Threshold |
|---|---|---|
| Container memory limit | Hard limit for your process | Know this number |
| Memory usage % of limit | How close to OOM | >80% |
| OOM kill count | Times killed by OOM | Any OOM kills |
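On a cgroup v2 host, all three of these numbers are exposed as plain files; a hedged sketch (paths assume cgroup v2, which most current container runtimes use):

```bash
CG=/sys/fs/cgroup   # inside a container, this is the process's own cgroup
if [ -f "$CG/memory.max" ]; then
    echo "limit:   $(cat "$CG/memory.max")"      # "max" means no limit set
    echo "current: $(cat "$CG/memory.current")"
    # oom_kill counts how many times the kernel OOM-killed a process here
    grep oom_kill "$CG/memory.events"
else
    echo "cgroup v2 memory files not found"
fi
```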
Detecting Memory Leaks
Memory leaks are the silent killers of long-running processes. Here's how to spot them:
Signs of a Memory Leak
- Steady upward trend in memory usage that never decreases
- Memory doesn't return to baseline after traffic spikes
- Increasing GC frequency or duration over time
- App gets slower over time (GC overhead)
How to Monitor for Leaks
```bash
# Track process memory over time (Linux)
watch -n 60 'ps -o pid,rss,command -p YOUR_PID'

# Log memory every minute for trend analysis
while true; do
  echo "$(date): $(ps -o rss= -p YOUR_PID) KB"
  sleep 60
done >> memory.log
```
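Once memory.log has a few hours of samples, a least-squares slope separates a real leak from noise. A sketch (assumes evenly spaced samples like the one-minute loop above, RSS in KB):

```python
def leak_slope_kb_per_min(rss_samples_kb):
    """Least-squares slope of RSS over evenly spaced 1-minute samples."""
    n = len(rss_samples_kb)
    mean_x = (n - 1) / 2
    mean_y = sum(rss_samples_kb) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(rss_samples_kb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# RSS climbing ~100 KB per minute with jitter: likely a leak
samples = [200_000 + 100 * i + (i % 3) * 50 for i in range(60)]
print(f"{leak_slope_kb_per_min(samples):.0f} KB/min")
```

A sustained positive slope that survives a few GC cycles is the leak signature; a flat slope with spikes is just traffic.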
Language-Specific Memory Monitoring
Node.js
```javascript
// Get heap statistics
const used = process.memoryUsage();
console.log({
  rss: `${Math.round(used.rss / 1024 / 1024)}MB`,
  heapTotal: `${Math.round(used.heapTotal / 1024 / 1024)}MB`,
  heapUsed: `${Math.round(used.heapUsed / 1024 / 1024)}MB`,
  external: `${Math.round(used.external / 1024 / 1024)}MB`
});

// Enable heap snapshots for debugging
// node --expose-gc --heapsnapshot-signal=SIGUSR2 app.js
```
Python
```python
import os

import psutil

process = psutil.Process(os.getpid())
memory_info = process.memory_info()
print(f"RSS: {memory_info.rss / 1024 / 1024:.1f}MB")
print(f"VMS: {memory_info.vms / 1024 / 1024:.1f}MB")
```
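psutil reports totals; to see *which code* is allocating, the standard library's tracemalloc can rank allocation sites. A sketch (its overhead makes it better suited to debugging sessions than steady-state production):

```python
import tracemalloc

tracemalloc.start()

# ... run the suspect workload ...
leaky = [b"x" * 1024 for _ in range(1000)]  # stand-in allocation

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # top allocation sites by total size
```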
Go
```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc = %v MiB\n", m.Alloc/1024/1024)
	fmt.Printf("TotalAlloc = %v MiB\n", m.TotalAlloc/1024/1024)
	fmt.Printf("Sys = %v MiB\n", m.Sys/1024/1024)
	fmt.Printf("NumGC = %v\n", m.NumGC)
}
```
Java (JVM)
```bash
# JVM native memory info via jcmd (requires the JVM to be started
# with -XX:NativeMemoryTracking=summary)
jcmd YOUR_PID VM.native_memory summary

# Or via JMX/Micrometer for apps
# Track: jvm.memory.used, jvm.memory.max, jvm.gc.pause
```
Setting Up Memory Alerts
Alert Hierarchy
Don't alert on every metric. Set up a hierarchy:
- Warning (investigate): Memory >80% for 5+ minutes
- Critical (act now): Memory >90% for 2+ minutes
- Emergency: OOM kill detected or swap thrashing
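The hierarchy above can be encoded as a small classifier fed by your metrics pipeline; a sketch (the function name is illustrative, the thresholds mirror the levels above):

```python
def memory_severity(used_pct, minutes_sustained,
                    oom_killed=False, swapping=False):
    """Map memory readings to an alert level per the hierarchy above."""
    if oom_killed or swapping:
        return "emergency"
    if used_pct > 90 and minutes_sustained >= 2:
        return "critical"
    if used_pct > 80 and minutes_sustained >= 5:
        return "warning"
    return "ok"

print(memory_severity(87, 10))                  # → warning
print(memory_severity(93, 3))                   # → critical
print(memory_severity(60, 0, oom_killed=True))  # → emergency
```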
What to Include in Alerts
- Current memory usage (absolute and percentage)
- Rate of change (MB/minute)
- Time until projected OOM
- Top memory consumers (if available)
- Recent traffic or deployment events
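The "time until projected OOM" figure is a straightforward linear extrapolation; a sketch (assumes a roughly constant growth rate, which leaks often approximate):

```python
def minutes_until_oom(current_mb, limit_mb, growth_mb_per_min):
    """Linear projection of time until the memory limit is hit."""
    if growth_mb_per_min <= 0:
        return float("inf")  # flat or shrinking: no projected OOM
    return (limit_mb - current_mb) / growth_mb_per_min

# 410MB used of a 512MB limit, growing 2MB/min → ~51 minutes left
print(f"{minutes_until_oom(410, 512, 2.0):.0f} minutes")  # → 51 minutes
```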
Memory Monitoring Best Practices
1. Know Your Baseline
Every application has a "normal" memory footprint. Establish your baseline during normal operation and alert on deviations.
2. Monitor Trends, Not Just Snapshots
A single high-memory reading isn't necessarily a problem. Look at trends over time to distinguish between temporary spikes and real issues.
3. Set Memory Limits
Always set memory limits for your processes (container limits, ulimit, etc.). This ensures predictable behavior when limits are approached.
4. Test Memory Behavior Under Load
Load test your application and observe memory behavior. This reveals how your app handles traffic spikes and validates your alert thresholds.
5. Plan for Growth
Memory usage grows as your data and traffic grow. Review memory trends monthly and plan capacity upgrades before you hit limits.
Common Memory Monitoring Mistakes
Mistake 1: Only Alerting on Total System Memory
Your app might be fine while the system is swapping due to another process. Monitor your specific process, not just the system.
Mistake 2: Ignoring Swap Usage
Swap is the canary in the coal mine. If you're swapping, you're already in trouble — even if you haven't OOM'd yet.
Mistake 3: Not Tracking Memory Over Time
Without historical data, you can't detect leaks or plan capacity. Always store memory metrics for trend analysis.
Mistake 4: Alerting Too Late
Alerting at 95% gives you no time to react. Alert at 80-85% to give yourself time to investigate and fix.
Mistake 5: Not Testing OOM Scenarios
What happens when your app hits the memory limit? Test this in staging so you know what to expect in production.
Memory Monitoring Checklist
- ☐ Track process RSS and heap usage
- ☐ Monitor system-wide memory and swap
- ☐ Set alerts at 80% and 90% thresholds
- ☐ Log memory usage for trend analysis
- ☐ Know your container/process memory limits
- ☐ Monitor GC metrics (if applicable)
- ☐ Set up OOM kill detection
- ☐ Review memory trends weekly
- ☐ Load test memory behavior before launch
- ☐ Have a runbook for memory incidents
Monitor Your Application's Health with OpsPulse
Track uptime, response times, and get alerted when issues arise. Combine with your memory monitoring for complete coverage.
Summary
Memory monitoring essentials:
- Track process-level metrics — RSS, heap usage, growth rate
- Watch for leaks — Steady growth that doesn't stabilize
- Alert early — 80% gives you time to react
- Monitor swap — It's the early warning system
- Know your limits — And how close you are to them
With proper memory monitoring, you'll catch issues before they become outages.