Microservices monitoring advice usually assumes enterprise scale: service meshes, distributed tracing, sophisticated observability platforms. But most teams running microservices aren't enterprises: they're a few engineers responsible for a handful of services.
Here's how to monitor microservices when you don't have a dedicated ops team.
## The Problem with Microservices Monitoring Advice
Typical advice for microservices monitoring includes:
- Implement distributed tracing (Jaeger, Zipkin)
- Deploy a service mesh (Istio, Linkerd)
- Use centralized logging (ELK, Loki)
- Set up comprehensive metrics (Prometheus + Grafana)
- Implement correlation IDs

All of this is valuable at enterprise scale, but each piece is another system to deploy, learn, and keep running. For a small team, that monitoring stack can easily demand more attention than the services it is supposed to watch.
## What You Actually Need
For small teams with a handful of services, focus on:
- Health checks — Is each service running?
- Request tracking — How many requests? How fast? Any errors?
- Error logging — When things fail, what went wrong?
- External monitoring — Is the whole system reachable?
## Level 1: Basic Health Checks
Every service should have a health endpoint:
```
GET /health

{
  "status": "healthy",
  "service": "user-api",
  "version": "1.2.3",
  "uptime_seconds": 86400,
  "dependencies": {
    "database": "ok",
    "cache": "ok"
  }
}
```
### What to Check
- Process is alive — Can respond to HTTP requests
- Critical dependencies — Database, cache, message queue
- Basic functionality — A cheap sanity check that the service can do real work, not just return 200
### Keep It Fast
Health checks should return in <1 second. Don't run expensive queries or deep checks.
## Level 2: Request Metrics
Track basic request metrics for each service:
| Metric | Why It Matters |
|---|---|
| Request count | Is traffic normal? |
| Error count | Is something broken? |
| Response time (p95, p99) | Is performance degrading? |
| Active requests | Is the service overloaded? |
### Simple Implementation
```javascript
// Middleware to track request count, status class, and duration.
// Assumes a StatsD/DogStatsD-style `metrics` client initialized elsewhere.
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    const status = res.statusCode;
    // Bucket statuses into 2xx / 4xx / 5xx to keep cardinality low
    metrics.increment('requests.total', {
      service: 'user-api',
      status: Math.floor(status / 100) + 'xx',
    });
    metrics.histogram('request.duration_ms', duration, { service: 'user-api' });
  });
  next();
});
```
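The p95/p99 latencies in the table above are normally computed by your metrics backend, but the idea is simple: a nearest-rank percentile over a window of recorded durations.

```javascript
// Nearest-rank percentile over a window of recorded durations (in ms).
// Metrics backends do this for you; this just shows the idea.
function percentile(samples, p) {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const durations = [12, 15, 11, 230, 14, 13, 16, 12, 480, 15];
console.log(percentile(durations, 95)); // → 480: the slow tail dominates
```

This is also why the table asks for p95/p99 rather than averages: the mean of those samples is about 82 ms, even though most requests finish in around 14 ms.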
## Level 3: Error Tracking
When errors happen, you need to know what failed and why.
### What to Log
- Error message — What went wrong?
- Stack trace — Where in the code?
- Request context — What request caused it?
- Service name — Which service?
- Timestamp — When?
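Even if you adopt a hosted error tracker, it helps to emit errors as structured JSON lines covering those fields. A sketch, with illustrative field names:

```javascript
// Build one structured entry per error so a log aggregator can
// index the fields. Field names are illustrative; match them to
// whatever your logging pipeline expects.
function errorLogEntry(err, req) {
  return {
    level: 'error',
    service: 'user-api',
    timestamp: new Date().toISOString(),
    message: err.message,
    stack: err.stack,
    method: req.method,
    path: req.url,
    request_id: req.requestId, // propagated correlation ID, if any
  };
}

// Usage inside an Express-style error handler:
// app.use((err, req, res, next) => {
//   console.error(JSON.stringify(errorLogEntry(err, req)));
//   res.status(500).json({ error: 'internal error' });
// });
```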
### Tools for Small Teams
- Sentry — Error tracking with context
- Bugsnag — Error monitoring and alerting
- Honeybadger — Simple error tracking
## Level 4: Request Correlation
When a request spans multiple services, you need to connect the dots:
```javascript
// Generate a request ID, or propagate one passed in by an upstream caller
const crypto = require('crypto');

app.use((req, res, next) => {
  const requestId = req.headers['x-request-id'] || crypto.randomUUID();
  req.requestId = requestId;
  res.setHeader('x-request-id', requestId);
  next();
});

// Pass it along to downstream services
const response = await fetch('http://orders-api/orders', {
  headers: { 'x-request-id': req.requestId },
});
```
Now you can search logs for a specific request ID and see its journey across services.
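For that search to work, every log line the service emits needs the request ID stamped on it. A minimal helper (field names are illustrative):

```javascript
// Build one JSON log line stamped with the request ID, so logs from
// different services can be joined on it with grep or a log aggregator.
function logLine(requestId, service, message, extra = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    service,
    request_id: requestId,
    message,
    ...extra,
  });
}

// In a handler:
// console.log(logLine(req.requestId, 'user-api', 'order created', { orderId: 17 }));
```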
## Level 5: External Monitoring
Internal metrics tell you if your services are running. External monitoring tells you if users can reach them.
### What to Monitor Externally
- API gateway / load balancer — Entry point to your services
- Key endpoints — Login, critical API paths
- Health endpoints — Each service's /health
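A hosted uptime service covers this with no code, but the core of an external probe is small. A sketch using Node's global `fetch`; the endpoint URLs are placeholders:

```javascript
// Bare-bones external probe: run it from OUTSIDE your own
// infrastructure (e.g. a cheap VM in another region).
const endpoints = [
  'https://api.example.com/health',
  'https://api.example.com/login',
];

// Fetch one URL with a hard timeout; never throw, always report.
async function probe(url, timeoutMs = 5000) {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { url, ok: res.ok, status: res.status, ms: Date.now() - started };
  } catch (err) {
    return { url, ok: false, error: err.message, ms: Date.now() - started };
  }
}

async function checkAll() {
  const results = await Promise.all(endpoints.map((u) => probe(u)));
  for (const r of results.filter((x) => !x.ok)) {
    console.error(`DOWN: ${r.url} (${r.status ?? r.error})`);
    // wire up email/Slack/pager alerting here
  }
  return results;
}

// Run once a minute, e.g.: setInterval(checkAll, 60_000);
```

The key property is where it runs, not what it does: the same check from inside your network would miss every failure mode listed below.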
### Why External Monitoring Matters
- DNS issues (your servers are fine, but nobody can find them)
- Network problems (your region is isolated)
- Load balancer failures (services healthy but unreachable)
- SSL certificate expiration
## Monitoring Architecture for Small Teams
### What to Build
```
     External Monitoring
             ↓
     [ Load Balancer ]
      /      |      \
 [API-1] [API-2] [API-3]
      \      |      /
   [ Shared Database ]
             ↑
    [ Error Tracking ]
```
### Minimum Viable Stack
- Health checks: Built into each service
- Metrics: Statsd → hosted service OR simple Prometheus
- Error tracking: Sentry/Bugsnag
- External monitoring: OpsPulse (uptime checks)
- Logging: Structured logs to files → cloud logging service
## Common Mistakes
### Mistake 1: Over-Instrumenting
Problem: Tracking every possible metric, creating noise.
Fix: Start with request count, error count, response time. Add more when you have a specific need.
### Mistake 2: Ignoring External Monitoring
Problem: All monitoring is internal. You don't know when external users can't reach you.
Fix: Add external uptime checks for your public endpoints.
### Mistake 3: Complex Tooling Too Early
Problem: Deploying Istio, Jaeger, and full observability stack for 3 services.
Fix: Start simple. Add complexity when you have the team and the need.
### Mistake 4: No Request Correlation
Problem: Can't trace a request across services when debugging.
Fix: Add request IDs early. It's simple and pays off immediately.
## Microservices Monitoring Checklist
### Each Service
- ☐ Health endpoint (/health)
- ☐ Request metrics (count, latency, errors)
- ☐ Request ID propagation
- ☐ Structured logging with service name
- ☐ Error tracking integration
### System-Wide
- ☐ External monitoring for public endpoints
- ☐ Centralized log aggregation
- ☐ Metrics dashboard (even if simple)
- ☐ Alert routing (email, Slack, PagerDuty)
- ☐ Runbook for common issues
## Monitor Your Microservices Externally
OpsPulse provides external uptime monitoring for your API gateway and individual services. Know when users can't reach you, not just when your services are running.
Start Free Monitoring →

## When to Add More Complexity
Add distributed tracing when:
- You have 10+ services with complex interactions
- You frequently need to debug cross-service issues
- You have someone who can maintain the tracing infrastructure
Add a service mesh when:
- You need mutual TLS between services
- You want automatic retries and circuit breaking
- You're running Kubernetes and can manage the complexity
Add comprehensive metrics when:
- You need to optimize performance
- You're debugging capacity issues
- You have SLOs to meet
## Summary
For small teams with microservices:
- Health checks first — Every service should report its status
- Basic metrics — Request count, errors, latency
- Error tracking — Know when and why things fail
- Request IDs — Connect requests across services
- External monitoring — Verify users can reach you
You can always add complexity later. Start with what gives you visibility today.