Observability: start with three signals

"Observability" sounds like a budget line item — a vendor, a dashboard sprawl, a six-month rollout. It doesn't have to start there. For a single backend service, you can get 80% of the value from three signals, added in order of payoff.

1. Structured logs (the "why")

Plain-text logs are searchable by a human; structured logs are queryable by a machine. Emit JSON with a few consistent fields and your logs become a database:

{
  "level": "error",
  "msg": "payment failed",
  "trace_id": "abc123",
  "account_id": 42,
  "duration_ms": 812
}

The single most valuable field is a correlation id (here trace_id) attached to every log line in a request. When something breaks, you filter by that id and see the whole story instead of guessing.
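
One way to get that id onto every line without passing it through each function call is to keep it in a context variable and let the log formatter read it. The sketch below uses only the Python standard library. The logger name "payments", the `fields` key passed via `extra`, and the `handle_request` function are illustrative assumptions, not part of any particular framework.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the correlation id for the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "trace_id": trace_id_var.get(),
        }
        # Extra structured fields arrive via extra={"fields": {...}}
        # ("fields" is a name chosen for this sketch).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(stream):
    logger = logging.getLogger("payments")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, account_id):
    # Set the id once at the edge of the request; every line below inherits it.
    trace_id_var.set(uuid.uuid4().hex)
    logger.info("payment started", extra={"fields": {"account_id": account_id}})
    logger.error(
        "payment failed",
        extra={"fields": {"account_id": account_id, "duration_ms": 812}},
    )


buf = io.StringIO()
log = make_logger(buf)
handle_request(log, 42)
lines = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Both parsed lines carry the same trace_id, so filtering on it returns the whole request. In a real service the id would usually come from an incoming request header when one is present, rather than always being generated locally.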

2. Metrics: the four golden signals (the "is it down or slow")

From Google's SRE practice, four metrics tell you almost everything about service health:

Latency — how long requests take (track p50 and p99, never just the average).
Traffic — how much demand you're getting.
Errors — the rate of failed requests.
Saturation — how full your resources are (CPU, memory, queue depth).
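
As a rough illustration, all four can be derived from one time window of request records plus a resource gauge. This is a minimal sketch, not a metrics library: the `Request` record, the choice of queue depth as the saturation measure, and the rule that any 5xx status counts as an error are assumptions made for the example.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_s, queue_depth, queue_capacity):
    """Summarize one time window of requests into the four signals."""
    durations = [r.duration_ms for r in requests]
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(durations, n=100)
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p50_ms": cuts[49],
        "latency_p99_ms": cuts[98],
        "traffic_rps": len(requests) / window_s,
        "error_rate": failed / len(requests),
        "saturation": queue_depth / queue_capacity,
    }


# Hypothetical 10-second window: 98 fast successes, 2 slow failures.
window = [Request(80, 200)] * 98 + [Request(4000, 503)] * 2
signals = golden_signals(window, window_s=10, queue_depth=30, queue_capacity=40)
```

In production these numbers normally come from a metrics client that keeps counters and histograms, not from a list held in memory, but the definitions stay the same.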

Averages lie. A p99 latency of 4s with a p50 of 80ms means 1 in 100 requests takes four seconds or longer while the average still looks healthy. The users behind those slow requests are often your biggest customers, because heavy users send the most requests.
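
A quick way to see the gap is to compute all three numbers over the same data. The latencies below are made up for illustration: 985 requests at 80ms and 15 at 4s.

```python
import statistics

# Hypothetical window: 985 fast requests and 15 slow ones.
durations_ms = [80] * 985 + [4000] * 15

mean_ms = statistics.mean(durations_ms)
cuts = statistics.quantiles(durations_ms, n=100)  # cut points p1..p99
p50_ms, p99_ms = cuts[49], cuts[98]

print(f"mean={mean_ms:.1f}ms p50={p50_ms:.0f}ms p99={p99_ms:.0f}ms")
# prints: mean=138.8ms p50=80ms p99=4000ms
```

A mean of about 139ms would pass most alert thresholds, yet the p99 shows that the slowest requests take four seconds.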

3. Traces (the "where")

Once you have more than two services, a request crosses boundaries and "it's slow" needs a where. Distributed tracing propagates that correlation id across service calls so you can see the request as a waterfall and spot which hop ate the time.
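
The mechanics can be sketched without a tracing library: each service reads the id from an incoming header, forwards it on every outgoing call, and reports a span with its start and end time. Everything here is simulated on a fake clock. The `X-Trace-Id` header name, the service names, and the in-memory `SPANS` list are assumptions for the example. Real deployments typically use a standard such as W3C Trace Context and send spans to a collector.

```python
from dataclasses import dataclass

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, not a standard


@dataclass
class Span:
    trace_id: str
    service: str
    start_ms: int
    end_ms: int
    self_ms: int  # time spent in this service, excluding downstream calls


SPANS = []  # stand-in for a trace collector


def call_service(service, headers, start_ms, work_ms, downstream=()):
    """Simulate one hop on a fake clock and return the time it finished."""
    trace_id = headers[TRACE_HEADER]
    now = start_ms + work_ms
    for child, child_work_ms in downstream:
        # Forward the same id so the downstream span joins this trace.
        now = call_service(child, {TRACE_HEADER: trace_id}, now, child_work_ms)
    SPANS.append(Span(trace_id, service, start_ms, now, work_ms))
    return now


def waterfall(spans, scale_ms=50):
    """Render spans as text bars, one row per hop, ordered by start time."""
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        indent = " " * (s.start_ms // scale_ms)
        bar = "#" * max(1, (s.end_ms - s.start_ms) // scale_ms)
        rows.append(f"{s.service:<10}{indent}{bar} {s.end_ms - s.start_ms}ms")
    return "\n".join(rows)


call_service("api", {TRACE_HEADER: "abc123"}, 0, 20,
             downstream=[("auth", 30), ("payments", 700)])
slowest = max(SPANS, key=lambda s: s.self_ms)
print(waterfall(SPANS))
print("slowest hop:", slowest.service)
```

All three spans share the id abc123, and ranking them by self time points at the payments hop, not the api span that merely waited on it.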

The order matters

Add logs first — they pay off the moment you have one bug. Add the golden signals next so you find out before your users do. Add tracing when you have enough services that "which one?" is a real question.

Observability isn't about collecting everything. It's about being able to ask a question you didn't anticipate, and getting an answer before the incident review.