
"depends_on" is a Lie: How Docker Healthchecks Actually Sequence Your Services

depends_on:
  - database

That snippet is in half the docker-compose.yml files out there. It looks correct. It even feels correct. It doesn’t do what you think it does.

In its short form, depends_on controls startup order, not service readiness. Docker waits for the database container to start, not for PostgreSQL to finish initializing, not for MySQL to complete InnoDB recovery, not for Redis to load its persistence file. It waits for the container process to spawn, then immediately starts your application. What happens next is a race condition baked into your infrastructure.
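Concretely, the short form gates only on container start:

```yaml
services:
  database:
    image: postgres:16

  app:
    image: my-app:latest
    depends_on:
      # short form: waits for the database container process to spawn,
      # not for PostgreSQL to accept connections
      - database
```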

The race you’re losing

When your app boots and tries to connect to a database that’s still initializing, you get a connection refused error. Most applications handle this with retry logic. But that’s not a solution; it’s a symptom being masked. You’re relying on application code to compensate for an infrastructure sequencing problem.

The actual fix has been in Docker Compose for years. Most setups just don’t use it.

What container readiness actually means

A healthcheck defines what “ready” means for a specific service. Docker runs it periodically and exposes the result as a container status: starting, healthy, or unhealthy. Once defined, depends_on can use it:

services:
  database:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

  app:
    image: my-app:latest
    depends_on:
      database:
        condition: service_healthy

condition: service_healthy tells Docker: don’t start app until database reports healthy. Not when the container is running, but when the process inside is actually accepting connections. That’s the difference.

The four fields that matter

test is the command Docker runs inside the container to determine readiness. For databases, use the official readiness check. For HTTP services, a simple probe:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]

interval is how often to run the check. The default is 30 seconds, which is too slow for startup sequencing; five seconds is reasonable for most services.

start_period is the grace period before Docker starts counting failures. A container that takes 15 seconds to initialize will fail its healthcheck several times before it’s ready. Without start_period, those early failures count toward retries and the container gets marked unhealthy before it’s had a chance to boot. Set this to a realistic upper bound for your service’s initialization time.

retries is how many consecutive failures before the container is marked unhealthy; a run that exceeds timeout counts as a failure too. Five is a solid default.
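Put together for Redis, mentioned earlier: a sketch of a healthcheck, assuming the official image, which ships redis-cli:

```yaml
services:
  cache:
    image: redis:7
    healthcheck:
      # while Redis is still loading its persistence file it answers
      # with a LOADING error, so grep for an actual PONG reply
      test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s
```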

Propagating readiness through a full stack

The pattern composes naturally. A reverse proxy that waits for a healthy application, which in turn waits for a healthy database:

services:
  database:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

  app:
    image: my-app:latest
    healthcheck:
      # assumes the app serves a /health endpoint on port 8080
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health || exit 1"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 15s
    depends_on:
      database:
        condition: service_healthy

  proxy:
    image: traefik:v3.0
    depends_on:
      app:
        condition: service_healthy

This stack will always come up in the right order, regardless of container start times, host load, or Docker daemon scheduling. No retry logic in the app to handle database unavailability at boot. No startup script sleeping and polling in a loop. The sequencing is declared in the infrastructure layer, where it belongs.
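service_healthy is not the only condition the Compose spec accepts. There is also service_started (the old short-form behavior) and service_completed_successfully, which is useful for one-shot init containers. A sketch, assuming a hypothetical migrate command in the app image:

```yaml
services:
  migrate:
    image: my-app:latest
    command: ["./migrate"]   # hypothetical one-shot migration step
    depends_on:
      database:
        condition: service_healthy

  app:
    image: my-app:latest
    depends_on:
      database:
        condition: service_healthy
      # don't start serving traffic until migrations have exited with code 0
      migrate:
        condition: service_completed_successfully
```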

One thing to watch

Healthchecks run as commands inside the container, which means the check tool has to exist in the image. pg_isready ships with the PostgreSQL client, so it’s always present in the official Postgres image. But if you’re probing an HTTP endpoint with curl, your minimal production image might not have it. The cleanest alternative is wget, available in Alpine-based images:

test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health || exit 1"]

Or better yet, expose a dedicated /health endpoint in your application and check it with whatever binary your image ships with. A startup race condition is not a reliability problem you solve in application code. It’s an infrastructure problem, and it has an infrastructure solution.

From the Lab

This experiment was conducted by Ionastec. Need this level of technical rigor for your business?

Consult Ionastec