Version: 3.6.0

How to Configure Kubernetes Health Checks with Watt

Problem​

You're deploying Watt applications to Kubernetes and need robust health checking that:

  • Prevents traffic from reaching unhealthy pods
  • Automatically restarts failed containers
  • Handles complex health dependencies (databases, external services)
  • Provides proper startup time for initialization
  • Integrates with Kubernetes orchestration patterns

When to use this solution:

  • Production Kubernetes deployments
  • Applications with external dependencies that need health validation
  • Services requiring zero-downtime deployments
  • Complex multi-service applications where service health interdependencies matter

Solution Overview​

This guide shows you how to implement comprehensive Kubernetes health checks using Watt's built-in health endpoints. You'll learn to:

  1. Configure readiness and liveness probes properly
  2. Implement custom health checks for your application dependencies
  3. Set appropriate probe timing and thresholds
  4. Handle startup scenarios and graceful shutdowns

Understanding Kubernetes Health Probes​

Kubernetes uses probes to determine application health:

  • Readiness Probe: Determines if the pod is ready to receive traffic. Failed readiness removes the pod from service endpoints.
  • Liveness Probe: Determines if the container should be restarted. Failed liveness triggers container restart by Kubernetes.
  • Startup Probe: Provides extra time for slow-starting containers. Disables readiness and liveness probes until startup succeeds.

Prerequisites​

Before implementing Kubernetes health checks, you need:

  • Node.js 22.19.0 or later installed on your development machine
  • Docker for containerization
  • Kubernetes cluster access (local or cloud)
  • kubectl configured to access your cluster

Installation​

1. Create a new Watt application; remember to select @platformatic/node and name the application api:

$ npm create wattpm
Hello YOURNAME, welcome to Watt Utils 3.0.0!
? This folder seems to already contain a Node.js application. Do you want to wrap into Watt? no
? Where would you like to create your project? my-health-app
? Which kind of application do you want to create? @platformatic/node
✔ Installing @platformatic/node@^3.0.3 using pnpm ...
? What is the name of the application? api
? Do you want to create another application? no
? What port do you want to use? 3042
cd web/api; npm install fastify @fastify/postgresql @fastify/autoload; cd ..

Then replace the web/api/index.js file with:

import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'

export async function create () {
  const app = fastify({
    loggerInstance: globalThis.platformatic?.logger
  })

  // Register PostgreSQL plugin
  await app.register(import('@fastify/postgresql'), {
    connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
  })

  // Autoload routes
  await app.register(autoload, {
    dir: join(import.meta.dirname, 'routes')
  })

  app.get('/', async () => {
    const client = await app.pg.connect()
    try {
      const result = await client.query('SELECT NOW() as current_time')
      return { message: 'hello world', db_time: result.rows[0].current_time }
    } finally {
      client.release()
    }
  })

  return app
}

This creates a Fastify app that autoloads its routes and queries PostgreSQL on the root route.

Platformatic Health Check APIs​

Watt provides built-in health check endpoints through its metrics server. The metrics server exposes the following endpoints by default:

  • /ready (Readiness endpoint): Indicates if all services are started and ready to accept traffic
  • /status (Liveness endpoint): Indicates if all services are healthy and their custom health checks pass

Endpoint Customization​

You can customize the health check endpoints in your Watt configuration:

{
  "metrics": {
    "hostname": "0.0.0.0",
    "port": 9090,
    "readiness": {
      "endpoint": "/health"
    },
    "liveness": {
      "endpoint": "/live"
    }
  }
}

Service Discovery and Autoload​

By default, Watt automatically loads all services in the web folder via the autoload configuration. You don't need to manually specify each service in the configuration. Watt will:

  • Discover all valid Platformatic services in this directory
  • Automatically register them in the runtime
  • Include them in health check evaluations
  • Expose their metrics through the metrics server

This autoload behavior simplifies deployment and ensures all your services are automatically included in the health monitoring system.

Custom Health Check Functions​

  • setCustomHealthCheck: Sets a custom liveness check function that runs on the /status (or custom liveness) endpoint
  • setCustomReadinessCheck: Sets a custom readiness check function that runs on the /ready (or custom readiness) endpoint

Both methods accept a function that returns:

  • A boolean value (true = healthy, false = unhealthy)
  • An object with:
    • status: boolean indicating success/failure
    • statusCode: optional HTTP status code (defaults to 200/500)
    • body: optional response body
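
As an illustration of this contract (this is not Watt's internal code, just a hypothetical helper), the two return shapes map to an HTTP response like so:

```javascript
// Hypothetical helper showing how the two supported return shapes
// (a bare boolean, or { status, statusCode?, body? }) become an HTTP response.
function toHttpResponse (checkResult) {
  if (typeof checkResult === 'boolean') {
    // Bare booleans use the default status codes (200 healthy, 500 unhealthy)
    return { statusCode: checkResult ? 200 : 500, body: '' }
  }
  return {
    statusCode: checkResult.statusCode ?? (checkResult.status ? 200 : 500),
    body: checkResult.body ?? ''
  }
}

console.log(toHttpResponse(true))
// { statusCode: 200, body: '' }
console.log(toHttpResponse({ status: false, statusCode: 503, body: 'db down' }))
// { statusCode: 503, body: 'db down' }
```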

Implementation​

1. Service Implementation with Custom Health Checks​

Update your web/api/index.js to implement comprehensive health checks:

import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'

export async function create () {
  const app = fastify({
    loggerInstance: globalThis.platformatic?.logger
  })

  // Register PostgreSQL plugin
  await app.register(import('@fastify/postgresql'), {
    connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
  })

  // Autoload routes
  await app.register(autoload, {
    dir: join(import.meta.dirname, 'routes')
  })

  // Register custom liveness check (for /status endpoint)
  globalThis.platformatic.setCustomHealthCheck(async () => {
    try {
      // Check PostgreSQL database connectivity
      const client = await app.pg.connect()
      try {
        await client.query('SELECT 1')
      } finally {
        client.release()
      }

      return { status: true }
    } catch (err) {
      app.log.error({ err }, 'Health check failed')
      return {
        status: false,
        statusCode: 503,
        body: `Database health check failed: ${err.message}`
      }
    }
  })

  // Register custom readiness check (for /ready endpoint)
  globalThis.platformatic.setCustomReadinessCheck(async () => {
    try {
      // Check if PostgreSQL connection pool is ready
      if (!app.pg || !app.pg.pool) {
        return false
      }

      // Quick connection test
      const client = await app.pg.connect()
      try {
        await client.query('SELECT 1')
        return true
      } finally {
        client.release()
      }
    } catch (err) {
      app.log.error({ err }, 'Readiness check failed')
      return false
    }
  })

  // Add application routes
  app.get('/', async () => {
    const client = await app.pg.connect()
    try {
      const result = await client.query('SELECT NOW() as current_time')
      return { message: 'hello world', db_time: result.rows[0].current_time }
    } finally {
      client.release()
    }
  })

  return app
}

2. Watt Configuration​

Configure the metrics server in your watt.json file:

{
  "metrics": {
    "hostname": "0.0.0.0",
    "port": 9090,
    "readiness": {
      "success": {
        "statusCode": 200,
        "body": "Ready"
      },
      "fail": {
        "statusCode": 503,
        "body": "Not Ready"
      }
    },
    "liveness": {
      "success": {
        "statusCode": 200,
        "body": "Healthy"
      },
      "fail": {
        "statusCode": 503,
        "body": "Unhealthy"
      }
    }
  }
}

3. PostgreSQL Database Setup​

First, create a PostgreSQL deployment and service for your database:

postgres-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: 'healthdb'
            - name: POSTGRES_USER
              value: 'postgres'
            - name: POSTGRES_PASSWORD
              value: 'password'
            - name: PGDATA
              value: '/var/lib/postgresql/data/pgdata'
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
          livenessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: postgres-storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres

4. Kubernetes Application Configuration​

Create a Kubernetes deployment configuration that defines the probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: watt-health-app
  labels:
    app: watt-health-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: watt-health-app
  template:
    metadata:
      labels:
        app: watt-health-app
    spec:
      containers:
        - name: watt-app
          image: watt-health-app:latest
          ports:
            - containerPort: 3042
              name: service
            - containerPort: 9090
              name: metrics
          env:
            - name: PLT_SERVER_HOSTNAME
              value: '0.0.0.0'
            - name: DATABASE_URL
              value: 'postgres://postgres:password@postgres:5432/healthdb'
          readinessProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /status
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 20 # Allow up to 100 seconds for startup
          resources:
            requests:
              memory: '256Mi'
              cpu: '250m'
            limits:
              memory: '512Mi'
              cpu: '500m'

Key configuration points:

  • Startup Probe: Allows up to 100 seconds for application initialization
  • Readiness Probe: Checks /ready endpoint every 10 seconds after startup
  • Liveness Probe: Checks /status endpoint every 30 seconds after startup
  • Environment Variables: PLT_SERVER_HOSTNAME=0.0.0.0 ensures the app binds to all interfaces

Important Timing Considerations:

  • Startup probe runs first and disables other probes until successful
  • Readiness probe has lower failure threshold for faster traffic removal
  • Liveness probe has higher failure threshold to avoid unnecessary restarts
  • Timeout values account for potential network latency
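
The worst-case timings above follow directly from the probe parameters. A small helper (illustrative arithmetic only, not a Kubernetes API) makes the budget explicit:

```javascript
// Illustrative arithmetic for probe budgets: a probe is declared failed after
// failureThreshold consecutive misses, each periodSeconds apart, counted from
// initialDelaySeconds after container start.
function maxSecondsBeforeFailure ({ initialDelaySeconds = 0, periodSeconds, failureThreshold }) {
  return initialDelaySeconds + periodSeconds * failureThreshold
}

// Startup probe from the deployment above: 10 + 5 * 20 = 110 seconds in total,
// i.e. up to 100 seconds of probing after the initial delay.
console.log(maxSecondsBeforeFailure({ initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 20 }))
// 110

// Liveness probe: 30 + 30 * 3 = 120 seconds before the first restart.
console.log(maxSecondsBeforeFailure({ initialDelaySeconds: 30, periodSeconds: 30, failureThreshold: 3 }))
// 120
```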

5. Docker Configuration​

Create a Dockerfile for your Watt application:

FROM node:22-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./
RUN npm ci --omit=dev

# Copy application code
COPY . .

# Expose ports
EXPOSE 3042 9090

# Set environment variables
ENV PLT_SERVER_HOSTNAME=0.0.0.0
ENV NODE_ENV=production

# Health check for Docker (BusyBox wget ships with Alpine; curl does not)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD wget -qO- http://localhost:9090/ready || exit 1

# Start the application
CMD ["npm", "start"]

Watt Internal Service Communication​

Watt provides a built-in service mesh that enables zero-configuration communication between services using the .plt.local domain. This is crucial for implementing proper health checks in multi-service applications.

Architecture Overview​

The following diagram illustrates how services communicate within a Watt application for health checks in Kubernetes:

graph TB
  subgraph "Kubernetes Pod"
    subgraph "Watt Runtime"
      subgraph "Service Mesh"
        Router[Internal Router]
        Discovery["Service Discovery<br/>(.plt.local)"]
      end

      subgraph "Services"
        Gateway["Gateway Service<br/>(Composer)<br/>:3001"]
        API["API Service<br/>(Backend)<br/>:3002"]
        Worker["Worker Service<br/>(Background)<br/>:3003"]
      end

      subgraph "Health Monitoring"
        Metrics["Metrics Server<br/>:9090"]
        Health["/ready, /status"]
      end
    end
  end

  subgraph "External"
    K8s[Kubernetes Probes]
    Client[External Clients]
  end

  %% Health check flows
  K8s --> |"GET /ready<br/>GET /status"| Metrics
  Metrics --> |"Check service health"| Gateway
  Metrics --> |"Check service health"| API
  Metrics --> |"Check service health"| Worker

  %% Internal service communication
  Gateway --> |"fetch('http://api.plt.local/health')"| Router
  Gateway --> |"fetch('http://worker.plt.local/health')"| Router
  Router --> API
  Router --> Worker

  %% External access
  Client --> |"External requests"| Gateway

  %% Service discovery
  Discovery -.-> |"Resolves .plt.local"| Router

  style Metrics fill:#e1f5fe
  style Health fill:#e8f5e8
  style Router fill:#fff3e0
  style Discovery fill:#fff3e0

Key Communication Patterns:​

  1. Kubernetes Health Probes → Metrics server (:9090/ready, :9090/status)
  2. Metrics Server → Individual services for health verification
  3. Inter-Service Health Checks → Via .plt.local domain (e.g., http://api.plt.local/health)
  4. External Traffic → Gateway service (composer) for API aggregation

Internal Fetch with Automatic Service Discovery​

Services within a Watt application can communicate with each other using the automatic service discovery:

// Health check for internal services using Watt's service mesh.
// Note: fetch() has no `timeout` option; use an AbortSignal instead.
globalThis.platformatic.setCustomHealthCheck(async () => {
  try {
    const healthChecks = await Promise.allSettled([
      // Database service health check
      fetch('http://api.plt.local/health', { signal: AbortSignal.timeout(2000) }),

      // Background worker service health check
      fetch('http://worker.plt.local/health', { signal: AbortSignal.timeout(2000) }),

      // Composer gateway health check
      fetch('http://gateway.plt.local/health', { signal: AbortSignal.timeout(2000) })
    ])

    const allHealthy = healthChecks.every(result => result.status === 'fulfilled' && result.value.ok)

    return {
      status: allHealthy,
      body: JSON.stringify({
        service: 'healthy',
        dependencies: healthChecks.map((check, index) => ({
          service: ['api', 'worker', 'gateway'][index],
          status: check.status === 'fulfilled' && check.value.ok ? 'healthy' : 'unhealthy'
        }))
      })
    }
  } catch (error) {
    return {
      status: false,
      statusCode: 503,
      body: `Health check failed: ${error.message}`
    }
  }
})

Key Benefits of Watt's Internal Communication:​

  • Zero Configuration: Services are automatically discoverable via {service-id}.plt.local
  • No Network Latency: Communication happens in-process via the service mesh
  • Automatic Load Balancing: Requests are distributed across service workers
  • Built-in Service Discovery: No need for external service registry
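
The per-dependency mapping built inline in the health check above can be factored into a small pure helper; this is a sketch (the names and result shapes mirror the example, not a Watt API):

```javascript
// Sketch: turn Promise.allSettled results into a per-service health report.
// A dependency is healthy only if its fetch settled and returned an ok response.
function summarizeDependencies (names, settledResults) {
  return names.map((name, index) => {
    const check = settledResults[index]
    return {
      service: name,
      status: check.status === 'fulfilled' && check.value.ok ? 'healthy' : 'unhealthy'
    }
  })
}

// Example with fake settled results (no network needed):
const report = summarizeDependencies(
  ['api', 'worker'],
  [
    { status: 'fulfilled', value: { ok: true } },
    { status: 'rejected', reason: new Error('timeout') }
  ]
)
console.log(report)
// [ { service: 'api', status: 'healthy' }, { service: 'worker', status: 'unhealthy' } ]
```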

Verification and Testing​

Test Health Endpoints Locally​

1. Start your Watt application:

npm start
# or for development
npm run dev

2. Test health endpoints:

# Test readiness endpoint (includes database connectivity check)
curl -v http://localhost:9090/ready
# Expected: 200 OK "Ready" (or custom response)

# Test liveness endpoint (includes database query)
curl -v http://localhost:9090/status
# Expected: 200 OK "Healthy" (or custom response)

# Test the main application endpoint with database integration
curl http://localhost:3042/
# Expected: {"message":"hello world","db_time":"2024-01-01T12:00:00.000Z"}

# Check metrics endpoint
curl http://localhost:9090/metrics
# Expected: Prometheus metrics output

3. Test with failing health checks:

# Stop PostgreSQL to simulate database failure
docker stop postgres-dev # if running locally with Docker
# or kubectl delete pod -l app=postgres # if running in K8s

# Test health endpoints - should now fail
curl http://localhost:9090/status
# Expected: 503 Service Unavailable with database error message

curl http://localhost:9090/ready
# Expected: 503 Service Unavailable

Test in Kubernetes​

1. Deploy to Kubernetes:

# Deploy PostgreSQL first
kubectl apply -f postgres-deployment.yaml

# Wait for PostgreSQL to be ready
kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s

# Deploy the application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

2. Monitor pod health:

# Check pod status
kubectl get pods -l app=watt-health-app

# Watch pod events
kubectl describe pod <pod-name>

# Check probe results
kubectl get events --field-selector reason=Unhealthy

3. Test probe behavior:

# Test health endpoints from within the pod
kubectl exec <pod-name> -- curl -f http://localhost:9090/ready
kubectl exec <pod-name> -- curl -f http://localhost:9090/status

# Watch Kubernetes pod status in real-time
kubectl get pods -l app=watt-health-app -w

# Check pod events for probe failures
kubectl get events --field-selector involvedObject.name=<pod-name>

Verify Probe Configuration​

Check probe timing is appropriate:

# Get current probe configuration
kubectl get deployment watt-health-app -o yaml | grep -A 10 Probe

Monitor probe metrics:

# Check probe success/failure rates
kubectl top pods
kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"

Production Configuration Best Practices​

Probe Timing Guidelines​

Startup-dependent applications:

readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10 # Short delay for quick apps
  periodSeconds: 5 # Frequent checks during startup
  timeoutSeconds: 5 # Allow time for health check
  successThreshold: 1 # Single success to mark ready
  failureThreshold: 3 # Allow some startup failures

livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 30 # Longer delay after initial startup
  periodSeconds: 30 # Less frequent checks when running
  timeoutSeconds: 10 # More time for complex checks
  failureThreshold: 3 # Avoid restart on transient issues

Database-dependent applications:

startupProbe: # Use startup probe for slow initialization
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30 # Up to 5 minutes for startup

readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 1 # Quick removal from service if unhealthy

livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 0 # Deferred until the startup probe succeeds
  periodSeconds: 20
  timeoutSeconds: 10
  failureThreshold: 3

Troubleshooting​

Pod Failing Readiness Checks​

Problem: Pods remain in "Not Ready" state

Solutions:

# Check health endpoint directly
kubectl exec <pod-name> -- curl http://localhost:9090/ready

# Review application logs
kubectl logs <pod-name>

# Check probe configuration
kubectl describe pod <pod-name> | grep -A 10 Readiness

# Common fixes:
# - Increase initialDelaySeconds if app needs more startup time
# - Check that health dependencies are available
# - Verify metrics server is configured and running on correct port

Pod Continuously Restarting​

Problem: Liveness probes causing restart loops

Solutions:

# Check restart count and reason
kubectl get pods -l app=your-app

# Review pod events
kubectl describe pod <pod-name>

# Check liveness endpoint
kubectl exec <pod-name> -- curl http://localhost:9090/status

# Common fixes:
# - Increase timeoutSeconds for slow health checks
# - Increase failureThreshold to avoid restarts on transient issues
# - Review custom health check logic for potential failures
# - Check if app is properly handling SIGTERM for graceful shutdown
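
The last fix above, handling SIGTERM gracefully, can be sketched as follows. This is an illustrative pattern, not a Watt-specific API; `closeServer` stands in for something like Fastify's `app.close()`:

```javascript
// Sketch of graceful-shutdown state: flip readiness off on SIGTERM so the
// pod is removed from Service endpoints before the server closes.
function createShutdownController () {
  let draining = false
  return {
    isReady: () => !draining, // wire this into your custom readiness check
    async shutdown (closeServer) {
      draining = true // readiness starts failing immediately
      await closeServer() // finish in-flight requests, then exit
    }
  }
}

// Usage (illustrative):
// const ctl = createShutdownController()
// process.on('SIGTERM', () => ctl.shutdown(() => app.close()))
```

Because readiness fails as soon as draining begins, Kubernetes stops routing new traffic to the pod while in-flight requests are allowed to complete.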

Health Checks Always Failing​

Problem: Health endpoints return 500/404 errors

Solutions:

# Verify metrics server is listening
kubectl exec <pod-name> -- netstat -tlnp | grep :9090

# Check Watt configuration
kubectl exec <pod-name> -- cat watt.json

# Test endpoints with verbose output
kubectl exec <pod-name> -- curl -v http://localhost:9090/ready
kubectl exec <pod-name> -- curl -v http://localhost:9090/status

# Check application logs for errors
kubectl logs <pod-name> --tail=100

# Verify container environment
kubectl exec <pod-name> -- env | grep -E "PLT_|DATABASE_"

# Test database connectivity directly
kubectl exec <pod-name> -- pg_isready -h postgres -p 5432 -U postgres -d healthdb

Common fixes:

  • Ensure metrics.hostname is "0.0.0.0" (not "127.0.0.1" or "localhost")
  • Verify metrics.port matches probe port configuration
  • Check that PLT_SERVER_HOSTNAME=0.0.0.0 environment variable is set
  • Verify DATABASE_URL environment variable is correctly formatted
  • Ensure PostgreSQL service is accessible from the application pod
  • Check that PostgreSQL credentials and database name are correct
  • Ensure custom health check functions handle database connection errors gracefully
  • Verify all Watt services are starting without errors

Slow Startup Times​

Problem: Pods take too long to become ready

Solutions:

# Analyze startup time with timestamps
kubectl logs <pod-name> --timestamps --since=5m

# Check resource usage and limits
kubectl describe pod <pod-name> | grep -A 10 -B 5 "Limits\|Requests"
kubectl top pod <pod-name>

# Profile health check performance
kubectl exec <pod-name> -- time curl -f http://localhost:9090/ready

# Check Node.js startup time
kubectl exec <pod-name> -- ps aux | grep node

Common fixes:

  • Use startup probes for applications with slow initialization (database migrations, cache warming, etc.)
  • Optimize custom health checks - keep them lightweight and fast
  • Increase resources if CPU/memory constrained (check with kubectl top)
  • Remove expensive operations from readiness checks (use async background tasks instead)
  • Pre-build dependencies in Docker image rather than installing at runtime
  • Use Node.js production optimizations (NODE_ENV=production, --max-old-space-size)

Next Steps​

You now have robust Kubernetes health checks for your Watt application: startup, readiness, and liveness probes wired to Watt's metrics server, custom dependency checks, and probe timing tuned for production.
