How to Configure Kubernetes Health Checks with Watt
Problem
You're deploying Watt applications to Kubernetes and need robust health checking that:
- Prevents traffic from reaching unhealthy pods
- Automatically restarts failed containers
- Handles complex health dependencies (databases, external services)
- Provides proper startup time for initialization
- Integrates with Kubernetes orchestration patterns
When to use this solution:
- Production Kubernetes deployments
- Applications with external dependencies that need health validation
- Services requiring zero-downtime deployments
- Complex multi-service applications where service health interdependencies matter
Solution Overview
This guide shows you how to implement comprehensive Kubernetes health checks using Watt's built-in health endpoints. You'll learn to:
- Configure readiness and liveness probes properly
- Implement custom health checks for your application dependencies
- Set appropriate probe timing and thresholds
- Handle startup scenarios and graceful shutdowns
Understanding Kubernetes Health Probes
Kubernetes uses probes to determine application health:
- Readiness Probe: Determines if the pod is ready to receive traffic. Failed readiness removes the pod from service endpoints.
- Liveness Probe: Determines if the container should be restarted. Failed liveness triggers container restart by Kubernetes.
- Startup Probe: Provides extra time for slow-starting containers. Disables readiness and liveness probes until startup succeeds.
Prerequisites
Before implementing Kubernetes health checks, you need:
- Node.js 22.19.0 or later installed on your development machine
- Docker for containerization
- Kubernetes cluster access (local or cloud)
- kubectl configured to access your cluster
Installation
1. Create a new Watt application; when prompted, select the @platformatic/node kind and name the application api:
$ npm create wattpm
Hello YOURNAME, welcome to Watt Utils 3.0.0!
? This folder seems to already contain a Node.js application. Do you want to wrap into Watt? no
? Where would you like to create your project? my-health-app
? Which kind of application do you want to create? @platformatic/node
✔ Installing @platformatic/node@^3.0.3 using pnpm ...
? What is the name of the application? api
? Do you want to create another application? no
? What port do you want to use? 3042
cd web/api; npm install fastify @fastify/postgresql @fastify/autoload; cd ../..
Then replace the web/api/index.js file with:
import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'

export async function create () {
  const app = fastify({
    loggerInstance: globalThis.platformatic?.logger
  })

  // Register PostgreSQL plugin
  await app.register(import('@fastify/postgresql'), {
    connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
  })

  // Autoload routes
  await app.register(autoload, {
    dir: join(import.meta.dirname, 'routes')
  })

  app.get('/', async () => {
    const client = await app.pg.connect()
    try {
      const result = await client.query('SELECT NOW() as current_time')
      return { message: 'hello world', db_time: result.rows[0].current_time }
    } finally {
      client.release()
    }
  })

  return app
}
This creates a Fastify app that autoloads routes from the routes directory.
Platformatic Health Check APIs
Watt provides built-in health check endpoints through its metrics server. The metrics server exposes the following endpoints by default:
- /ready (readiness endpoint): indicates whether all services are started and ready to accept traffic
- /status (liveness endpoint): indicates whether all services are healthy and their custom health checks pass
Endpoint Customization
You can customize the health check endpoints in your Watt configuration:
{
  "metrics": {
    "hostname": "0.0.0.0",
    "port": 9090,
    "readiness": {
      "endpoint": "/health"
    },
    "liveness": {
      "endpoint": "/live"
    }
  }
}
Service Discovery and Autoload
By default, Watt automatically loads all services in the web folder via the autoload configuration. You don't need to manually specify each service in the configuration. Watt will:
- Discover all valid Platformatic services in this directory
- Automatically register them in the runtime
- Include them in health check evaluations
- Expose their metrics through the metrics server
This autoload behavior simplifies deployment and ensures all your services are automatically included in the health monitoring system.
Custom Health Check Functions
- setCustomHealthCheck: sets a custom liveness check function that runs on the /status (or custom liveness) endpoint
- setCustomReadinessCheck: sets a custom readiness check function that runs on the /ready (or custom readiness) endpoint
Both methods accept a function that returns:
- A boolean value (true = healthy, false = unhealthy)
- An object with:
  - status: boolean indicating success/failure
  - statusCode: optional HTTP status code (defaults to 200/500)
  - body: optional response body
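To make that contract concrete, here is a sketch of how the two return shapes map onto the HTTP response served by the metrics endpoint. The normalizeCheckResult helper is illustrative only, not part of the Watt API:

```javascript
// Illustrative only: models how a boolean or object result from a custom
// check maps to the HTTP response of the health endpoint.
function normalizeCheckResult (result) {
  if (typeof result === 'boolean') {
    return result
      ? { statusCode: 200, body: 'OK' }
      : { statusCode: 500, body: 'KO' }
  }
  // Object form: statusCode and body are optional
  const { status, statusCode, body } = result
  return {
    statusCode: statusCode ?? (status ? 200 : 500),
    body: body ?? (status ? 'OK' : 'KO')
  }
}
```

Returning an object is the more useful form in practice, since it lets the endpoint carry a diagnostic body (for example, the database error message) alongside the status code.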
Implementation
1. Service Implementation with Custom Health Checks
Update your web/api/index.js to implement comprehensive health checks:
import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'

export async function create () {
  const app = fastify({
    loggerInstance: globalThis.platformatic?.logger
  })

  // Register PostgreSQL plugin
  await app.register(import('@fastify/postgresql'), {
    connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
  })

  // Autoload routes
  await app.register(autoload, {
    dir: join(import.meta.dirname, 'routes')
  })

  // Register custom liveness check (for /status endpoint)
  globalThis.platformatic.setCustomHealthCheck(async () => {
    try {
      // Check PostgreSQL database connectivity
      const client = await app.pg.connect()
      try {
        await client.query('SELECT 1')
      } finally {
        client.release()
      }
      return { status: true }
    } catch (err) {
      app.log.error({ err }, 'Health check failed')
      return {
        status: false,
        statusCode: 503,
        body: `Database health check failed: ${err.message}`
      }
    }
  })

  // Register custom readiness check (for /ready endpoint)
  globalThis.platformatic.setCustomReadinessCheck(async () => {
    try {
      // Check if PostgreSQL connection pool is ready
      if (!app.pg || !app.pg.pool) {
        return false
      }

      // Quick connection test
      const client = await app.pg.connect()
      try {
        await client.query('SELECT 1')
        return true
      } finally {
        client.release()
      }
    } catch (err) {
      app.log.error({ err }, 'Readiness check failed')
      return false
    }
  })

  // Add application routes
  app.get('/', async () => {
    const client = await app.pg.connect()
    try {
      const result = await client.query('SELECT NOW() as current_time')
      return { message: 'hello world', db_time: result.rows[0].current_time }
    } finally {
      client.release()
    }
  })

  return app
}
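Note that globalThis.platformatic is only defined when the service runs inside the Watt runtime. If you also load the module directly with plain node (for quick local experiments or unit tests), a small guard avoids a crash. A minimal sketch; registerChecks is a hypothetical helper, not a Watt API:

```javascript
// Minimal sketch: register the custom checks only when the Watt runtime
// globals are present, so the module also loads under plain `node`.
function registerChecks (runtime, livenessFn, readinessFn) {
  if (typeof runtime?.setCustomHealthCheck === 'function') {
    runtime.setCustomHealthCheck(livenessFn)
  }
  if (typeof runtime?.setCustomReadinessCheck === 'function') {
    runtime.setCustomReadinessCheck(readinessFn)
  }
  // Report whether we were running inside the runtime at all
  return Boolean(runtime)
}
```

Inside create() you would call registerChecks(globalThis.platformatic, livenessFn, readinessFn) instead of invoking the globals unconditionally.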
2. Watt Configuration
Configure the metrics server in your watt.json file:
{
  "metrics": {
    "hostname": "0.0.0.0",
    "port": 9090,
    "readiness": {
      "success": {
        "statusCode": 200,
        "body": "Ready"
      },
      "fail": {
        "statusCode": 503,
        "body": "Not Ready"
      }
    },
    "liveness": {
      "success": {
        "statusCode": 200,
        "body": "Healthy"
      },
      "fail": {
        "statusCode": 503,
        "body": "Unhealthy"
      }
    }
  }
}
3. PostgreSQL Database Setup
First, create a PostgreSQL deployment and service for your database:
postgres-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: 'healthdb'
            - name: POSTGRES_USER
              value: 'postgres'
            - name: POSTGRES_PASSWORD
              value: 'password'
            - name: PGDATA
              value: '/var/lib/postgresql/data/pgdata'
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
          livenessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: postgres-storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
4. Kubernetes Application Configuration
Create a Kubernetes deployment configuration that defines the probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: watt-health-app
  labels:
    app: watt-health-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: watt-health-app
  template:
    metadata:
      labels:
        app: watt-health-app
    spec:
      containers:
        - name: watt-app
          image: watt-health-app:latest
          ports:
            - containerPort: 3042
              name: service
            - containerPort: 9090
              name: metrics
          env:
            - name: PLT_SERVER_HOSTNAME
              value: '0.0.0.0'
            - name: DATABASE_URL
              value: 'postgres://postgres:password@postgres:5432/healthdb'
          readinessProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /status
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 20 # Allow up to 100 seconds for startup
          resources:
            requests:
              memory: '256Mi'
              cpu: '250m'
            limits:
              memory: '512Mi'
              cpu: '500m'
Key configuration points:
- Startup Probe: allows up to 100 seconds for application initialization
- Readiness Probe: checks the /ready endpoint every 10 seconds after startup
- Liveness Probe: checks the /status endpoint every 30 seconds after startup
- Environment Variables: PLT_SERVER_HOSTNAME=0.0.0.0 ensures the app binds to all interfaces
Important Timing Considerations:
- Startup probe runs first and disables other probes until successful
- Readiness probe has lower failure threshold for faster traffic removal
- Liveness probe has higher failure threshold to avoid unnecessary restarts
- Timeout values account for potential network latency
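These budgets follow directly from the manifest numbers. A quick sanity check of the arithmetic (worstCaseSeconds is a hypothetical helper; Kubernetes does this accounting internally):

```javascript
// Rough worst-case windows implied by the probe settings above.
function worstCaseSeconds ({ initialDelaySeconds = 0, periodSeconds, failureThreshold }) {
  return initialDelaySeconds + failureThreshold * periodSeconds
}

const startupProbe = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 20 }
const livenessProbe = { initialDelaySeconds: 30, periodSeconds: 30, failureThreshold: 3 }

// Startup: 10s delay + 20 attempts x 5s = up to 110s before the pod is killed
console.log(worstCaseSeconds(startupProbe)) // 110
// Once live, a pod survives roughly 3 x 30s = 90s of consecutive failures
console.log(livenessProbe.failureThreshold * livenessProbe.periodSeconds) // 90
```

When you tune periodSeconds or failureThreshold, recompute these windows so the startup budget still covers your slowest cold start (migrations, cache warming) with headroom.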
5. Docker Configuration
Create a Dockerfile for your Watt application:
FROM node:22-alpine

# curl is not included in the Alpine base image but is needed by HEALTHCHECK
RUN apk add --no-cache curl

WORKDIR /app

# Copy package files and install production dependencies
COPY package*.json ./
RUN npm ci --omit=dev

# Copy application code
COPY . .

# Expose ports
EXPOSE 3042 9090

# Set environment variables
ENV PLT_SERVER_HOSTNAME=0.0.0.0
ENV NODE_ENV=production

# Health check for Docker
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:9090/ready || exit 1

# Start the application
CMD ["npm", "start"]
Watt Internal Service Communication
Watt provides a built-in service mesh that enables zero-configuration communication between services using the .plt.local domain. This is crucial for implementing proper health checks in multi-service applications.
Architecture Overview
The following diagram illustrates how services communicate within a Watt application for health checks in Kubernetes:
graph TB
  subgraph "Kubernetes Pod"
    subgraph "Watt Runtime"
      subgraph "Service Mesh"
        Router[Internal Router]
        Discovery["Service Discovery<br/>(.plt.local)"]
      end
      subgraph "Services"
        Gateway["Gateway Service<br/>(Composer)<br/>:3001"]
        API["API Service<br/>(Backend)<br/>:3002"]
        Worker["Worker Service<br/>(Background)<br/>:3003"]
      end
      subgraph "Health Monitoring"
        Metrics["Metrics Server<br/>:9090"]
        Health["/ready, /status"]
      end
    end
  end
  subgraph "External"
    K8s[Kubernetes Probes]
    Client[External Clients]
  end

  %% Health check flows
  K8s --> |"GET /ready<br/>GET /status"| Metrics
  Metrics --> |"Check service health"| Gateway
  Metrics --> |"Check service health"| API
  Metrics --> |"Check service health"| Worker

  %% Internal service communication
  Gateway --> |"fetch('http://api.plt.local/health')"| Router
  Gateway --> |"fetch('http://worker.plt.local/health')"| Router
  Router --> API
  Router --> Worker

  %% External access
  Client --> |"External requests"| Gateway

  %% Service discovery
  Discovery -.-> |"Resolves .plt.local"| Router

  style Metrics fill:#e1f5fe
  style Health fill:#e8f5e8
  style Router fill:#fff3e0
  style Discovery fill:#fff3e0
Key Communication Patterns:
- Kubernetes Health Probes → metrics server (:9090/ready, :9090/status)
- Metrics Server → individual services for health verification
- Inter-Service Health Checks → via the .plt.local domain (e.g., http://api.plt.local/health)
- External Traffic → gateway service (composer) for API aggregation
Internal Fetch with Automatic Service Discovery
Services within a Watt application can communicate with each other using automatic service discovery:
// Health check for internal services using Watt's service mesh
globalThis.platformatic.setCustomHealthCheck(async () => {
  try {
    // fetch() has no `timeout` option; use an AbortSignal instead
    const withTimeout = ms => ({ signal: AbortSignal.timeout(ms) })

    const healthChecks = await Promise.allSettled([
      // Database service health check
      fetch('http://api.plt.local/health', withTimeout(2000)),
      // Background worker service health check
      fetch('http://worker.plt.local/health', withTimeout(2000)),
      // Composer gateway health check
      fetch('http://gateway.plt.local/health', withTimeout(2000))
    ])

    const allHealthy = healthChecks.every(result => result.status === 'fulfilled' && result.value.ok)

    return {
      status: allHealthy,
      body: JSON.stringify({
        service: 'healthy',
        dependencies: healthChecks.map((check, index) => ({
          service: ['api', 'worker', 'gateway'][index],
          status: check.status === 'fulfilled' && check.value.ok ? 'healthy' : 'unhealthy'
        }))
      })
    }
  } catch (error) {
    return {
      status: false,
      statusCode: 503,
      body: `Health check failed: ${error.message}`
    }
  }
})
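The per-dependency fetch can also be factored into a small helper that degrades to false on any network error or timeout. A sketch assuming Node 18+, where fetch and AbortSignal.timeout are globals; probeService is not a Watt API:

```javascript
// Returns true only if the target answered 2xx within the time budget;
// any refusal, DNS failure, or timeout counts as unhealthy.
async function probeService (url, timeoutMs = 2000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) })
    return res.ok
  } catch {
    return false
  }
}
```

With this helper, the Promise.allSettled block above collapses to await Promise.all(urls.map(u => probeService(u))), since failures are already converted to booleans.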
Key Benefits of Watt's Internal Communication:
- Zero Configuration: services are automatically discoverable via {service-id}.plt.local
- No Network Latency: communication happens in-process via the service mesh
- Automatic Load Balancing: requests are distributed across service workers
- Built-in Service Discovery: no need for an external service registry
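For those internal fetches to succeed, each service must actually expose a /health route. With the autoload setup from this guide, a route file like the following is enough (the path web/api/routes/health.js and the response shape are illustrative choices, not Watt requirements):

```javascript
// Hypothetical web/api/routes/health.js, picked up by @fastify/autoload.
// This is the endpoint the .plt.local checks above would call.
export default async function healthRoutes (app) {
  app.get('/health', async () => ({
    status: 'ok',
    uptime: process.uptime()
  }))
}
```

Keep this route cheap: it is hit by sibling services on every health evaluation, so it should not itself query the database or other dependencies.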
Verification and Testing
Test Health Endpoints Locally
1. Start your Watt application:
npm start
# or for development
npm run dev
2. Test health endpoints:
# Test readiness endpoint (includes database connectivity check)
curl -v http://localhost:9090/ready
# Expected: 200 OK "Ready" (or custom response)
# Test liveness endpoint (includes database query)
curl -v http://localhost:9090/status
# Expected: 200 OK "Healthy" (or custom response)
# Test the main application endpoint with database integration
curl http://localhost:3042/
# Expected: {"message":"hello world","db_time":"2024-01-01T12:00:00.000Z"}
# Check metrics endpoint
curl http://localhost:9090/metrics
# Expected: Prometheus metrics output
3. Test with failing health checks:
# Stop PostgreSQL to simulate database failure
docker stop postgres-dev # if running locally with Docker
# or kubectl delete pod -l app=postgres # if running in K8s
# Test health endpoints - should now fail
curl http://localhost:9090/status
# Expected: 503 Service Unavailable with database error message
curl http://localhost:9090/ready
# Expected: 503 Service Unavailable
Test in Kubernetes
1. Deploy to Kubernetes:
# Deploy PostgreSQL first
kubectl apply -f postgres-deployment.yaml
# Wait for PostgreSQL to be ready
kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
# Deploy the application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
2. Monitor pod health:
# Check pod status
kubectl get pods -l app=watt-health-app
# Watch pod events
kubectl describe pod <pod-name>
# Check probe results
kubectl get events --field-selector reason=Unhealthy
3. Test probe behavior:
# Test health endpoints from within the pod
kubectl exec <pod-name> -- curl -f http://localhost:9090/ready
kubectl exec <pod-name> -- curl -f http://localhost:9090/status
# Watch Kubernetes pod status in real-time
kubectl get pods -l app=watt-health-app -w
# Check pod events for probe failures
kubectl get events --field-selector involvedObject.name=<pod-name>
Verify Probe Configuration
Check that probe timing is appropriate:
# Get current probe configuration
kubectl get deployment watt-health-app -o yaml | grep -A 10 Probe
Monitor probe metrics:
# Check probe success/failure rates
kubectl top pods
kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"
Production Configuration Best Practices
Probe Timing Guidelines
Startup-dependent applications:
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10 # Short delay for quick apps
  periodSeconds: 5 # Frequent checks during startup
  timeoutSeconds: 5 # Allow time for health check
  successThreshold: 1 # Single success to mark ready
  failureThreshold: 3 # Allow some startup failures
livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 30 # Longer delay after initial startup
  periodSeconds: 30 # Less frequent checks when running
  timeoutSeconds: 10 # More time for complex checks
  failureThreshold: 3 # Avoid restart on transient issues
Database-dependent applications:
startupProbe: # Use startup probe for slow initialization
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30 # Up to 5 minutes for startup
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 1 # Quick removal from service if unhealthy
livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 0 # No extra delay needed; liveness only starts after the startup probe succeeds
  periodSeconds: 20
  timeoutSeconds: 10
  failureThreshold: 3
Troubleshooting
Pod Failing Readiness Checks
Problem: Pods remain in "Not Ready" state
Solutions:
# Check health endpoint directly
kubectl exec <pod-name> -- curl http://localhost:9090/ready
# Review application logs
kubectl logs <pod-name>
# Check probe configuration
kubectl describe pod <pod-name> | grep -A 10 Readiness
# Common fixes:
# - Increase initialDelaySeconds if app needs more startup time
# - Check that health dependencies are available
# - Verify metrics server is configured and running on correct port
Pod Continuously Restarting
Problem: Liveness probes causing restart loops
Solutions:
# Check restart count and reason
kubectl get pods -l app=your-app
# Review pod events
kubectl describe pod <pod-name>
# Check liveness endpoint
kubectl exec <pod-name> -- curl http://localhost:9090/status
# Common fixes:
# - Increase timeoutSeconds for slow health checks
# - Increase failureThreshold to avoid restarts on transient issues
# - Review custom health check logic for potential failures
# - Check if app is properly handling SIGTERM for graceful shutdown
Health Checks Always Failing
Problem: Health endpoints return 500/404 errors
Solutions:
# Verify metrics server is listening
kubectl exec <pod-name> -- netstat -tlnp | grep :9090
# Check Watt configuration
kubectl exec <pod-name> -- cat watt.json
# Test endpoints with verbose output
kubectl exec <pod-name> -- curl -v http://localhost:9090/ready
kubectl exec <pod-name> -- curl -v http://localhost:9090/status
# Check application logs for errors
kubectl logs <pod-name> --tail=100
# Verify container environment
kubectl exec <pod-name> -- env | grep -E "PLT_|DATABASE_"
# Test database connectivity from a throwaway client pod
# (the app image does not ship pg_isready)
kubectl run pg-check --rm -it --restart=Never --image=postgres:15 -- pg_isready -h postgres -p 5432 -U postgres -d healthdb
Common fixes:
- Ensure metrics.hostname is "0.0.0.0" (not "127.0.0.1" or "localhost")
- Verify metrics.port matches the probe port configuration
- Check that the PLT_SERVER_HOSTNAME=0.0.0.0 environment variable is set
- Verify the DATABASE_URL environment variable is correctly formatted
- Ensure the PostgreSQL service is accessible from the application pod
- Check that PostgreSQL credentials and database name are correct
- Ensure custom health check functions handle database connection errors gracefully
- Verify all Watt services are starting without errors
Slow Startup Times
Problem: Pods take too long to become ready
Solutions:
# Analyze startup time with timestamps
kubectl logs <pod-name> --timestamps --since=5m
# Check resource usage and limits
kubectl describe pod <pod-name> | grep -A 10 -B 5 "Limits\|Requests"
kubectl top pod <pod-name>
# Profile health check performance
kubectl exec <pod-name> -- time curl -f http://localhost:9090/ready
# Check Node.js startup time
kubectl exec <pod-name> -- ps aux | grep node
Common fixes:
- Use startup probes for applications with slow initialization (database migrations, cache warming, etc.)
- Optimize custom health checks: keep them lightweight and fast
- Increase resources if CPU/memory constrained (check with kubectl top)
- Remove expensive operations from readiness checks (use async background tasks instead)
- Pre-build dependencies in the Docker image rather than installing at runtime
- Use Node.js production optimizations (NODE_ENV=production, --max-old-space-size)
Next Steps
Now that you have robust Kubernetes health checks:
- Configure monitoring - Track health check metrics with Prometheus
- Set up logging - Centralize health check logs for debugging
- Container deployment guide - Optimize your Docker setup
- TypeScript compilation - Production builds and optimization
References
Platformatic Resources
- Watt Runtime Configuration - Complete metrics configuration reference
- Node.js Capability Reference - Custom health check API documentation
- Example Application - Complete working example with Kubernetes manifests