How to Configure Kubernetes Health Checks with Watt
Problem
You're deploying Watt applications to Kubernetes and need robust health checking that:
- Prevents traffic from reaching unhealthy pods
- Automatically restarts failed containers
- Handles complex health dependencies (databases, external services)
- Provides proper startup time for initialization
- Integrates with Kubernetes orchestration patterns
When to use this solution:
- Production Kubernetes deployments
- Applications with external dependencies that need health validation
- Services requiring zero-downtime deployments
- Complex multi-service applications where service health interdependencies matter
Solution Overview
This guide shows you how to implement comprehensive Kubernetes health checks using Watt's built-in health endpoints. You'll learn to:
- Configure readiness and liveness probes properly
- Implement custom health checks for your application dependencies
- Set appropriate probe timing and thresholds
- Handle startup scenarios and graceful shutdowns
Understanding Kubernetes Health Probes
Kubernetes uses probes to determine application health:
- Readiness Probe: Determines if the pod is ready to receive traffic. Failed readiness removes the pod from service endpoints.
- Liveness Probe: Determines if the container should be restarted. Failed liveness triggers container restart by Kubernetes.
- Startup Probe: Provides extra time for slow-starting containers. Disables readiness and liveness probes until startup succeeds.
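Kubernetes does not act on a single probe result. A pod's state only changes after `failureThreshold` consecutive failures or `successThreshold` consecutive successes, and both settings appear in the Deployment later in this guide. The sketch below models that rule so you can reason about probe settings. `createProbeTracker` is a hypothetical helper, not part of Kubernetes or Watt, and it ignores timing details such as `timeoutSeconds`.

```javascript
// Model of how Kubernetes turns individual probe results into a state change.
// Illustrative only: this is not kubelet code and ignores timeouts/scheduling.
function createProbeTracker ({ failureThreshold = 3, successThreshold = 1, initiallyHealthy = false } = {}) {
  let healthy = initiallyHealthy
  let consecutiveFailures = 0
  let consecutiveSuccesses = 0

  return {
    // Record one probe result (true = probe passed) and return the resulting state
    record (passed) {
      if (passed) {
        consecutiveSuccesses++
        consecutiveFailures = 0
        if (!healthy && consecutiveSuccesses >= successThreshold) healthy = true
      } else {
        consecutiveFailures++
        consecutiveSuccesses = 0
        if (healthy && consecutiveFailures >= failureThreshold) healthy = false
      }
      return healthy
    }
  }
}

// Readiness-style example: the pod starts "not ready", becomes ready after one
// success, and leaves the load balancer only after three failures in a row.
const readiness = createProbeTracker({ failureThreshold: 3, successThreshold: 1 })
const states = [true, false, false, true, false, false, false].map((r) => readiness.record(r))
console.log(JSON.stringify(states)) // [true,true,true,true,true,true,false]
```

The isolated failures in the middle of the sequence do not change the state. Only the final run of three consecutive failures does.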
Prerequisites
Before implementing Kubernetes health checks, you need:
- Node.js 22.19.0 or later installed on your development machine
- Docker for containerization
- Kubernetes cluster access (local or cloud)
- kubectl configured to access your cluster
Installation
1. Create a new Watt application. When prompted, select @platformatic/node as the application type and name the application api:
$ npm create wattpm
Hello YOURNAME, welcome to Watt Utils 3.0.0!
? This folder seems to already contain a Node.js application. Do you want to wrap into Watt? no
? Where would you like to create your project? my-health-app
? Which kind of application do you want to create? @platformatic/node
✔ Installing @platformatic/node@^3.0.3 using pnpm ...
? What is the name of the application? api
? Do you want to create another application? no
? What port do you want to use? 3042
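2. Install the application dependencies. @fastify/postgres also needs the pg driver, which it declares as a peer dependency:
cd my-health-app/web/api; npm install fastify pg @fastify/postgres @fastify/autoload; cd ../..
3. Replace the web/api/index.js file with: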
import { getLogger } from '@platformatic/globals'
import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'
export async function create () {
const app = fastify({
loggerInstance: getLogger()
})
// Register PostgreSQL plugin
await app.register(import('@fastify/postgres'), {
connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
})
// Autoload routes
await app.register(autoload, {
dir: join(import.meta.dirname, 'routes')
})
app.get('/', async () => {
const client = await app.pg.connect()
try {
const result = await client.query('SELECT NOW() as current_time')
return { message: 'hello world', db_time: result.rows[0].current_time }
} finally {
client.release()
}
})
return app
}
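This creates a Fastify application that connects to PostgreSQL and autoloads route files from web/api/routes. Create that directory before starting the app, even if it is empty.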
Platformatic Health Check APIs
Watt provides built-in health check endpoints through its metrics server. The metrics server exposes the following endpoints by default:
- /ready (Readiness endpoint): Indicates if all services are started and ready to accept traffic
- /status (Liveness endpoint): Indicates if all services are healthy and their custom health checks pass
Endpoint Customization
You can customize the health check endpoints in your Watt configuration:
{
"metrics": {
"hostname": "0.0.0.0",
"port": 9090,
"readiness": {
"endpoint": "/health"
},
"liveness": {
"endpoint": "/live"
}
}
}
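If you customize these paths, the Kubernetes probes must use the same paths and port. One way to avoid drift is to derive the probe settings from watt.json, as in the sketch below. `resolveProbePaths` and `httpGetFor` are hypothetical helpers. The code assumes the object form of `readiness`/`liveness` shown above and falls back to the default /ready and /status paths. When no port is set, it assumes 9090, the port used throughout this guide.

```javascript
// Keep Kubernetes probe settings in sync with the "metrics" section of watt.json.
// Hypothetical helpers; they assume the object form of readiness/liveness.
function resolveProbePaths (wattConfig = {}) {
  const metrics = wattConfig.metrics ?? {}
  return {
    readiness: metrics.readiness?.endpoint ?? '/ready', // Watt default
    liveness: metrics.liveness?.endpoint ?? '/status' // Watt default
  }
}

// Build the httpGet section of a probe for the given path
function httpGetFor (path, wattConfig = {}) {
  const metrics = wattConfig.metrics ?? {}
  return {
    scheme: metrics.https ? 'HTTPS' : 'HTTP', // HTTPS only when metrics.https is set
    path,
    port: metrics.port ?? 9090 // assumption: 9090, the port used in this guide
  }
}

// The customized configuration shown above
const config = {
  metrics: {
    hostname: '0.0.0.0',
    port: 9090,
    readiness: { endpoint: '/health' },
    liveness: { endpoint: '/live' }
  }
}
const paths = resolveProbePaths(config)
console.log(JSON.stringify(paths)) // {"readiness":"/health","liveness":"/live"}
console.log(JSON.stringify(httpGetFor(paths.readiness, config))) // {"scheme":"HTTP","path":"/health","port":9090}
```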
Serving Health Endpoints over HTTPS (SSL/TLS)
Readiness and liveness endpoints run on the metrics server, so enabling HTTPS (TLS, often referred to as SSL) for metrics also enables HTTPS for /ready, /status, and /metrics.
For Kubernetes, store the certificate and private key in a Secret and mount it into the container. Then reference those files from watt.json:
{
"metrics": {
"hostname": "0.0.0.0",
"port": 9090,
"https": {
"key": { "path": "/etc/watt/tls/tls.key" },
"cert": { "path": "/etc/watt/tls/tls.crt" }
}
}
}
You can also provide inline PEM strings, which is convenient for local testing:
{
"metrics": {
"https": {
"key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
}
}
}
Use file paths for production deployments so certificates can be rotated through your platform's secret management. Both key and cert also accept arrays if your TLS setup requires multiple keys or certificate chains.
When the metrics server uses HTTPS, set scheme: HTTPS on Kubernetes HTTP probes:
readinessProbe:
  httpGet:
    scheme: HTTPS
    path: /ready
    port: 9090
livenessProbe:
  httpGet:
    scheme: HTTPS
    path: /status
    port: 9090
Kubernetes does not verify the certificate for HTTP probes that use scheme: HTTPS, so self-signed certificates work for probes. Prometheus or other external clients may still need CA configuration.
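To check what the kubelet sees, you can reproduce its HTTP probe rules locally. Kubernetes counts any status code from 200 to 399 as success, and with `scheme: HTTPS` it skips certificate verification. The sketch below mirrors both rules using Node's built-in http/https modules. `probeOnce` and `isProbeSuccess` are hypothetical names, and disabling certificate verification is only appropriate for this kind of local debugging.

```javascript
// Local-debugging sketch that mimics the kubelet's HTTP(S) probe rules:
// status 200-399 counts as success, and TLS certificates are NOT verified.
function isProbeSuccess (statusCode) {
  return statusCode >= 200 && statusCode < 400
}

async function probeOnce (url, timeoutMs = 5000) {
  const { protocol } = new URL(url)
  // Dynamic import works from both ES modules and CommonJS
  const client = protocol === 'https:' ? await import('node:https') : await import('node:http')
  return new Promise((resolve) => {
    // rejectUnauthorized: false accepts self-signed certificates, as probes do
    const req = client.get(url, { rejectUnauthorized: false, timeout: timeoutMs }, (res) => {
      res.resume() // drain the body; only the status code matters
      resolve({ ok: isProbeSuccess(res.statusCode), statusCode: res.statusCode })
    })
    req.on('timeout', () => req.destroy(new Error(`timed out after ${timeoutMs} ms`)))
    req.on('error', (err) => resolve({ ok: false, error: err.message }))
  })
}
```

For example, `probeOnce('https://localhost:9090/ready').then(console.log)` prints the outcome of a single probe against the HTTPS metrics server.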
Service Discovery and Autoload
By default, Watt automatically loads all services in the web folder via the autoload configuration. You don't need to manually specify each service in the configuration. Watt will:
- Discover all valid Platformatic services in this directory
- Automatically register them in the runtime
- Include them in health check evaluations
- Expose their metrics through the metrics server
This autoload behavior simplifies deployment and ensures all your services are automatically included in the health monitoring system.
Custom Health Check Functions
- setCustomHealthCheck(): Sets a custom liveness check for the /status endpoint or custom liveness endpoint.
- setCustomReadinessCheck(): Sets a custom readiness check for the /ready endpoint or custom readiness endpoint.
Both methods accept a function that returns:
- A boolean value (true = healthy, false = unhealthy)
- An object with:
  - status: boolean indicating success/failure
  - statusCode: optional HTTP status code (defaults to 200/500)
  - body: optional response body
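The rules above can be summarized as a small normalization function. This is a model for reasoning about the contract, not Watt's implementation. `normalizeCheckResult` is hypothetical, and it ignores the success/fail responses you can configure in watt.json.

```javascript
// Model of the documented return contract (not Watt's actual implementation).
function normalizeCheckResult (result) {
  // A plain boolean: true = healthy, false = unhealthy
  if (typeof result === 'boolean') {
    return { status: result, statusCode: result ? 200 : 500, body: undefined }
  }
  // An object: status is required, statusCode and body are optional
  const status = Boolean(result?.status)
  return {
    status,
    statusCode: result?.statusCode ?? (status ? 200 : 500),
    body: result?.body
  }
}

console.log(normalizeCheckResult(false).statusCode) // 500
console.log(normalizeCheckResult({ status: false, statusCode: 503, body: 'db down' }).statusCode) // 503
```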
Implementation
1. Service Implementation with Custom Health Checks
Update your web/api/index.js to implement comprehensive health checks:
import { getLogger, setCustomHealthCheck, setCustomReadinessCheck } from '@platformatic/globals'
import fastify from 'fastify'
import autoload from '@fastify/autoload'
import { join } from 'node:path'
export async function create () {
const app = fastify({
loggerInstance: getLogger()
})
// Register PostgreSQL plugin
await app.register(import('@fastify/postgres'), {
connectionString: process.env.DATABASE_URL || 'postgres://postgres:password@postgres:5432/healthdb'
})
// Autoload routes
await app.register(autoload, {
dir: join(import.meta.dirname, 'routes')
})
// Register custom liveness check (for /status endpoint)
setCustomHealthCheck(async () => {
try {
// Check PostgreSQL database connectivity
const client = await app.pg.connect()
try {
await client.query('SELECT 1')
} finally {
client.release()
}
return { status: true }
} catch (err) {
app.log.error({ err }, 'Health check failed')
return {
status: false,
statusCode: 503,
body: `Database health check failed: ${err.message}`
}
}
})
// Register custom readiness check (for /ready endpoint)
setCustomReadinessCheck(async () => {
try {
// Check if PostgreSQL connection pool is ready
if (!app.pg || !app.pg.pool) {
return false
}
// Quick connection test
const client = await app.pg.connect()
try {
await client.query('SELECT 1')
return true
} finally {
client.release()
}
} catch (err) {
app.log.error({ err }, 'Readiness check failed')
return false
}
})
// Add application routes
app.get('/', async () => {
const client = await app.pg.connect()
try {
const result = await client.query('SELECT NOW() as current_time')
return { message: 'hello world', db_time: result.rows[0].current_time }
} finally {
client.release()
}
})
return app
}
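A custom check that waits on a stuck database connection can outlive the probe's `timeoutSeconds`. In that case Kubernetes records a failure without ever seeing your error message. A common mitigation is to bound the check yourself. `withTimeout` below is a hypothetical helper, not a Watt API. It turns a timeout or a thrown error into the object form accepted by `setCustomHealthCheck()` and `setCustomReadinessCheck()`. Note that it does not cancel the underlying query.

```javascript
// Bound a custom health check so it answers before the probe's timeoutSeconds.
// Hypothetical helper: the wrapped check keeps running in the background after
// a timeout (e.g. a hung query is not cancelled); only the answer is bounded.
function withTimeout (check, ms, label = 'health check') {
  return async () => {
    let timer
    const timeout = new Promise((resolve) => {
      timer = setTimeout(() => resolve({
        status: false,
        statusCode: 503,
        body: `${label} timed out after ${ms} ms`
      }), ms)
    })
    // Turn thrown errors (sync or async) into a failed result instead of a rejection
    const run = Promise.resolve()
      .then(check)
      .catch((err) => ({ status: false, statusCode: 503, body: `${label} failed: ${err.message}` }))
    try {
      return await Promise.race([run, timeout])
    } finally {
      clearTimeout(timer) // avoid leaving a pending timer behind
    }
  }
}
```

You could then register, for example, `setCustomReadinessCheck(withTimeout(yourCheck, 3000, 'readiness'))`, which keeps the bound below a readiness probe `timeoutSeconds` of 5.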
2. Watt Configuration
Configure the metrics server in your watt.json file:
{
"metrics": {
"hostname": "0.0.0.0",
"port": 9090,
"readiness": {
"success": {
"statusCode": 200,
"body": "Ready"
},
"fail": {
"statusCode": 503,
"body": "Not Ready"
}
},
"liveness": {
"success": {
"statusCode": 200,
"body": "Healthy"
},
"fail": {
"statusCode": 503,
"body": "Unhealthy"
}
}
}
}
3. PostgreSQL Database Setup
First, create a PostgreSQL deployment and service for your database:
postgres-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: 'healthdb'
            - name: POSTGRES_USER
              value: 'postgres'
            - name: POSTGRES_PASSWORD
              value: 'password'
            - name: PGDATA
              value: '/var/lib/postgresql/data/pgdata'
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
          livenessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - postgres
                - -d
                - healthdb
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: postgres-storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
4. Kubernetes Application Configuration
Create a Kubernetes Deployment that defines the probes. Save it as k8s/deployment.yaml, the path used by the deployment commands later in this guide:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: watt-health-app
  labels:
    app: watt-health-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: watt-health-app
  template:
    metadata:
      labels:
        app: watt-health-app
    spec:
      containers:
        - name: watt-app
          image: watt-health-app:latest
          ports:
            - containerPort: 3042
              name: service
            - containerPort: 9090
              name: metrics
          env:
            - name: PLT_SERVER_HOSTNAME
              value: '0.0.0.0'
            - name: DATABASE_URL
              value: 'postgres://postgres:password@postgres:5432/healthdb'
          readinessProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /status
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 20 # About 110 s for startup (10 s delay + 20 x 5 s)
          resources:
            requests:
              memory: '256Mi'
              cpu: '250m'
            limits:
              memory: '512Mi'
              cpu: '500m'
If the metrics server uses HTTPS, add scheme: HTTPS under each httpGet block. If the certificate comes from a Kubernetes Secret, mount it into the container:
apiVersion: v1
kind: Secret
metadata:
  name: watt-metrics-tls
type: kubernetes.io/tls
stringData:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
---
# Add this to the watt-app container:
volumeMounts:
  - name: metrics-tls
    mountPath: /etc/watt/tls
    readOnly: true
# Add this to the pod spec:
volumes:
  - name: metrics-tls
    secret:
      secretName: watt-metrics-tls
Key configuration points:
- Startup Probe: Allows about 110 seconds (10 s initial delay + 20 attempts x 5 s) for application initialization
- Readiness Probe: Checks the /ready endpoint every 10 seconds after startup
- Liveness Probe: Checks the /status endpoint every 30 seconds after startup
- Environment Variables: PLT_SERVER_HOSTNAME=0.0.0.0 ensures the app binds to all interfaces
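Important Timing Considerations:
- The startup probe runs first and disables the other probes until it succeeds
- The readiness probe runs every 10 seconds, so with failureThreshold: 3 an unhealthy pod is removed from traffic after roughly 30 seconds
- The liveness probe runs every 30 seconds with the same failureThreshold, so a restart only happens after roughly 90 seconds of consecutive failures, which avoids unnecessary restarts on transient issues
- Timeout values (5 s for readiness, 10 s for liveness) leave room for the custom checks, each of which runs a database query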
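The figures above follow from simple arithmetic. A probe gives up after roughly initialDelaySeconds + periodSeconds x failureThreshold, which is the rule of thumb the Kubernetes documentation uses for startup probes. The helper below applies it to the Deployment's settings. `probeBudgetSeconds` is hypothetical, and real timings also depend on `timeoutSeconds` and on when the kubelet schedules each probe.

```javascript
// Approximate time budgets for a probe configuration. Uses Kubernetes'
// defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3) for
// any field that is omitted.
function probeBudgetSeconds ({ initialDelaySeconds = 0, periodSeconds = 10, failureThreshold = 3 } = {}) {
  return initialDelaySeconds + periodSeconds * failureThreshold
}

// Probe settings from the watt-health-app Deployment above
const probes = {
  startup: { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 20 },
  // After startup succeeds, only period x threshold matters for detection time
  readiness: { periodSeconds: 10, failureThreshold: 3 },
  liveness: { periodSeconds: 30, failureThreshold: 3 }
}

for (const [name, config] of Object.entries(probes)) {
  console.log(`${name}: ~${probeBudgetSeconds(config)} s`)
}
// startup: ~110 s
// readiness: ~30 s
// liveness: ~90 s
```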
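5. Docker Configuration
Create a Dockerfile for your Watt application:
FROM node:22-alpine
WORKDIR /app
# curl is not part of node:22-alpine; HEALTHCHECK below and the
# kubectl exec examples in this guide rely on it
RUN apk add --no-cache curl
# Copy package files and install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev
# Copy application code
COPY . .
# Expose ports
EXPOSE 3042 9090
# Set environment variables
ENV PLT_SERVER_HOSTNAME=0.0.0.0
ENV NODE_ENV=production
# Health check for plain Docker (Kubernetes ignores HEALTHCHECK and uses the
# probes from the Deployment instead)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:9090/ready || exit 1
# Start the application
CMD ["npm", "start"]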
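A HEALTHCHECK that calls curl only works if curl is present in the image. If you prefer not to install it, a small Node.js script can do the same job, because Node 18 and later ship a global `fetch`. This is a sketch. The file name healthcheck.js, the `--run` flag, and the `HEALTHCHECK_URL` variable are illustrative choices, not Watt conventions.

```javascript
// healthcheck.js: a curl-free alternative for Docker's HEALTHCHECK (sketch).
// Assumes Node.js 18+ (global fetch, AbortSignal.timeout) and the metrics
// server on localhost:9090 as configured in this guide.
async function healthcheck (url = 'http://localhost:9090/ready', timeoutMs = 5000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) })
    return res.ok ? 0 : 1 // Docker: exit code 0 = healthy, 1 = unhealthy
  } catch {
    return 1 // connection refused, DNS failure, or timeout
  }
}

// Only run when invoked with --run, so the function can be reused or tested
if (process.argv.includes('--run')) {
  healthcheck(process.env.HEALTHCHECK_URL).then((code) => process.exit(code))
}
```

It could then be wired in with `HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 CMD ["node", "healthcheck.js", "--run"]`.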
Watt Internal Service Communication
Watt provides a built-in service mesh that enables zero-configuration communication between services using the .plt.local domain. This is crucial for implementing proper health checks in multi-service applications.
Architecture Overview
The following diagram illustrates how services communicate within a Watt application for health checks in Kubernetes:
graph TB
subgraph "Kubernetes Pod"
subgraph "Watt Runtime"
subgraph "Service Mesh"
Router[Internal Router]
Discovery["Service Discovery<br/>(.plt.local)"]
end
subgraph "Services"
Gateway["Gateway Service<br/>(Composer)<br/>:3001"]
API["API Service<br/>(Backend)<br/>:3002"]
Worker["Worker Service<br/>(Background)<br/>:3003"]
end
subgraph "Health Monitoring"
Metrics["Metrics Server<br/>:9090"]
Health["/ready, /status"]
end
end
end
subgraph "External"
K8s[Kubernetes Probes]
Client[External Clients]
end
%% Health check flows
K8s --> |"GET /ready<br/>GET /status"| Metrics
Metrics --> |"Check service health"| Gateway
Metrics --> |"Check service health"| API
Metrics --> |"Check service health"| Worker
%% Internal service communication
Gateway --> |"fetch('http://api.plt.local/health')"| Router
Gateway --> |"fetch('http://worker.plt.local/health')"| Router
Router --> API
Router --> Worker
%% External access
Client --> |"External requests"| Gateway
%% Service discovery
Discovery -.-> |"Resolves .plt.local"| Router
style Metrics fill:#e1f5fe
style Health fill:#e8f5e8
style Router fill:#fff3e0
style Discovery fill:#fff3e0
Key Communication Patterns:
- Kubernetes Health Probes → Metrics server (:9090/ready, :9090/status)
- Metrics Server → Individual services for health verification
- Inter-Service Health Checks → Via the .plt.local domain (e.g., http://api.plt.local/health)
- External Traffic → Gateway service (composer) for API aggregation
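Internal Fetch with Automatic Service Discovery
Services within a Watt application can call each other through automatic service discovery, addressing each service as {service-id}.plt.local. The example below assumes that the api, worker, and gateway services each expose a /health route, which this guide does not define: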
import { setCustomHealthCheck } from '@platformatic/globals'
// Health check for internal services using Watt's service mesh
setCustomHealthCheck(async () => {
try {
const healthChecks = await Promise.allSettled([
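// API service health check (fetch() has no timeout option, so
// AbortSignal.timeout() aborts each request after 2 seconds)
fetch('http://api.plt.local/health', { signal: AbortSignal.timeout(2000) }),
// Background worker service health check
fetch('http://worker.plt.local/health', { signal: AbortSignal.timeout(2000) }),
// Gateway (composer) service health check
fetch('http://gateway.plt.local/health', { signal: AbortSignal.timeout(2000) })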
])
const allHealthy = healthChecks.every(result => result.status === 'fulfilled' && result.value.ok)
return {
status: allHealthy,
statusCode: allHealthy ? 200 : 503,
body: JSON.stringify({
service: allHealthy ? 'healthy' : 'degraded',
dependencies: healthChecks.map((check, index) => ({
service: ['api', 'worker', 'gateway'][index],
status: check.status === 'fulfilled' && check.value.ok ? 'healthy' : 'unhealthy'
}))
})
}
} catch (error) {
return {
status: false,
statusCode: 503,
body: `Health check failed: ${error.message}`
}
}
})
Key Benefits of Watt's Internal Communication:
- Zero Configuration: Services are automatically discoverable via {service-id}.plt.local
- No Network Hop: Communication happens in-process via the service mesh
- Automatic Load Balancing: Requests are distributed across service workers
- Built-in Service Discovery: No need for an external service registry
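Verification and Testing
Test Health Endpoints Locally
1. Start your Watt application. The default DATABASE_URL points at the in-cluster host postgres, so when running locally, first point it at a reachable PostgreSQL instance:
export DATABASE_URL=postgres://postgres:password@localhost:5432/healthdb
npm start
# or for development
npm run dev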
2. Test health endpoints:
# Test readiness endpoint (includes database connectivity check)
curl -v http://localhost:9090/ready
# Expected: 200 OK "Ready" (or custom response)
# Test liveness endpoint (includes database query)
curl -v http://localhost:9090/status
# Expected: 200 OK "Healthy" (or custom response)
# If metrics.https is configured, use HTTPS instead.
# Use -k for local self-signed certificates.
curl -vk https://localhost:9090/ready
curl -vk https://localhost:9090/status
# Test the main application endpoint with database integration
curl http://localhost:3042/
# Expected: {"message":"hello world","db_time":"2024-01-01T12:00:00.000Z"}
# Check metrics endpoint
curl http://localhost:9090/metrics
# Expected: Prometheus metrics output
3. Test with failing health checks:
# Stop PostgreSQL to simulate database failure
docker stop postgres-dev # if PostgreSQL runs locally in a Docker container named postgres-dev
# or kubectl delete pod -l app=postgres # if running in K8s
# Test health endpoints - should now fail
curl http://localhost:9090/status
# Expected: 503 Service Unavailable with database error message
curl http://localhost:9090/ready
# Expected: 503 Service Unavailable
Test in Kubernetes
1. Deploy to Kubernetes:
# Deploy PostgreSQL first
kubectl apply -f postgres-deployment.yaml
# Wait for PostgreSQL to be ready
kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
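# Deploy the application (k8s/deployment.yaml holds the Deployment from step 4;
# k8s/service.yaml is a Service for the application port, not shown in this guide)
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml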
2. Monitor pod health:
# Check pod status
kubectl get pods -l app=watt-health-app
# Watch pod events
kubectl describe pod <pod-name>
# Check probe results
kubectl get events --field-selector reason=Unhealthy
3. Test probe behavior:
# Test health endpoints from within the pod
kubectl exec <pod-name> -- curl -f http://localhost:9090/ready
kubectl exec <pod-name> -- curl -f http://localhost:9090/status
# If metrics.https is configured, use HTTPS.
# Use -k when the pod uses a self-signed certificate.
kubectl exec <pod-name> -- curl -fk https://localhost:9090/ready
kubectl exec <pod-name> -- curl -fk https://localhost:9090/status
# Watch Kubernetes pod status in real-time
kubectl get pods -l app=watt-health-app -w
# Check pod events for probe failures
kubectl get events --field-selector involvedObject.name=<pod-name>
Verify Probe Configuration
Check that the probe timing is appropriate:
# Get current probe configuration
kubectl get deployment watt-health-app -o yaml | grep -A 10 Probe
Monitor probe results and resource usage:
# Check CPU and memory usage (requires metrics-server; resource pressure often slows health checks)
kubectl top pods
# Show probe settings and recent probe failures
kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"
Production Configuration Best Practices
Probe Timing Guidelines
Fast-starting applications (no startup probe):
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10 # Short delay for quick apps
  periodSeconds: 5 # Frequent checks during startup
  timeoutSeconds: 5 # Allow time for health check
  successThreshold: 1 # Single success to mark ready
  failureThreshold: 3 # Allow some startup failures
livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 30 # Longer delay after initial startup
  periodSeconds: 30 # Less frequent checks when running
  timeoutSeconds: 10 # More time for complex checks
  failureThreshold: 3 # Avoid restart on transient issues
Database-dependent applications:
startupProbe: # Use startup probe for slow initialization
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30 # Up to 5 minutes for startup
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 1 # Quick removal from service if unhealthy
livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 0 # Disabled until startup probe succeeds
  periodSeconds: 20
  timeoutSeconds: 10
  failureThreshold: 3
Troubleshooting
Pod Failing Readiness Checks
Problem: Pods remain in "Not Ready" state
Solutions:
# Check health endpoint directly
kubectl exec <pod-name> -- curl http://localhost:9090/ready
# Review application logs
kubectl logs <pod-name>
# Check probe configuration
kubectl describe pod <pod-name> | grep -A 10 Readiness
# Common fixes:
# - Increase initialDelaySeconds if app needs more startup time
# - Check that health dependencies are available
# - Verify metrics server is configured and running on correct port
Pod Continuously Restarting
Problem: Liveness probes causing restart loops
Solutions:
# Check restart count and reason
kubectl get pods -l app=your-app
# Review pod events
kubectl describe pod <pod-name>
# Check liveness endpoint
kubectl exec <pod-name> -- curl http://localhost:9090/status
# Common fixes:
# - Increase timeoutSeconds for slow health checks
# - Increase failureThreshold to avoid restarts on transient issues
# - Review custom health check logic for potential failures
# - Check if app is properly handling SIGTERM for graceful shutdown
Health Checks Always Failing
Problem: Health endpoints return 500/404 errors
Solutions:
# Verify metrics server is listening
kubectl exec <pod-name> -- netstat -tlnp | grep :9090
# Check Watt configuration
kubectl exec <pod-name> -- cat watt.json
# Test endpoints with verbose output
kubectl exec <pod-name> -- curl -v http://localhost:9090/ready
kubectl exec <pod-name> -- curl -v http://localhost:9090/status
# If metrics.https is configured, test HTTPS instead.
kubectl exec <pod-name> -- curl -vk https://localhost:9090/ready
kubectl exec <pod-name> -- curl -vk https://localhost:9090/status
# Check application logs for errors
kubectl logs <pod-name> --tail=100
# Verify container environment
kubectl exec <pod-name> -- env | grep -E "PLT_|DATABASE_"
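# Test database connectivity from a temporary pod (the application image does not include pg_isready)
kubectl run pg-check --rm -it --restart=Never --image=postgres:15 -- pg_isready -h postgres -p 5432 -U postgres -d healthdb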
Common fixes:
- Ensure metrics.hostname is "0.0.0.0" (not "127.0.0.1" or "localhost")
- Verify metrics.port matches the probe port configuration
- Check that the PLT_SERVER_HOSTNAME=0.0.0.0 environment variable is set
- Verify the DATABASE_URL environment variable is correctly formatted
- Ensure the PostgreSQL service is accessible from the application pod
- Check that PostgreSQL credentials and database name are correct
- Ensure custom health check functions handle database connection errors gracefully
- Verify all Watt services are starting without errors
Slow Startup Times
Problem: Pods take too long to become ready
Solutions:
# Analyze startup time with timestamps
kubectl logs <pod-name> --timestamps --since=5m
# Check resource usage and limits
kubectl describe pod <pod-name> | grep -A 10 -B 5 "Limits\|Requests"
kubectl top pod <pod-name>
# Profile health check performance
kubectl exec <pod-name> -- time curl -f http://localhost:9090/ready
# Check Node.js startup time
kubectl exec <pod-name> -- ps aux | grep node
Common fixes:
- Use startup probes for applications with slow initialization (database migrations, cache warming, etc.)
- Optimize custom health checks: keep them lightweight and fast
- Increase resources if CPU/memory constrained (check with kubectl top)
- Remove expensive operations from readiness checks (use async background tasks instead)
- Pre-build dependencies in the Docker image rather than installing them at runtime
- Use Node.js production optimizations (NODE_ENV=production, --max-old-space-size)
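The advice to move expensive work out of readiness checks can look like the sketch below. The expensive check runs on a background interval, and the probe only reads the cached result. `createCachedCheck` is hypothetical, not a Watt API. The trade-off is that readiness can lag reality by up to one interval.

```javascript
// Run an expensive check in the background and let the probe read the cached
// result. Hypothetical helper; not part of Watt.
function createCachedCheck (check, intervalMs = 10_000) {
  let last = false // report "not ready" until the first run completes
  let running = null

  function refresh () {
    if (!running) {
      // Reuse an in-flight run instead of stacking overlapping checks
      running = Promise.resolve()
        .then(check)
        .then((result) => { last = result }, () => { last = false })
        .finally(() => { running = null })
    }
    return running
  }

  const timer = setInterval(refresh, intervalMs)
  timer.unref() // do not keep the process alive just for this timer
  refresh()

  return {
    get: () => last, // cheap: this is what the probe handler should call
    refresh,
    stop: () => clearInterval(timer)
  }
}
```

For example, `const dbReady = createCachedCheck(checkDatabase, 10_000)` followed by `setCustomReadinessCheck(() => dbReady.get())`, where `checkDatabase` is your own async function that returns a boolean.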
Next Steps
Now that you have robust Kubernetes health checks:
- Configure monitoring - Track health check metrics with Prometheus
- Set up logging - Centralize health check logs for debugging
- Container deployment guide - Optimize your Docker setup
- TypeScript compilation - Production builds and optimization
References
Platformatic Resources
- Watt Runtime Configuration - Complete metrics configuration reference
- Node.js Capability Reference - Custom health check API documentation
- Example Application - Complete working example with Kubernetes manifests