Monitoring and Observability
This guide covers how to implement comprehensive monitoring and observability for your Watt applications, including logging, metrics, tracing, and health checks.
Overview
Observability is crucial for production applications. Watt provides built-in support for:
- Structured Logging: JSON-based logging with configurable levels
- Metrics Collection: Prometheus-compatible metrics for performance monitoring
- Distributed Tracing: OpenTelemetry integration for request tracing
- Health Checks: Application and service health endpoints
- Error Tracking: Centralized error collection and alerting
Quick Setup
The fastest way to enable monitoring is during Watt application creation:
wattpm create my-app
# Select monitoring options during setup
cd my-app
wattpm dev
Logging Configuration
Basic Logging
Configure logging in your watt.json:
{
"server": {
"logger": {
"level": "info",
"prettyPrint": false
}
}
}
Structured Logging
For production, use structured JSON logging:
{
"server": {
"logger": {
"level": "info",
"prettyPrint": false,
"redact": ["password", "authorization"],
"serializers": {
"req": "pino-std-serializers.req",
"res": "pino-std-serializers.res"
}
}
}
}
Log Levels
Available log levels (from least to most verbose):
- fatal: Only fatal errors
- error: Errors and fatal
- warn: Warnings, errors, and fatal
- info: General information (recommended for production)
- debug: Debug information
- trace: Very detailed debug information
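A common pattern is to drive the level from an environment variable using the same {PLT_*} configuration placeholders shown elsewhere in this guide. This is a minimal sketch; the variable name PLT_SERVER_LOGGER_LEVEL is an example, choose whatever fits your setup:
{
  "server": {
    "logger": {
      "level": "{PLT_SERVER_LOGGER_LEVEL}"
    }
  }
}
Set PLT_SERVER_LOGGER_LEVEL=debug in your .env file or deployment environment to raise verbosity without editing the config.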
Custom Logging in Services
Add logging to your service plugins:
// services/api/plugin.js
module.exports = async function (app) {
app.get('/api/users', async (request, reply) => {
request.log.info({ userId: 123 }, 'User data requested')
try {
const users = await getUsers()
request.log.debug({ count: users.length }, 'Users retrieved')
return users
} catch (error) {
request.log.error({ error }, 'Failed to get users')
throw error
}
})
}
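With prettyPrint disabled, the request.log.info call above emits one JSON line per event, roughly like the following. This is illustrative output, not an exact format guarantee; fields such as reqId depend on your Fastify and pino settings:
{"level":30,"time":1700000000000,"reqId":"req-1","userId":123,"msg":"User data requested"}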
Metrics Collection
Enable Metrics
Configure Prometheus-compatible metrics:
{
"metrics": {
"enabled": true,
"endpoint": "/metrics",
"auth": {
"username": "{PLT_METRICS_USERNAME}",
"password": "{PLT_METRICS_PASSWORD}"
}
}
}
Built-in Metrics
Watt automatically collects:
- HTTP Metrics: Request count, duration, status codes
- System Metrics: CPU, memory, event loop lag
- Database Metrics: Connection pool, query duration (if using Database Service)
- Custom Metrics: Application-specific metrics you define
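To see what is collected, request the endpoint and inspect the Prometheus text format. The excerpt below is illustrative only; exact metric names vary between Watt versions, so check your own /metrics output:
curl http://localhost:3042/metrics

# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 1.52
# HELP nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE nodejs_eventloop_lag_seconds gauge
nodejs_eventloop_lag_seconds 0.002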
Custom Metrics
Add custom metrics to your services:
// services/api/plugin.js
module.exports = async function (app) {
// Counter for API calls
const apiCallsCounter = app.metrics.counter({
name: 'api_calls_total',
help: 'Total number of API calls',
labelNames: ['method', 'endpoint', 'status']
})
// Histogram for response times
const responseTimeHistogram = app.metrics.histogram({
name: 'api_response_time_seconds',
help: 'API response time in seconds',
buckets: [0.1, 0.5, 1.0, 2.0, 5.0]
})
app.addHook('onRequest', async (request, reply) => {
request.startTime = Date.now()
})
app.addHook('onResponse', async (request, reply) => {
const duration = (Date.now() - request.startTime) / 1000
apiCallsCounter.inc({
method: request.method,
endpoint: request.routerPath,
status: reply.statusCode
})
responseTimeHistogram.observe(duration)
})
}
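Once registered, the custom metrics above are exposed alongside the built-in ones in Prometheus text format, along these lines (an illustrative excerpt):
# HELP api_calls_total Total number of API calls
# TYPE api_calls_total counter
api_calls_total{method="GET",endpoint="/api/users",status="200"} 42
# HELP api_response_time_seconds API response time in seconds
# TYPE api_response_time_seconds histogram
api_response_time_seconds_bucket{le="0.1"} 40
api_response_time_seconds_bucket{le="+Inf"} 42
api_response_time_seconds_sum 1.8
api_response_time_seconds_count 42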
Metrics Dashboard
Access metrics at http://localhost:3042/metrics, or point your monitoring system at that endpoint:
# prometheus.yml
scrape_configs:
- job_name: 'watt-app'
static_configs:
- targets: ['localhost:3042']
metrics_path: '/metrics'
basic_auth:
username: 'metrics_user'
password: 'secure_password'
Distributed Tracing
OpenTelemetry Setup
Enable tracing for request correlation across services:
{
"telemetry": {
"serviceName": "my-watt-app",
"version": "1.0.0",
"exporter": {
"type": "otlp",
"options": {
"url": "http://localhost:4318/v1/traces"
}
}
}
}
Tracing in Services
Tracing is automatically enabled for HTTP requests, database queries, and inter-service calls. Add custom spans:
// services/api/plugin.js
module.exports = async function (app) {
app.get('/api/complex-operation', async (request, reply) => {
const tracer = app.openTelemetry.tracer
return await tracer.startActiveSpan('complex-operation', async (span) => {
try {
span.setAttributes({
'operation.type': 'data-processing',
'user.id': request.user.id
})
const result = await processData()
span.setStatus({ code: 1 }) // SUCCESS
return result
} catch (error) {
span.recordException(error)
span.setStatus({ code: 2, message: error.message }) // ERROR
throw error
} finally {
span.end()
}
})
})
}
Jaeger Integration
For local development with Jaeger:
# Start Jaeger (using Docker)
docker run -d --name jaeger \
-p 16686:16686 \
-p 14268:14268 \
jaegertracing/all-in-one:latest
Then configure Watt to send traces to Jaeger:
{
"telemetry": {
"serviceName": "my-watt-app",
"exporter": {
"type": "jaeger",
"options": {
"endpoint": "http://localhost:14268/api/traces"
}
}
}
}
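Once traces are flowing, open the Jaeger UI at http://localhost:16686 and search for the my-watt-app service to inspect individual requests.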
Health Checks
Application Health
Configure health check endpoints:
{
"server": {
"healthCheck": {
"enabled": true,
"interval": 5000,
"exposeUptime": true
}
}
}
This creates the following health endpoints:
- GET /status - Basic health status
- GET /status/live - Kubernetes liveness probe
- GET /status/ready - Kubernetes readiness probe
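You can verify them locally with curl; the exact response body depends on your configuration:
curl -i http://localhost:3042/status
curl -i http://localhost:3042/status/live
curl -i http://localhost:3042/status/ready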
Service-Specific Health Checks
Add custom health checks:
// services/api/plugin.js
module.exports = async function (app) {
app.register(require('@fastify/under-pressure'), {
healthCheck: async function () {
// Custom health check logic
const dbHealthy = await checkDatabaseConnection()
const externalServiceHealthy = await checkExternalAPI()
if (!dbHealthy || !externalServiceHealthy) {
throw new Error('Service dependencies unavailable')
}
return { status: 'ok', timestamp: Date.now() }
},
healthCheckInterval: 10000
})
}
Database Health Checks
For Database Services, health checks include:
// Automatic database connection health check
app.register(require('@fastify/under-pressure'), {
healthCheck: async function () {
try {
await app.platformatic.db.query('SELECT 1')
return { database: 'connected' }
} catch (error) {
throw new Error(`Database connection failed: ${error.message}`)
}
}
})
Log Aggregation
ELK Stack Integration
Send logs to Elasticsearch:
{
"server": {
"logger": {
"level": "info",
"stream": {
"type": "tcp",
"host": "elasticsearch.example.com",
"port": 9200
}
}
}
}
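If you prefer shipping logs through a pino transport rather than a raw TCP stream, a sketch using the pino-elasticsearch module looks roughly like this. Install pino-elasticsearch first; the option names follow that module's documentation and may differ between versions:
{
  "server": {
    "logger": {
      "level": "info",
      "transport": {
        "target": "pino-elasticsearch",
        "options": {
          "index": "watt-logs",
          "node": "http://elasticsearch.example.com:9200"
        }
      }
    }
  }
}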
Fluentd Integration
Configure structured logging for Fluentd:
{
"server": {
"logger": {
"level": "info",
"prettyPrint": false,
"timestamp": true,
"formatters": {
"level": "levelName"
}
}
}
}
For more details on ELK integration, see the Logging to Elasticsearch guide.
Error Tracking
Sentry Integration
Add error tracking with Sentry:
npm install @sentry/node @sentry/integrations
// services/api/plugin.js
const Sentry = require('@sentry/node')
module.exports = async function (app) {
Sentry.init({
dsn: process.env.PLT_SENTRY_DSN,
environment: process.env.NODE_ENV,
integrations: [
new Sentry.Integrations.Http({ tracing: true })
],
tracesSampleRate: 1.0
})
// Global error handler
app.setErrorHandler(async (error, request, reply) => {
Sentry.captureException(error, {
user: { id: request.user?.id },
extra: {
url: request.url,
method: request.method,
headers: request.headers
}
})
request.log.error({ error }, 'Unhandled error')
reply.status(500).send({ error: 'Internal Server Error' })
})
}
Production Monitoring Setup
Kubernetes Deployment
Complete monitoring setup for Kubernetes:
# monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: watt-config
data:
watt.json: |
{
"server": {
"logger": { "level": "info", "prettyPrint": false },
"healthCheck": { "enabled": true }
},
"metrics": {
"enabled": true,
"endpoint": "/metrics"
},
"telemetry": {
"serviceName": "watt-production",
"exporter": {
"type": "otlp",
"options": {
"url": "http://otel-collector:4318/v1/traces"
}
}
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: watt-app
spec:
replicas: 3
selector:
matchLabels:
app: watt-app
template:
metadata:
labels:
app: watt-app
spec:
containers:
- name: app
image: my-watt-app:latest
ports:
- containerPort: 3042
env:
- name: NODE_ENV
value: "production"
livenessProbe:
httpGet:
path: /status/live
port: 3042
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /status/ready
port: 3042
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: config
mountPath: /app/watt.json
subPath: watt.json
volumes:
- name: config
configMap:
name: watt-config
Docker Compose Monitoring Stack
Complete monitoring with Prometheus, Grafana, and Jaeger:
# docker-compose.monitoring.yml
version: '3.8'
services:
app:
build: .
ports:
- "3042:3042"
environment:
- NODE_ENV=production
volumes:
- ./watt.production.json:/app/watt.json
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-storage:/var/lib/grafana
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686"
- "14268:14268"
environment:
- COLLECTOR_OTLP_ENABLED=true
volumes:
grafana-storage:
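Note that inside the Compose network Prometheus reaches the application by service name, not localhost, so the scrape target from the earlier prometheus.yml becomes (adjust to your service name):
static_configs:
  - targets: ['app:3042']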
Monitoring Best Practices
Alerting Rules
Set up alerts for critical metrics:
# alerts.yml
groups:
- name: watt-app-alerts
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
for: 2m
labels:
severity: critical
annotations:
summary: "High error rate detected"
description: "Error rate is above 10% for 2 minutes"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High response time"
description: "95th percentile response time is above 2 seconds"
- alert: ServiceDown
expr: up{job="watt-app"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Watt application is down"
description: "Watt application has been down for more than 1 minute"
Dashboard Examples
Key metrics to monitor:
- Request Rate: Requests per second
- Error Rate: Percentage of failed requests
- Response Time: P50, P95, P99 response times
- System Metrics: CPU usage, memory consumption
- Database Performance: Connection pool, query duration
- Service Health: Health check status across services
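The PromQL below sketches how these panels are typically built. The metric names mirror the ones used in the alert rules above; adjust them to whatever your /metrics endpoint actually exposes:
# Request rate (req/s)
sum(rate(http_requests_total[5m]))
# Error rate (%)
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# P95 response time (seconds)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))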
Log Analysis
Essential log queries:
# Error analysis
wattpm logs --level error | grep "database"
# Performance analysis
wattpm logs | grep "duration" | jq '.responseTime'
# User behavior analysis
wattpm logs | grep "user_id" | jq '.userId'
Troubleshooting
Common monitoring issues and solutions:
Metrics Not Appearing
- Check the metrics endpoint: curl http://localhost:3042/metrics
- Verify configuration: Ensure metrics are enabled in your watt.json
- Check Prometheus scraping: Look for scrape errors on the Prometheus targets page
Missing Traces
- Verify exporter configuration: Check OTLP endpoint
- Check sampling rate: Ensure traces are being sampled
- Network connectivity: Verify tracing backend is reachable
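If the telemetry configuration in your Watt version supports a console exporter, switching to it temporarily is a quick way to confirm spans are being produced at all. This is a debugging sketch only; remove it once verified:
{
  "telemetry": {
    "serviceName": "my-watt-app",
    "exporter": {
      "type": "console"
    }
  }
}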
Health Check Failures
- Review health check logic: Check custom health checks
- Database connectivity: Verify database connections
- External dependencies: Check third-party service health
For more troubleshooting help, see the Troubleshooting Guide.
Related Guides
- Logging to Elasticsearch - ELK stack integration
- Telemetry Configuration - OpenTelemetry setup
- Production Deployment - Production deployment patterns