Prometheus, Grafana, structured logging, error tracking, uptime monitoring.

# Monitoring & Logging

## Structured logging (Pino - Node.js)
```bash
npm install pino pino-pretty
```
```ts
import { randomUUID } from 'node:crypto'
import pino from 'pino'

const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  // Pretty-print outside production; emit raw JSON in production
  transport: process.env.NODE_ENV !== 'production'
    ? { target: 'pino-pretty' } : undefined,
})

// Always log with context: fields object first, message second
logger.info({ userId, action: 'login', ip: req.ip }, 'User logged in')
logger.error({ err, requestId }, 'Database connection failed')  // the `err` key is serialized with its stack

// Request logger middleware: every line logged via req.log carries requestId
app.use((req, res, next) => {
  req.log = logger.child({ requestId: randomUUID() })
  next()
})
```
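
The middleware above generates a fresh ID for every request. When a proxy or upstream service already assigns one, reusing it keeps log lines correlated across services. A minimal sketch, assuming the ID arrives in an `x-request-id` header (a common convention, not something this page specifies); `resolveRequestId` is a hypothetical helper name.

```ts
import { randomUUID } from 'node:crypto'

type HeaderValue = string | string[] | undefined

// Hypothetical helper: reuse an incoming request ID if it looks safe,
// otherwise generate a new UUID.
function resolveRequestId(headers: Record<string, HeaderValue>): string {
  const raw = headers['x-request-id']  // assumed header name
  const candidate = Array.isArray(raw) ? raw[0] : raw
  // Accept only short IDs made of safe characters, so untrusted header
  // content cannot inject newlines or huge values into the logs
  if (candidate && /^[A-Za-z0-9._-]{1,128}$/.test(candidate)) return candidate
  return randomUUID()
}
```

In the middleware this would replace the direct UUID call, for example `logger.child({ requestId: resolveRequestId(req.headers) })`.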

## Prometheus metrics (Node.js)
```bash
npm install prom-client
```
```ts
import { register, Counter, Histogram, collectDefaultMetrics } from 'prom-client'
collectDefaultMetrics()  // CPU, memory, event loop lag

const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
})
// startTimer() observes durations in seconds, so the name and buckets use seconds
const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.01, 0.05, 0.1, 0.2, 0.5, 1],
})

// Middleware
app.use((req, res, next) => {
  const end = httpDuration.startTimer()
  res.on('finish', () => {
    // Label with the matched route pattern (e.g. /users/:id), not the raw path,
    // so label cardinality stays bounded
    const route = req.route?.path ?? 'unmatched'
    httpRequests.inc({ method: req.method, route, status: res.statusCode })
    end({ method: req.method, route })
  })
  next()
})

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType)
  res.send(await register.metrics())
})
```
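
Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final `+Inf` bucket counts all of them. The sketch below is an illustration of that counting rule in plain TypeScript, not prom-client's implementation; `cumulativeBuckets` and the sample values are hypothetical.

```ts
// Illustration only: counts observations the way cumulative
// histogram buckets do (value <= upper bound).
function cumulativeBuckets(
  observations: number[],
  upperBounds: number[],
): Map<string, number> {
  const counts = new Map<string, number>()
  const sorted = [...upperBounds].sort((a, b) => a - b)
  for (const le of sorted) {
    counts.set(String(le), observations.filter((v) => v <= le).length)
  }
  // The implicit +Inf bucket always equals the total observation count
  counts.set('+Inf', observations.length)
  return counts
}

// Hypothetical durations, in the same unit as the bounds
const sample = cumulativeBuckets([0.004, 0.03, 0.03, 0.25, 2], [0.01, 0.05, 0.1, 0.2, 0.5, 1])
console.log(Object.fromEntries(sample))
```

Because the counts are cumulative, an observation larger than every bound only shows up in `+Inf`. If most requests land there, the chosen bounds are too small for the values being recorded.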

## Prometheus + Grafana (Docker Compose)
```yaml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3001:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin  # local development only; change before exposing Grafana
```
```yaml
# prometheus.yml
scrape_configs:
  - job_name: api
    static_configs:
      - targets: ['api:3000']  # service name and port of the app on the same Compose network
    metrics_path: /metrics
    scrape_interval: 15s
```

## Error tracking (Sentry)
```bash
npm install @sentry/node @sentry/profiling-node
```
```ts
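import * as Sentry from '@sentry/node'
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1,  // trace 10% of transactions
})
// Sentry.Handlers is the v7 SDK API. In v8+ it was removed; call
// Sentry.setupExpressErrorHandler(app) after the routes instead.
app.use(Sentry.Handlers.requestHandler())  // first middleware, before any routes
// ... routes ...
app.use(Sentry.Handlers.errorHandler())  // after all routes, before other error handlers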
```

## Uptime monitoring
- **Better Stack / UptimeRobot**: request a health endpoint every 1-5 min and alert when it fails or times out
- **Grafana Alerting**: alert rules on metrics (e.g. error rate > 1%)
- **Healthchecks.io (cron heartbeat)**: run `curl -fsS https://hc-ping.com/UUID` after each cron job; a missed ping triggers the alert
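
Uptime monitors need an endpoint to request. A minimal sketch of one using only Node's built-in `http` module, assuming a `/healthz` path and a set of named dependency checks (the path, `toHealthResponse`, and `createHealthServer` are hypothetical choices, not part of the original page). It returns 200 when every check passes and 503 otherwise, so a monitor can alert on status code alone.

```ts
import { createServer, Server } from 'node:http'

// A check resolves to true when its dependency is reachable.
type Check = () => Promise<boolean>

// Pure mapping from check results to an HTTP status and JSON body.
function toHealthResponse(results: Record<string, boolean>) {
  const healthy = Object.values(results).every(Boolean)
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'degraded', checks: results },
  }
}

// Builds the server without listening; call .listen(port) to start it.
function createHealthServer(checks: Record<string, Check>): Server {
  return createServer(async (req, res) => {
    if (req.url !== '/healthz') {
      res.writeHead(404).end()
      return
    }
    const results: Record<string, boolean> = {}
    for (const [name, check] of Object.entries(checks)) {
      // A rejected check counts as a failure instead of crashing the handler
      results[name] = await check().catch(() => false)
    }
    const { statusCode, body } = toHealthResponse(results)
    res.writeHead(statusCode, { 'Content-Type': 'application/json' })
    res.end(JSON.stringify(body))
  })
}
```

Usage would look like `createHealthServer({ db: async () => true }).listen(3000)`, with the placeholder check replaced by a real dependency probe. Keeping checks cheap matters, since the monitor calls the endpoint every few minutes.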

## Log aggregation (Loki + Grafana)
```yaml
# docker-compose.yml (add under `services:`)
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - ./promtail.yml:/etc/promtail/config.yml
```
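Query logs in Grafana with LogQL: `{job="api"} |= "error" | json | level="error"`. The `level="error"` stage assumes string levels. Pino writes numeric levels by default (error is 50), so either match `level="50"` or configure Pino's `formatters.level` to emit the label.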
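
To make the query stages concrete, the sketch below mimics them on in-memory lines: `|= "error"` is a raw substring match, `| json` parses each line into fields, and `level="error"` keeps entries whose `level` field equals that string. This is an illustration only; Loki evaluates LogQL server-side, and `filterLikeLogQL` and the sample lines are hypothetical.

```ts
type Parsed = Record<string, unknown>

// Illustration only: applies the three pipeline stages in order.
function filterLikeLogQL(lines: string[]): Parsed[] {
  return lines
    // |= "error": keep lines containing the substring anywhere
    .filter((line) => line.includes('error'))
    // | json: parse the line; unparsable lines are dropped here because
    // they could never satisfy the level filter in the next stage
    .flatMap((line) => {
      try {
        return [JSON.parse(line) as Parsed]
      } catch {
        return []
      }
    })
    // | level="error": keep entries whose level field is exactly "error"
    .filter((entry) => entry.level === 'error')
}

const matches = filterLikeLogQL([
  '{"level":"error","msg":"db down"}',
  '{"level":"info","msg":"no error here"}',
  'plain error text',
  '{"level":"warn","msg":"ok"}',
])
console.log(matches)
```

The second sample line shows why the final stage is needed: it passes the substring filter because its message contains "error", but its level is `info`.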
