# Monitoring & Logging
Prometheus, Grafana, structured logging, error tracking, uptime monitoring.
## Structured logging (Pino - Node.js)
```bash
npm install pino pino-pretty
```
```ts
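import { randomUUID } from 'node:crypto'
import pino from 'pino'

const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  // Pretty-print in development; emit raw JSON lines in production
  transport: process.env.NODE_ENV !== 'production'
    ? { target: 'pino-pretty' }
    : undefined,
})

// Always log with context: fields object first, message second
logger.info({ userId, action: 'login', ip: req.ip }, 'User logged in')
// Pino serializes the `err` key with its built-in error serializer
logger.error({ err, requestId }, 'Database connection failed')

// Request logger middleware (Express): the child logger stamps every line with requestId
app.use((req, res, next) => {
  req.log = logger.child({ requestId: randomUUID() })
  next()
})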
```
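A child logger attaches `requestId` only where `req.log` is reachable. One way to carry the same context into code that never sees the request object is Node's `AsyncLocalStorage`. The sketch below is a minimal illustration under stated assumptions: `formatLog` is a hypothetical stand-in for a Pino call (it returns the JSON line instead of writing it), and `withRequestContext` and `loadUser` are illustrative names, not part of Pino or Express.

```ts
import { AsyncLocalStorage } from 'node:async_hooks'
import { randomUUID } from 'node:crypto'

// Per-request fields that every log line should carry.
type LogContext = { requestId: string }

const requestContext = new AsyncLocalStorage<LogContext>()

// Hypothetical stand-in for a Pino logger: builds one JSON line per event,
// merging the ambient request context with the fields passed by the caller.
function formatLog(level: string, fields: Record<string, unknown>, msg: string): string {
  const ctx = requestContext.getStore() ?? {}
  return JSON.stringify({ level, time: Date.now(), ...ctx, ...fields, msg })
}

// Runs `fn` with a requestId visible to everything it calls,
// including code that resumes after an `await`.
function withRequestContext<T>(fn: () => T, requestId: string = randomUUID()): T {
  return requestContext.run({ requestId }, fn)
}

// Deep in the call stack: no logger or request object is passed in.
async function loadUser(userId: string): Promise<string> {
  await Promise.resolve() // simulate async I/O
  return formatLog('info', { userId, action: 'load_user' }, 'User loaded')
}
```

In an Express app this could be wired as `app.use((req, res, next) => withRequestContext(next))`, so that every handler further down the chain runs inside the store.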
## Prometheus metrics (Node.js)
```bash
npm install prom-client
```
```ts
import { register, Counter, Histogram, collectDefaultMetrics } from 'prom-client'

collectDefaultMetrics() // CPU, memory, event loop lag

const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
})

// startTimer() observes seconds, so the metric name and buckets use seconds too
const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.01, 0.05, 0.1, 0.2, 0.5, 1],
})

// Middleware
app.use((req, res, next) => {
  const end = httpDuration.startTimer()
  res.on('finish', () => {
    // Prefer the matched route pattern (e.g. /users/:id) over the raw path
    // to keep label cardinality low
    const route = req.route?.path ?? req.path
    httpRequests.inc({ method: req.method, route, status: res.statusCode })
    end({ method: req.method, route })
  })
  next()
})

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType)
  res.send(await register.metrics())
})
```
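Every distinct value of the `route` label creates a separate time series, so raw paths such as `/users/42` and `/users/43` can inflate Prometheus memory use. The sketch below shows one possible way to collapse identifier-like path segments before using a path as a label. `normalizeRoute` is a hypothetical helper, and the rule it applies (numeric and UUID segments become `:id`) is an assumption that may not fit every URL scheme.

```ts
// Path segments that look like identifiers rather than route names.
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
const NUMERIC_SEGMENT = /^\d+$/

// Collapses identifier-like segments so that many concrete URLs
// map to a single label value.
function normalizeRoute(path: string): string {
  const pathname = path.split('?')[0] ?? '' // drop the query string
  const segments = pathname
    .split('/')
    .map((segment) =>
      UUID_SEGMENT.test(segment) || NUMERIC_SEGMENT.test(segment) ? ':id' : segment,
    )
  return segments.join('/') || '/'
}
```

It would typically be applied where the label is set, for example `route: normalizeRoute(req.path)`. Requests that match no route at all are often better grouped under one fixed label value.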
## Prometheus + Grafana (Docker Compose)
```yaml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3001:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin # local development only; change before exposing Grafana
```
```yaml
# prometheus.yml
scrape_configs:
  - job_name: api
    static_configs:
      - targets: ['api:3000']
    metrics_path: /metrics
    scrape_interval: 15s
```
## Error tracking (Sentry)
```bash
npm install @sentry/node @sentry/profiling-node
```
```ts
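import * as Sentry from '@sentry/node'

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1, // trace 10% of requests
})

// Sentry.Handlers is the SDK v7 API; v8+ uses Sentry.setupExpressErrorHandler(app) instead
app.use(Sentry.Handlers.requestHandler()) // first middleware, before any routes
// ...routes go here...
app.use(Sentry.Handlers.errorHandler()) // after all routes, before other error handlers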
```
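Error reports often carry request headers and user details. Sentry's `beforeSend` option lets the application modify an event before it leaves the process. The sketch below is a hedged example of a scrubbing function for that hook: `EventLike` is a simplified, hypothetical subset of the SDK's event type, and both the header list and the choice to keep only `user.id` are assumptions.

```ts
// Minimal, hypothetical subset of the Sentry event shape used in this sketch.
type EventLike = {
  request?: { headers?: Record<string, string>; cookies?: Record<string, string> }
  user?: { id?: string; email?: string; ip_address?: string }
}

// Header names (lowercase) whose values should never be reported.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'x-api-key'])

// Returns a copy of the event with credentials and personal data removed.
// The input event is not mutated.
function scrubEvent(event: EventLike): EventLike {
  const scrubbed: EventLike = { ...event }
  if (event.request) {
    const headers: Record<string, string> = {}
    for (const [name, value] of Object.entries(event.request.headers ?? {})) {
      headers[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '[redacted]' : value
    }
    const request = { ...event.request, headers }
    delete request.cookies // drop cookies entirely
    scrubbed.request = request
  }
  if (event.user) {
    scrubbed.user = { id: event.user.id } // keep only the opaque id
  }
  return scrubbed
}
```

It could be passed to `Sentry.init` as `beforeSend: (event) => scrubEvent(event)`; depending on the SDK version, the real event type may need a cast to fit the simplified `EventLike` shape.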
## Uptime monitoring
- **Better Stack / UptimeRobot**: ping a health endpoint every 1-5 min and alert on failure
- **Grafana Alerting**: alert rules on metrics (e.g. error rate > 1%)
- **Healthchecks.io cron heartbeat**: run `curl -fsS https://hc-ping.com/UUID` after each cron job; a missed ping triggers an alert
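For jobs written in Node.js, the heartbeat in the last bullet can also be sent from code. The sketch below assumes a Healthchecks.io-style ping URL, where appending `/fail` reports a failure immediately instead of waiting for a missed ping. `runWithHeartbeat` and the injectable `ping` parameter are illustrative names, and the default uses the global `fetch` available in Node 18+.

```ts
// Function used to hit a ping URL; injectable so the sketch can be exercised offline.
type Ping = (url: string) => Promise<unknown>

const defaultPing: Ping = (url) => fetch(url)

// Runs `job`, then reports success or failure to the heartbeat service.
async function runWithHeartbeat<T>(
  pingUrl: string,
  job: () => Promise<T>,
  ping: Ping = defaultPing,
): Promise<T> {
  // A monitoring outage should not break the job itself, so ping errors are swallowed.
  const safePing = (url: string) => ping(url).catch(() => undefined)
  try {
    const result = await job()
    await safePing(pingUrl) // success: resets the check's timer
    return result
  } catch (err) {
    await safePing(`${pingUrl}/fail`) // failure: signal it right away
    throw err
  }
}
```

A nightly task could then be wrapped as `await runWithHeartbeat('https://hc-ping.com/UUID', runBackup)`, where `runBackup` stands for whatever async function does the work.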
## Log aggregation (Loki + Grafana)
```yaml
# docker-compose.yml (add alongside the services above)
services:
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - ./promtail.yml:/etc/promtail/config.yml
```
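Query logs in Grafana with LogQL: `{job="api"} |= "error" | json | level="error"`. Pino writes numeric levels by default (`"level":50` for errors), so the `level="error"` filter assumes the logger is configured to emit level names.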
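The same LogQL can be run outside Grafana through Loki's HTTP API. The sketch below builds a request URL for the `/loki/api/v1/query_range` endpoint, which takes start and end times as Unix timestamps in nanoseconds. `buildLokiQueryUrl` is a hypothetical helper, and the base URL in the usage note assumes the port mapping from the compose file above.

```ts
// Converts epoch milliseconds to a nanosecond string without losing precision.
const toNanos = (ms: number): string => `${Math.trunc(ms)}000000`

// Builds a query_range URL covering the last `lookbackMs` milliseconds.
function buildLokiQueryUrl(
  baseUrl: string,
  logql: string,
  lookbackMs: number,
  now: number = Date.now(),
  limit: number = 100,
): string {
  const url = new URL('/loki/api/v1/query_range', baseUrl)
  url.searchParams.set('query', logql) // URL-encodes braces, quotes and pipes
  url.searchParams.set('start', toNanos(now - lookbackMs))
  url.searchParams.set('end', toNanos(now))
  url.searchParams.set('limit', String(limit))
  url.searchParams.set('direction', 'backward') // newest entries first
  return url.toString()
}
```

For example, `fetch(buildLokiQueryUrl('http://localhost:3100', '{job="api"} |= "error"', 15 * 60 * 1000))` would request matching lines from the last 15 minutes.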