Available now

Datadog + OnCallReady: AI Incident Response for Datadog Alerts

OnCallReady connects to Datadog monitor webhooks to intercept alerts before they page anyone. It matches each signal to an automated runbook and posts the resolution back to your Datadog event timeline, typically in under 30 seconds.

How it works

The integration is a single webhook. No Datadog agent changes, no infrastructure to maintain. Datadog fires the webhook on alert; OnCallReady resolves and posts back.

┌─────────────────────────┐         ┌──────────────────────────────┐
│ Datadog Monitor         │         │ OnCallReady                  │
│ disk usage > 90%        │         │                              │
│ status: ALERT           │         │ ① Pattern match              │
│                         │──POST──▶│ ② Select runbook             │
│ webhook_url:            │         │ ③ Execute actions (28s avg)  │
│ oncallready.../alerts   │         │ ④ Verify health              │
└─────────────────────────┘         │ ⑤ Post result back           │
                                    └──────────────┬───────────────┘
                                                   │ resolved + timeline
                                    ┌──────────────▼───────────────┐
                                    │ Datadog Event Timeline       │
                                    │ + Slack #incidents           │
                                    └──────────────────────────────┘
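The five stages in the diagram can be read as a small pipeline. The sketch below is illustrative only and is not OnCallReady's implementation: `handle_alert`, `Outcome`, and the four injected callables are hypothetical names, and escalating when no runbook matches or the health check fails is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    status: str              # "resolved" or "escalated"
    runbook: Optional[str]   # name of the runbook that ran, if any


def handle_alert(
    alert: dict,
    select_runbook: Callable[[dict], Optional[str]],
    execute: Callable[[str, dict], None],
    is_healthy: Callable[[dict], bool],
    post_back: Callable[[dict, Outcome], None],
) -> Outcome:
    """Run one alert through the five stages shown in the diagram."""
    # Stages 1 and 2: pattern match the alert and select a runbook.
    runbook = select_runbook(alert)
    if runbook is None:
        # Assumption: alerts with no matching runbook go to a human.
        outcome = Outcome("escalated", None)
    else:
        # Stage 3: execute the runbook's actions.
        execute(runbook, alert)
        # Stage 4: verify health; escalate if the fix did not take.
        status = "resolved" if is_healthy(alert) else "escalated"
        outcome = Outcome(status, runbook)
    # Stage 5: post the result back (event timeline, Slack).
    post_back(alert, outcome)
    return outcome


# Usage with stub callables standing in for real actions.
result = handle_alert(
    {"metric": "disk.in_use", "host": "web-1"},
    select_runbook=lambda a: "Disk Full Remediation",
    execute=lambda runbook, a: None,
    is_healthy=lambda a: True,
    post_back=lambda a, o: None,
)
print(result.status)  # resolved
```

Passing the stages in as callables keeps the flow testable without touching real infrastructure.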

Signal → Action table

Datadog alert type                  | Runbook triggered       | Autonomous action
------------------------------------+-------------------------+------------------------------------------------------------------
disk.in_use > 90%                   | Disk Full Remediation   | Identify top consumers, rotate logs, purge temp files, verify volume
system.mem.pct_usable < 10%         | Memory Exhaustion       | Profile consumers, drop reclaimable caches, restart leaking services
system.cpu.user > 95%               | CPU Spike               | Find runaway process, throttle or terminate, scale horizontally if load is legitimate
ssl.days_until_expiry < 14          | SSL Certificate Renewal | ACME renewal, deploy new cert, reload web server with zero downtime
postgresql.connections > pool_max   | DB Connection Pool      | Kill idle connections, identify leaking service, rolling restart, rebalance
rabbitmq.queue.messages > threshold | Queue Backlog           | Purge poison messages, restart workers, scale processors
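One way to picture the table is as a lookup from metric name to runbook. This is a minimal sketch, not how OnCallReady matches signals internally: `RUNBOOKS` and `select_runbook` are hypothetical names, and keying on the bare metric name (ignoring the threshold) is a simplifying assumption.

```python
from typing import Optional

# Metric name -> runbook name, copied from the table above.
RUNBOOKS = {
    "disk.in_use": "Disk Full Remediation",
    "system.mem.pct_usable": "Memory Exhaustion",
    "system.cpu.user": "CPU Spike",
    "ssl.days_until_expiry": "SSL Certificate Renewal",
    "postgresql.connections": "DB Connection Pool",
    "rabbitmq.queue.messages": "Queue Backlog",
}


def select_runbook(metric: str) -> Optional[str]:
    """Return the runbook for a metric, or None when nothing matches."""
    return RUNBOOKS.get(metric.strip().lower())


print(select_runbook("disk.in_use"))        # Disk Full Remediation
print(select_runbook("custom.app.metric"))  # None
```

A `None` result here stands for a signal with no automated runbook.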

Setup in 3 minutes

Add the webhook URL to any Datadog monitor. No Datadog API key is required on our end: the payload Datadog sends contains everything we need.

Datadog monitor webhook config
# In Datadog: Monitors → [Your Monitor] → Notify your team
# Add a webhook integration with this URL:
Webhook URL: https://oncallready.polsia.app/api/alerts

# Payload (use Datadog's default or customize):
{
  "title": "$EVENT_TITLE",
  "severity": "$ALERT_TYPE",
  "source": "datadog",
  "monitor_id": "$ID",
  "host": "$HOSTNAME",
  "metric": "$METRIC_NAMESPACE",
  "value": "$ALERT_METRIC"
}

# That's it. OnCallReady starts auto-resolving immediately.
# Resolution results post back to the Datadog event timeline.
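If you customize the payload, it can help to preview it locally before saving the monitor. The sketch below fills the template with Python's `string.Template`, which happens to use the same `$NAME` placeholder style, and checks that the result parses as JSON. Datadog performs the real substitution itself; `render_payload` is a hypothetical helper and every value in `SAMPLE` is a made-up placeholder, not real Datadog output.

```python
import json
from string import Template

# Payload template from the webhook config above.
PAYLOAD_TEMPLATE = Template("""{
  "title": "$EVENT_TITLE",
  "severity": "$ALERT_TYPE",
  "source": "datadog",
  "monitor_id": "$ID",
  "host": "$HOSTNAME",
  "metric": "$METRIC_NAMESPACE",
  "value": "$ALERT_METRIC"
}""")

# Made-up placeholder values, for illustration only.
SAMPLE = {
    "EVENT_TITLE": "Disk usage above 90% on web-1",
    "ALERT_TYPE": "error",
    "ID": "12345",
    "HOSTNAME": "web-1",
    "METRIC_NAMESPACE": "system",
    "ALERT_METRIC": "system.disk.in_use",
}


def render_payload(values: dict) -> dict:
    """Fill the template and parse it; raises if a variable is missing
    or the filled-in text is not valid JSON."""
    return json.loads(PAYLOAD_TEMPLATE.substitute(values))


payload = render_payload(SAMPLE)
print(payload["source"], payload["host"])  # datadog web-1
```

Sample values containing double quotes or backslashes would break the JSON in this simple preview, since nothing is escaped.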

What stays on-call

OnCallReady handles the automatable majority. These scenarios still escalate to a human:

Related

OnCallReady works across your full stack. Pair the Datadog integration with:

Stop getting paged for disk alerts

Connect Datadog in 3 minutes. Free plan resolves up to 50 incidents/month.