Available now

Grafana + OnCallReady: AI Incident Response for Grafana Alerts

OnCallReady plugs in as a Grafana unified contact point — sitting between your dashboards and your on-call rotation, triaging every alert with AI and autonomously resolving the incidents that don't need a human before any notification fires.

How it works

Grafana's alerting engine evaluates rules and fires webhooks. Configure OnCallReady as the first contact point in your notification policy. We resolve what we can and only pass through what genuinely needs escalation.

┌─────────────────────────────────────────────────────────────────┐ │ Grafana Alert Rule fires (every 1m evaluation) │ └─────────────────────────────────────┬───────────────────────────┘ │ webhook POST ▼ ┌─────────────────────────────────────────────────────────────────┐ │ OnCallReady (contact point) │ │ AI triage → runbook match → execute fix → verify health │ │ │ │ ✓ Resolved → annotate dashboard + post to Slack │ │ ✗ Unresolved → forward to next contact point (PagerDuty/email) │ └─────────────────────────────────────────────────────────────────┘

Signal → Action table

Grafana alert	Runbook triggered	Autonomous action
node_filesystem_avail_bytes < 10%	Disk Full Remediation	Purge logs, temp files; annotate panel with resolution timestamp
node_memory_MemAvailable_bytes < 5%	Memory Exhaustion	Drop caches, restart leaking service, confirm memory normalizes
container_cpu_usage_seconds_total spike	CPU Spike	Identify container, throttle or scale, verify CPU drop
kube_pod_status_phase != Running	Service Restart & Recovery	Describe pod, capture logs, force restart, confirm Running state
probe_ssl_earliest_cert_expiry < 14 days	SSL Certificate Renewal	ACME renewal, deploy new cert, web server reload
Custom threshold on any Grafana panel	Pattern-matched runbook	Regex match on alert title → nearest runbook or escalate

Setup in 3 minutes

In Grafana: Alerting → Contact points → Add contact point. Choose Webhook and paste the URL below. Then add it to your notification policy before your existing PagerDuty or email contact.

Grafana contact point config

# Grafana → Alerting → Contact points → New contact point Name: oncallready Type: Webhook URL: https://oncallready.polsia.app/api/alerts # Optional: pass your API key in the header Headers: Authorization: Bearer YOUR_API_KEY # Notification policy (Alerting → Notification policies) # Add oncallready BEFORE your PagerDuty/email contact: - contact_point: oncallready continue: false # set true to always notify next contact too # OnCallReady resolves and annotates the originating panel. # If resolution fails, it passes control to your next policy.

What stays on-call

AI triage handles the majority. These still escalate to a human:

Grafana OnCall escalation chains you've marked critical — bypass OnCallReady entirely
Dashboard panels with annotation human-required: true in the alert message
Cascading multi-panel alerts suggesting a broader infrastructure event
Any action that would modify a Grafana data source configuration
Runbook execution failures — escalated with full Grafana annotation context so the responder has complete history

Grafana usually sits on top of Prometheus. Connect both ends of the pipeline:

Prometheus → Slack → PagerDuty → Runbook library → Live demo →

Your Grafana alerts deserve better than a 3 AM page

Connect in 3 minutes. Free plan, no credit card required.

View pricing → Watch live demo