OnCallReady registers as an Alertmanager receiver, intercepting Prometheus alerts before they reach PagerDuty — executing the fix autonomously and posting a silence or annotation back so your dashboards stay accurate and your phone stays quiet.
Alertmanager routes matching alerts to OnCallReady's webhook receiver. OnCallReady resolves the incident, then calls the Alertmanager API to create a silence so the alert doesn't re-fire during remediation.
| Alertmanager alert name | Runbook triggered | Autonomous action |
|---|---|---|
| NodeDiskSpaceRunningLow | Disk Full Remediation | Purge logs & tmp, verify free space, post silence for 2h |
| NodeMemoryUsageHigh | Memory Exhaustion | Drop page cache, restart OOM candidates, confirm stable |
| HighCPULoad | CPU Spike | Identify offending process, throttle or kill, scale if needed |
| PostgresConnectionsNearMax | DB Connection Pool | Terminate idle connections, rolling restart of pool manager |
| ServiceDown | Service Restart & Recovery | Drain from LB, restart with health check gate, re-add to rotation |
| TLSCertExpiringSoon | SSL Certificate Renewal | ACME renewal, cert deploy, web server reload |
Add OnCallReady as a receiver in your Alertmanager config. The route below sends all non-critical alerts to OnCallReady first, escalating only if the resolution fails.
OnCallReady handles the automatable incidents. These still page a human:
severity: critical — route these to your existing escalation pathMost Prometheus teams also use Grafana for dashboards. Connect both:
Most teams resolve 60–80% of alerts automatically within the first week.