When memory usage spikes to dangerous levels, OnCallReady profiles running processes, drops reclaimable caches, selectively restarts memory-leaking services, and confirms system stability — no pager needed.
Triggers on memory-related alert keywords combined with severity signals. Handles OOM killer events, RSS percentage thresholds, swap exhaustion, and memory pressure warnings from any monitoring tool. Typical: "Memory usage 94% on api-server-2", "OOM killer triggered", "High memory pressure detected".
Lists top processes by RSS and VSZ. Identifies processes that have grown beyond their baseline allocation. Flags potential memory leaks based on rate-of-growth over the past 30 minutes.
Triggers kernel page cache drop (sync; echo 3 > /proc/sys/vm/drop_caches) to immediately reclaim reclaimable memory. Safe — no application data is lost.
Gracefully restarts services consuming memory above their configured ceiling. Performs rolling restart if multiple instances exist to maintain availability during remediation.
Polls health endpoints for restarted services. Confirms they are serving traffic within 10 seconds. Waits for memory metrics to stabilize below 75%.
If memory is back under control, resolves the incident with full remediation detail. If memory remains critical after restart, escalates immediately with process profile and restart logs.
OnCallReady handles memory exhaustion before it becomes an outage. See a live resolution.