High

Network Degradation Remediation

High latency, packet loss, and connectivity failures are complex to diagnose. OnCallReady traces the network path, isolates the faulty segment, flushes stale routing state, and reroutes traffic to healthy endpoints automatically.

Avg Resolution
38s
Severity
High
Success Rate
91%
Humans Paged
0

Trigger Conditions

/network.*(degrad|latency|packet.loss|timeout|unreachable|high.ping|slow)/i

Fires on network performance alerts indicating degraded connectivity. Typical triggers: "Network latency 840ms (threshold: 200ms)", "Packet loss 18% on prod-vpc", "Service unreachable from us-east-2", "High error rate on external API calls".

What the Agent Does

1

Run network diagnostics

Executes traceroute, ping, and MTR from affected host to key destinations. Maps where in the path latency or loss begins. Determines if it's internal (VPC routing) or external (ISP/CDN).

2

Flush stale ARP and routing caches

Clears ARP tables and routing cache on affected hosts. Resolves stale routes that can cause traffic black-holing after network topology changes or instance replacements.

3

Restart network interface if needed

If the NIC shows errors or the route table is corrupt, performs a controlled interface bounce. Uses the management network if available to maintain connectivity during the operation.

4

Reroute traffic to healthy endpoints

If degradation is upstream (CDN edge, transit provider), updates load balancer backend weights to route away from the affected AZ or region. Traffic shifts in under 5 seconds.

5

Verify connectivity and latency

Confirms round-trip time returns to baseline (<50ms internal, <200ms external). Verifies packet loss is below 0.1%. Marks resolved or escalates with full traceroute output for ISP escalation.

Example Incident Log

incident-5602 · network-degradation · prod-api-cluster
[07:18:44] ALERT Network latency 920ms on prod-api-cluster (threshold 200ms)
[07:18:44] Matched runbook: network-degradation
[07:18:45] Running MTR from api-node-1 → 10.0.2.1 (db subnet)...
[07:18:51] Hop 3 (10.0.1.254): 74% packet loss · Stale ARP entry suspected
[07:18:52] Flushing ARP table on api-node-1
[07:18:53] ARP cache flushed · Re-resolving 10.0.1.254
[07:18:55] Flushing kernel routing cache
[07:18:56] Routing cache flushed
[07:18:58] Re-running MTR... Hop 3: 0% loss · RTT 4ms
[07:19:02] Monitoring latency... 18ms → 12ms → 8ms
[07:19:22] ✓ RESOLVED Latency: 920ms → 9ms · Packet loss: 0% · Duration: 38s
[07:19:22] On-call: undisturbed. Stale ARP cleared automatically.

Network issues resolved before users notice

OnCallReady diagnoses and fixes network degradation in seconds. See it demonstrated live.