Alert Example

This page walks through a real alert: what it detects, how its rule is configured, what its payload looks like, and how it flows through the system.


Pod Status Alert

What It Detects

Kubernetes pods that are in the running phase but not ready, which indicates a potential application issue.

Alert Rule Configuration

  • Fields Monitored: pod.status_phase, pod.status_ready, and cluster
  • Condition: pod.status_phase = "running" AND pod.status_ready = false
  • Threshold: Any pod in this state for more than 5 minutes
  • Time Window: Last 15 minutes
  • Additional Context: Include pod.name, pod.namespace, pod.application_name, and domain_name in the alert payload
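
The rule above can be sketched in Python. This is only an illustration, not Central Monitoring's actual rule engine: the `should_alert` function and the per-pod `samples` list are hypothetical, and it assumes the threshold means the pod's current not-ready streak has lasted more than 5 minutes within the 15-minute window.

```python
from datetime import datetime, timedelta, timezone

NOT_READY_THRESHOLD = timedelta(minutes=5)  # Threshold: more than 5 minutes
TIME_WINDOW = timedelta(minutes=15)         # Time Window: last 15 minutes


def matches_condition(sample: dict) -> bool:
    """Condition: pod.status_phase = "running" AND pod.status_ready = false."""
    return sample["status_phase"] == "running" and sample["status_ready"] is False


def should_alert(samples: list, now: datetime) -> bool:
    """Return True when one pod's samples breach the threshold.

    `samples` is a hypothetical list of observations for a single pod, each a
    dict with `timestamp` (timezone-aware), `status_phase`, and `status_ready`.
    """
    window_start = now - TIME_WINDOW
    recent = sorted(
        (s for s in samples if window_start <= s["timestamp"] <= now),
        key=lambda s: s["timestamp"],
    )

    # Track when the current uninterrupted not-ready streak began.
    streak_start = None
    for sample in recent:
        if matches_condition(sample):
            if streak_start is None:
                streak_start = sample["timestamp"]
        else:
            streak_start = None  # pod recovered; reset the streak

    return streak_start is not None and now - streak_start > NOT_READY_THRESHOLD
```

A pod that recovers and then becomes not ready again restarts the clock in this sketch; whether the real rule behaves that way is an assumption.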

Alert Payload Structure

{
  "domain_name": "your-domain.example.com",
  "t1_end": "2024-11-07T14:30:00.000Z",
  "t2_end": "2024-11-07T14:45:00.000Z",
  "cluster": "your-cluster-name",
  "pod": [
    {
      "name": "your-app-0",
      "namespace": "your-namespace",
      "node_name": "ip-10-0-1-100.region.compute.internal",
      "status_phase": "running",
      "status_scheduled": true,
      "status_ready": false,
      "application_name": "your-application",
      "region": "your-region",
      "aws_account_id": 123456789012,
      "timestamp": "2024-11-07 14:30:00.000000+00:00"
    }
  ]
}
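
A consumer of this payload might parse it along the following lines. This is a minimal sketch: `summarize_alert` and `parse_ts` are hypothetical helpers, and the meaning of `t1_end` and `t2_end` is not defined on this page, so the code only reports the span between them (15 minutes in the example, which matches the rule's time window).

```python
import json
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse either timestamp style used in the payload.

    Older Python versions reject a trailing "Z" in fromisoformat, so it is
    normalised to an explicit UTC offset first.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_alert(raw: str) -> dict:
    """Extract the fields a responder is likely to need from the raw JSON."""
    payload = json.loads(raw)

    # Keep only pods that still match the rule's condition.
    not_ready = [
        pod for pod in payload["pod"]
        if pod["status_phase"] == "running" and not pod["status_ready"]
    ]

    span = parse_ts(payload["t2_end"]) - parse_ts(payload["t1_end"])
    return {
        "domain_name": payload["domain_name"],
        "cluster": payload["cluster"],
        "span_seconds": span.total_seconds(),
        "not_ready_pods": [f'{pod["namespace"]}/{pod["name"]}' for pod in not_ready],
        "first_seen": min((parse_ts(pod["timestamp"]) for pod in not_ready), default=None),
    }
```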

Alert Flow

  1. Detection: Pod "your-app-0" has been running but not ready for more than 5 minutes
  2. Alert Generation: Central Monitoring creates the payload shown above and routes it to EventOps
  3. EventOps Processing: EventOps routes the alert to the owning application team based on namespace/domain
  4. Notification: An email is sent to your team with the pod and cluster details from the alert payload
  5. Response: Your team checks the pod logs and application health in the affected namespace
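
Steps 3 and 4 could be sketched as follows. None of this reflects how EventOps is actually implemented: the `ROUTES` table, the fallback address, and both function names are hypothetical, and the sketch assumes routing keys on the (domain, namespace) pair.

```python
# Hypothetical routing table: (domain_name, namespace) -> team mailbox.
ROUTES = {
    ("your-domain.example.com", "your-namespace"): "your-team@example.com",
}
# Hypothetical fallback for pods with no matching route.
DEFAULT_RECIPIENT = "platform-oncall@example.com"


def route_alert(payload: dict) -> dict:
    """Group the alert's pod names by the recipient that owns them."""
    routed = {}
    for pod in payload["pod"]:
        key = (payload["domain_name"], pod["namespace"])
        recipient = ROUTES.get(key, DEFAULT_RECIPIENT)
        routed.setdefault(recipient, []).append(pod["name"])
    return routed


def format_email(payload: dict, pod_names: list) -> str:
    """Build a plain-text notification body with pod and cluster details."""
    lines = [
        "Pod Status Alert: running but not ready",
        f"Cluster: {payload['cluster']}",
        f"Domain: {payload['domain_name']}",
        "Pods:",
    ]
    lines.extend(f"  - {name}" for name in pod_names)
    return "\n".join(lines)
```

Calling `route_alert(payload)` on the example payload would map "your-team@example.com" to ["your-app-0"], and `format_email` would then produce the body for that team's message.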