Alert Example
This page walks through a real alert: what triggers it, what its payload looks like, and how it flows through the system.
Pod Status Alert
What It Detects
Kubernetes pods that are running but not ready, indicating potential application issues.
Alert Rule Configuration
- Field Monitored: pod.status_phase, pod.status_ready, and cluster
- Condition: pod.status_phase = "running" AND pod.status_ready = false
- Threshold: Any pod in this state for more than 5 minutes
- Time Window: Last 15 minutes
- Additional Context: Include pod.name, pod.namespace, pod.application_name, and domain_name in the alert payload
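The condition and threshold above can be sketched in Python. This is only an illustration, not the monitoring system's actual rule engine: it assumes the rule is evaluated over periodic pod observations, and the function name `pods_breaching_rule` and the sample record shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the rule configuration: more than 5 minutes in the bad state.
NOT_READY_THRESHOLD = timedelta(minutes=5)


def pods_breaching_rule(samples, now, threshold=NOT_READY_THRESHOLD):
    """Return (namespace, name) pairs whose current running-but-not-ready
    streak has lasted longer than `threshold` as of `now`.

    `samples` is a list of dicts with the keys namespace, name, status_phase,
    status_ready, and timestamp (a timezone-aware datetime). This record
    shape is an assumption made for the sketch.
    """
    streak_start = {}  # (namespace, name) -> start of the current streak
    for sample in sorted(samples, key=lambda s: s["timestamp"]):
        key = (sample["namespace"], sample["name"])
        matches = (
            sample["status_phase"] == "running"
            and sample["status_ready"] is False
        )
        if matches:
            # Keep the earliest timestamp of an uninterrupted streak.
            streak_start.setdefault(key, sample["timestamp"])
        else:
            # Any healthy observation resets the streak.
            streak_start.pop(key, None)
    return sorted(
        key for key, start in streak_start.items() if now - start > threshold
    )
```

Under these assumptions, a pod that becomes ready even once within the window starts a fresh streak, so only pods that stay not ready continuously for more than 5 minutes are reported.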
Alert Payload Structure
{
  "domain_name": "your-domain.example.com",
  "t1_end": "2024-11-07T14:30:00.000Z",
  "t2_end": "2024-11-07T14:45:00.000Z",
  "cluster": "your-cluster-name",
  "pod": [
    {
      "name": "your-app-0",
      "namespace": "your-namespace",
      "node_name": "ip-10-0-1-100.region.compute.internal",
      "status_phase": "running",
      "status_scheduled": true,
      "status_ready": false,
      "application_name": "your-application",
      "region": "your-region",
      "aws_account_id": 123456789012,
      "timestamp": "2024-11-07 14:30:00.000000+00:00"
    }
  ]
}
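A consumer of this payload might pull out the affected pods as follows. This is a minimal sketch that relies only on the field names shown in the payload above; the function name `summarize_alert` and the one-line summary format are hypothetical.

```python
import json


def summarize_alert(raw_payload):
    """Return one summary line per pod that is running but not ready.

    `raw_payload` is the alert payload as a JSON string. Only fields that
    appear in the example payload are read.
    """
    payload = json.loads(raw_payload)
    cluster = payload["cluster"]
    lines = []
    for pod in payload.get("pod", []):
        running = pod.get("status_phase") == "running"
        not_ready = pod.get("status_ready") is False
        if running and not_ready:
            lines.append(
                f"{cluster}/{pod['namespace']}/{pod['name']} "
                f"(app: {pod['application_name']}, node: {pod['node_name']})"
            )
    return lines
```

Because "pod" is a list, a single alert can carry more than one pod; the sketch returns one line for each.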
Alert Flow
- Detection: Pod "your-app-0" is running but not ready for 5+ minutes
- Alert Generation: Central Monitoring creates and routes the payload shown above to EventOps
- EventOps Processing: Alert routed to the application team based on namespace/domain
- Notification: Email sent to your team with the pod and cluster details from the alert payload above
- Response: Team checks pod logs and application health in the specific namespace
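The routing and notification steps above can be sketched as follows. EventOps internals are not described on this page, so everything here is an assumption for illustration: the routing table `ROUTES`, the fallback address, the lookup order (namespace first, then domain), and the function `build_notifications` are all hypothetical. The sketch only builds email messages with Python's standard library; it does not send them.

```python
from email.message import EmailMessage

# Hypothetical routing table: namespace or domain name -> team mailbox.
ROUTES = {"your-namespace": "your-team@example.com"}
FALLBACK_RECIPIENT = "platform-oncall@example.com"  # hypothetical default


def build_notifications(payload, routes=ROUTES, fallback=FALLBACK_RECIPIENT):
    """Group the pods in an alert payload by recipient and build one email each.

    The recipient is looked up by pod namespace first, then by the payload's
    domain_name, and finally falls back to a default mailbox.
    """
    pods_by_recipient = {}
    for pod in payload["pod"]:
        recipient = (
            routes.get(pod["namespace"])
            or routes.get(payload["domain_name"])
            or fallback
        )
        pods_by_recipient.setdefault(recipient, []).append(pod)

    messages = []
    for recipient, pods in pods_by_recipient.items():
        msg = EmailMessage()
        msg["To"] = recipient
        msg["Subject"] = (
            f"Pod Status Alert: {len(pods)} pod(s) not ready "
            f"in {payload['cluster']}"
        )
        details = "\n".join(
            f"- {p['namespace']}/{p['name']} on {p['node_name']}" for p in pods
        )
        msg.set_content("Pods running but not ready:\n" + details)
        messages.append(msg)
    return messages
```

Each message names the cluster in the subject and lists the namespace, pod, and node in the body, which is the information a team would need to start checking pod logs in the right namespace.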