Real-Time Alerts: Don't Delay

Every minute of undetected downtime costs money. Real-time alerts shorten the gap between failure and fix.

Why Immediate Alerts Matter

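E-commerce: a payment gateway fails at 2:00 PM. With 5-minute checks, the failure is detected at 2:05 and fixed at 2:25: 25 minutes of lost sales, about $50k. With 1-minute checks, it is detected at 2:01 and fixed at 2:21: 21 minutes of lost sales. Assuming losses scale linearly with downtime (about $2k per minute), that is roughly $42k lost, a saving of about $8k.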

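SaaS: a database issue starts at 10:30. With 5-minute checks, it is detected at 10:35 and fixed by a restart at 10:37: 7 minutes of downtime and 200 support tickets. With 1-minute checks, it is detected at 10:31 and fixed at 10:33: 3 minutes of downtime and 80 tickets.

API: 500 errors go undetected for 8 minutes and the failure cascades. Real-time alerting catches it before it spreads.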
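The arithmetic behind the e-commerce scenario can be sketched in a few lines. This is a rough model that assumes losses scale linearly with downtime; the function name and the $2,000-per-minute rate (derived from $50k over 25 minutes) are illustrative assumptions, not figures from a real incident.

```python
def downtime_cost(detect_minutes, fix_minutes, loss_per_minute):
    """Estimate the loss for an outage: time to detect plus time to fix.

    Assumes a constant loss rate, which is a simplification.
    """
    return (detect_minutes + fix_minutes) * loss_per_minute


# E-commerce scenario: 20 minutes to fix once detected,
# $50k / 25 min = $2,000 per minute (illustrative rate).
RATE = 50_000 / 25

slow = downtime_cost(detect_minutes=5, fix_minutes=20, loss_per_minute=RATE)
fast = downtime_cost(detect_minutes=1, fix_minutes=20, loss_per_minute=RATE)

print(f"5-minute checks: ${slow:,.0f} lost")
print(f"1-minute checks: ${fast:,.0f} lost")
print(f"Saved by faster detection: ${slow - fast:,.0f}")
```

Under this model every minute shaved off detection is worth one minute of the loss rate, so the payoff grows with how much revenue the affected flow carries.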

Benefits

Quick Response

Notification within seconds
Escalation stopped early
Fast team mobilization
Proactive communication

Less Downtime

Faster fixes
Fewer users affected
Trust preserved
Lower losses

Better UX

Services stay available
Seamless experience
Performance maintained
User confidence built

Team Boost

Focused problem solving
Less stress
Better work-life balance
Data-driven decisions

Alert Types

Availability

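HTTP status checks:

const availability = {
  trigger: {
    statusCode: [500, 502, 503, 504], // server-side error responses
    consecutiveFailures: 1,           // alert on the first failed check
    timeout: 30000                    // 30 s, in milliseconds
  },
  notification: {
    channels: ['slack', 'email', 'sms'],
    severity: 'critical',
    escalation: {
      afterMinutes: 5,                // escalate after 5 minutes
      toTeam: 'on-call'
    }
  }
};

Also monitor DNS and connectivity, SSL certificates, and CDN availability.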
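To make the trigger concrete, here is a minimal Python sketch of how a monitor might evaluate it. The `is_failure` and `should_alert` functions and the shape of `results` are assumptions for illustration, not exit1.dev's implementation.

```python
ALERT_STATUS_CODES = {500, 502, 503, 504}


def is_failure(status_code, elapsed_ms, timeout_ms=30000):
    """A check fails on a listed 5xx code, a timeout, or no response."""
    if status_code is None or elapsed_ms >= timeout_ms:
        return True
    return status_code in ALERT_STATUS_CODES


def should_alert(results, consecutive_failures=1):
    """results: list of (status_code, elapsed_ms) tuples, oldest first.

    Alert when the most recent `consecutive_failures` checks all failed.
    """
    if len(results) < consecutive_failures:
        return False
    recent = results[-consecutive_failures:]
    return all(is_failure(code, ms) for code, ms in recent)


history = [(200, 120), (503, 95)]
print(should_alert(history))                          # True
print(should_alert(history, consecutive_failures=2))  # False
```

With `consecutiveFailures: 1` a single bad check pages someone. Raising it trades a little detection speed for fewer false alarms.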

Performance

Response time thresholds:

response_time_alerts:
  warning_threshold: 2000ms
  critical_threshold: 5000ms
  measurement_window: 3

  actions:
    warning:
      - notify_chat
      - log_issue
    critical:
      - page_engineer
      - auto_scale
      - notify_stakeholders

Resource alerts: CPU, memory, disk, and database connection pools.
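The response-time thresholds can be evaluated with a small classifier. This sketch assumes `measurement_window: 3` means the three most recent samples; the source does not state the unit, so treat that reading as a guess.

```python
WARNING_MS = 2000
CRITICAL_MS = 5000
WINDOW = 3  # assumed: number of most recent samples considered


def classify(response_times_ms):
    """Return 'critical', 'warning', or 'ok' for the latest window.

    A level fires only when every sample in the window is at or above
    its threshold, so a single slow request does not page anyone.
    """
    window = response_times_ms[-WINDOW:]
    if len(window) < WINDOW:
        return "ok"  # not enough data yet
    if min(window) >= CRITICAL_MS:
        return "critical"
    if min(window) >= WARNING_MS:
        return "warning"
    return "ok"


print(classify([300, 2500, 2100, 2200]))  # warning
print(classify([300, 5200, 6100, 5500]))  # critical
print(classify([300, 400, 6100, 350]))    # ok
```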

Business Logic

User journeys: registration, payment, search, and API flows.

Custom metrics: cart abandonment, login success rate, file uploads, and third-party integrations.
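A custom metric such as login success rate could be checked along these lines. The 95% threshold, the minimum sample size, and the function name are illustrative assumptions.

```python
def login_success_alert(successes, attempts, min_rate=0.95, min_attempts=20):
    """Return an alert dict when the success rate drops below min_rate.

    Returns None when the rate is healthy or when there are too few
    attempts to judge.
    """
    if attempts < min_attempts:
        return None
    rate = successes / attempts
    if rate >= min_rate:
        return None
    return {
        "title": "Login success rate below threshold",
        "severity": "high",
        "success_rate": round(rate, 3),
        "threshold": min_rate,
    }


print(login_success_alert(successes=170, attempts=200))  # alert dict
print(login_success_alert(successes=198, attempts=200))  # None
```

The minimum-attempts guard keeps quiet periods from triggering alerts on a handful of failed logins.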

Setup

Channels

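Multi-channel delivery by severity:

class BaseNotifier:
    """Placeholder channel; swap in a real integration."""

    def send(self, alert):
        print(f"[{type(self).__name__}] {alert}")

    def send_immediate(self, alert):
        # Same delivery here; a real channel might bypass batching.
        self.send(alert)

# Stub channels so the example runs; not defined in the original.
class SlackNotifier(BaseNotifier): pass
class DiscordNotifier(BaseNotifier): pass
class EmailNotifier(BaseNotifier): pass
class SMSNotifier(BaseNotifier): pass
class WebhookNotifier(BaseNotifier): pass

class Notifier:
    def __init__(self):
        self.channels = {
            'slack': SlackNotifier(),
            'discord': DiscordNotifier(),
            'email': EmailNotifier(),
            'sms': SMSNotifier(),
            'webhook': WebhookNotifier()
        }

    def send(self, alert, severity='medium'):
        if severity == 'critical':
            # Critical: every channel, immediately.
            for channel in self.channels.values():
                channel.send_immediate(alert)

        elif severity == 'high':
            self.channels['slack'].send(alert)
            self.channels['email'].send(alert)
            self.channels['webhook'].send(alert)

        elif severity == 'medium':
            self.channels['slack'].send(alert)
            self.channels['email'].send(alert)

        else:
            # Low severity: chat only.
            self.channels['slack'].send(alert)

Team routing: route by time of day, alert type, team expertise, and escalation level.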
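Routing by time, type, and expertise might look like the following sketch. The team names, the business-hours window, and the rule that off-hours alerts go to on-call are all hypothetical choices made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical teams keyed by alert type (expertise-based routing).
TEAM_BY_TYPE = {
    "database": "data-team",
    "payment": "payments-team",
}


def route(alert_type, severity, when):
    """Pick a team from alert type, severity, and time of day."""
    in_hours = when.weekday() < 5 and 9 <= when.hour < 17
    # Critical alerts and anything outside business hours go to on-call.
    if severity == "critical" or not in_hours:
        return "on-call"
    # Otherwise route to the team that owns this alert type.
    return TEAM_BY_TYPE.get(alert_type, "platform-team")


monday_noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
saturday_noon = datetime(2024, 1, 20, 12, 0, tzinfo=timezone.utc)

print(route("database", "high", monday_noon))      # data-team
print(route("database", "critical", monday_noon))  # on-call
print(route("payment", "high", saturday_noon))     # on-call
```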

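Frequency and Timing

Group related alerts:

// Rank severities so the most severe alert in a group wins.
// Assumes string severities, as used elsewhere in this article.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

const grouping = {
  groupingWindow: 300000, // 5 min, in ms

  // Return the open group that matches this alert, if any.
  shouldGroup: (newAlert, groups) => {
    return groups.find(group =>
      group.service === newAlert.service &&
      group.alertType === newAlert.alertType &&
      (Date.now() - group.lastAlert) < grouping.groupingWindow
    );
  },

  createMessage: (alerts) => ({
    title: `${alerts.length} alerts for ${alerts[0].service}`,
    summary: alerts.map(a => a.message).join('\n'),
    // Math.max returns NaN for strings, so compare by rank instead.
    severity: alerts
      .map(a => a.severity)
      .reduce((worst, s) => (SEVERITY_RANK[s] > SEVERITY_RANK[worst] ? s : worst)),
    actions: ['View', 'Ack All', 'Escalate']
  })
};

Escalation: notify the primary on-call immediately, the secondary after 5 minutes, a manager after 15 minutes, and an executive after 30 minutes.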
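The escalation schedule maps naturally onto a lookup table. This sketch assumes escalation is driven by how long an alert has gone unacknowledged; the role labels are placeholders.

```python
# (minutes unacknowledged, who to notify), following the schedule above.
ESCALATION_STEPS = [
    (0, "primary on-call"),
    (5, "secondary on-call"),
    (15, "manager"),
    (30, "executive"),
]


def recipients(minutes_unacknowledged):
    """Everyone who should have been notified by this point."""
    return [who for after, who in ESCALATION_STEPS
            if minutes_unacknowledged >= after]


print(recipients(0))   # ['primary on-call']
print(recipients(16))  # ['primary on-call', 'secondary on-call', 'manager']
```

Acknowledging the alert would stop the clock, so later steps only fire when nobody has picked it up.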

Content

Actionable alert payload:

{
  "alert": {
    "title": "API Down - Payments",
    "severity": "critical",
    "timestamp": "2024-01-15T14:30:00Z",
    "service": {
      "name": "Payment API",
      "url": "https://api.example.com/payments",
      "environment": "prod"
    },
    "issue": {
      "status_code": 500,
      "response_time": "timeout",
      "error_message": "Server Error",
      "affected_users": "150+"
    },
    "context": {
      "recent_deployments": "v2.1.3 2 hours ago",
      "traffic_pattern": "normal",
      "dependencies": ["db", "redis", "gateway"]
    },
    "suggested_actions": [
      "Check logs",
      "Verify db",
      "Rollback v2.1.2",
      "Contact gateway"
    ],
    "runbook": "https://docs.company.com/runbooks/payment-api-issues",
    "dashboard": "https://monitor.company.com/payment-api"
  }
}
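A payload like this is only useful if the key fields are always present. The sketch below checks for them and renders a short plain-text message. It works on the inner `alert` object, and the list of required fields is an assumption about what counts as actionable.

```python
REQUIRED_FIELDS = ("title", "severity", "service", "issue",
                   "suggested_actions", "runbook")


def missing_fields(alert):
    """Return the required keys absent from an alert payload."""
    return [f for f in REQUIRED_FIELDS if f not in alert]


def format_alert(alert):
    """Render a payload as a short plain-text message."""
    service = alert["service"]
    lines = [
        f"[{alert['severity'].upper()}] {alert['title']}",
        f"Service: {service['name']} ({service['environment']})",
        "Next steps:",
    ]
    lines += [f"  {i}. {step}"
              for i, step in enumerate(alert["suggested_actions"], 1)]
    lines.append(f"Runbook: {alert['runbook']}")
    return "\n".join(lines)


sample = {
    "title": "API Down - Payments",
    "severity": "critical",
    "service": {"name": "Payment API", "environment": "prod"},
    "issue": {"status_code": 500},
    "suggested_actions": ["Check logs", "Verify db"],
    "runbook": "https://docs.company.com/runbooks/payment-api-issues",
}

if not missing_fields(sample):
    print(format_alert(sample))
```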

Quickstart exit1.dev

exit1 add https://api.myapp.com/health \
  --name "API Health" \
  --interval 60 \
  --timeout 30 \
  --expected-status 200 \
  --alert-on-failure \
  --alert-on-recovery

exit1 alert add-channel slack \
  --webhook-url "https://hooks.slack.com/..." \
  --channel "#alerts" \
  --severity critical,high

exit1 alert add-channel discord \
  --webhook-url "https://discord.com/api/webhooks/..." \
  --severity critical

exit1 alert add-channel email \
  --addresses "team@company.com,oncall@company.com" \
  --severity high,medium

Advanced

Predictive

Trend-based alerts:

const predictive = {
  metric: 'response_time',
  analysis_window: '15_minutes',
  prediction_window: '5_minutes',
  
  trigger: {
    trend_direction: 'increasing',
    trend_slope: 0.2,
    confidence_threshold: 0.8
  },
  
  action: {
    message: "Response time trending up - issue brewing",
    severity: 'warning',
    suggested_actions: [
      'Check resources',
      'Review deploys',
      'Monitor spikes'
    ]
  }
};
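One way to detect a rising trend is a least-squares fit over the analysis window. The sketch below is an assumption about how such a rule could be evaluated: it uses r-squared as a stand-in for the confidence score, and the `min_slope` value (in milliseconds per sample) is illustrative because the source does not give units for `trend_slope`.

```python
def fit_trend(values):
    """Least-squares fit of values against their index.

    Returns (slope per sample, r-squared).
    """
    n = len(values)
    if n < 2:
        return 0.0, 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, r_squared


def should_warn(values, min_slope, min_confidence=0.8):
    """Warn when the series rises steadily enough to trust the trend."""
    slope, r_squared = fit_trend(values)
    return slope >= min_slope and r_squared >= min_confidence


rising = [200, 220, 250, 270, 300]  # response times in ms
noisy = [250, 240, 260, 245, 255]

print(should_warn(rising, min_slope=10))  # True
print(should_warn(noisy, min_slope=10))   # False
```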

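Context-Aware Alerting

Business-aware severity:

import datetime

class AwareAlerting:
    def __init__(self):
        self.business_hours = {
            'start': 9,
            'end': 17,
            'timezone': 'UTC',
            'weekdays_only': True
        }

    def is_business_hours(self, dt):
        """Assumed helper (not defined in the original): True inside the window."""
        hours = self.business_hours
        if hours['weekdays_only'] and dt.weekday() >= 5:  # 5, 6 = Sat, Sun
            return False
        return hours['start'] <= dt.hour < hours['end']

    def adjust_severity(self, base_severity, timestamp):
        # Read the Unix timestamp as UTC to match the configured timezone.
        dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

        if self.is_business_hours(dt):
            # Bump severity one level while the business is open.
            severity_map = {
                'low': 'medium',
                'medium': 'high',
                'high': 'critical'
            }
            return severity_map.get(base_severity, base_severity)

        # Outside business hours severity is unchanged, so low and
        # medium alerts can wait instead of escalating.
        return base_severity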

Conclusion

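Real-time alerts are a necessity, but they have to be implemented intelligently.

exit1.dev provides the foundation: 1-minute checks, global monitoring, and intelligent alerts. Combined with sound configuration, escalation policies, and continuous improvement, it becomes your first line of defense.

The goal: the right alerts, at the right time, with the right information. That reduces stress, boosts efficiency, and keeps sites running.

Try exit1.dev at https://exit1.dev. Alerts that help, not overwhelm.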

Sources

Google SRE Book: Monitoring Distributed Systems — https://sre.google/sre-book/monitoring-distributed-systems/
AWS Well-Architected: Reliability Pillar — https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/welcome.html

Recommended Free Monitoring Resources

Free Uptime Monitor Checklist (/blog/free-uptime-monitor-checklist) – Step-by-step actions to configure a free uptime monitor that catches incidents fast.
Best Free Uptime Monitoring Tools (2025) (/blog/best-free-uptime-monitoring-tools) – Compare the strongest free uptime monitor platforms and when to upgrade.
Free Website Monitoring Tools 2025 Guide (/blog/free-website-monitoring-tools-2025) – Evaluate which free website monitor fits your stack and alerting needs.
Free Website Monitoring for Developers (/blog/free-website-monitoring-for-developers) – See how engineering teams automate alerts, SLO tracking, and reporting with a free website monitor.

Morten Pradsgaard is the founder of exit1.dev — the free uptime monitor for people who actually ship. He writes no-bullshit guides on monitoring, reliability, and building software that doesn't crumble under pressure.
