incident-response-incident-response
Orchestrate Incident Response Workflows
This skill provides a structured multi-phase incident response workflow for AI agents, enabling rapid detection, investigation, resolution, and postmortem documentation following modern SRE principles.
Download the skill ZIP
Upload it to Claude
Go to Settings → Features → Skills → Upload skill
Toggle it on to start using it
Test it
Using "incident-response-incident-response": Use incident-response skill to triage: API service returning 500 errors for 30% of requests
Expected result:
Severity: P1/SEV-2 (Major degradation)
Affected Services: API Gateway, User Service
User Impact: 30% failed requests, primarily authenticated users
Initial Mitigation:
1. Enable circuit breaker for User Service
2. Check for recent deployments to roll back
3. Scale up API Gateway capacity
Incident Commander: [Assign]
Technical Lead: [Assign]
Communications Lead: [Assign]
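The triage step above maps an alert to a severity level before anything else happens. A minimal sketch of that classification logic is shown below; the `Alert` fields, thresholds, and P0–P3 cutoffs are illustrative assumptions, not part of the skill itself.

```python
# Hypothetical severity triage sketch. Field names and thresholds are
# assumptions for illustration, not the skill's actual rules.
from dataclasses import dataclass

@dataclass
class Alert:
    error_rate: float      # fraction of failed requests (0.0-1.0)
    users_affected: bool   # any user-visible impact
    data_loss: bool        # confirmed or suspected data loss

def classify_severity(alert: Alert) -> str:
    """Map an alert to a P0-P3 severity level (assumed cutoffs)."""
    if alert.data_loss or alert.error_rate >= 0.75:
        return "P0"  # complete outage or data loss
    if alert.error_rate >= 0.25 and alert.users_affected:
        return "P1"  # major degradation, e.g. 30% failed requests
    if alert.users_affected:
        return "P2"  # partial degradation
    return "P3"      # minor issue, no user-visible impact

# The 30%-error example from above lands in P1:
print(classify_severity(Alert(error_rate=0.30, users_affected=True, data_loss=False)))
# → P1
```

Encoding the cutoffs in one function keeps triage consistent across on-call engineers, which is the point of classifying severity before debugging.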
Using "incident-response-incident-response": Use incident-response skill to create postmortem for yesterday's database outage
Expected result:
## Blameless Postmortem - Database Outage
### Timeline
- 14:00 - Alert fired: Database CPU at 99%
- 14:05 - Incident declared P1
- 14:15 - Rollback attempted
- 14:30 - Root cause: Connection pool exhaustion
- 14:45 - Fix deployed
- 15:00 - Incident resolved
### Root Cause
A migration script opened 10x the normal number of database connections, exhausting the connection pool
### What Went Well
- Fast detection (2 min)
- Clear communication
### Action Items
1. Add connection pool monitoring - Owner: Jane - Due: Feb 28
2. Update runbook for migrations - Owner: Bob - Due: Mar 1
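A postmortem like the one above follows a fixed section layout, so it can be generated from structured inputs. The helper below is a minimal sketch, assuming a simple timeline/action-item shape; the function name and signature are illustrative, not part of the skill.

```python
# Hypothetical postmortem template generator; section layout mirrors the
# example above (Timeline, Root Cause, What Went Well, Action Items).
def postmortem_template(title: str,
                        timeline: list[tuple[str, str]],
                        root_cause: str,
                        went_well: list[str],
                        action_items: list[str]) -> str:
    """Render a blameless postmortem as markdown."""
    lines = [f"## Blameless Postmortem - {title}", "", "### Timeline"]
    lines += [f"- {ts} - {event}" for ts, event in timeline]
    lines += ["", "### Root Cause", root_cause, "", "### What Went Well"]
    lines += [f"- {item}" for item in went_well]
    lines += ["", "### Action Items"]
    lines += [f"{i}. {item}" for i, item in enumerate(action_items, 1)]
    return "\n".join(lines)

doc = postmortem_template(
    "Database Outage",
    [("14:00", "Alert fired: Database CPU at 99%"),
     ("15:00", "Incident resolved")],
    "Migration script exhausted the connection pool",
    ["Fast detection (2 min)"],
    ["Add connection pool monitoring - Owner: Jane - Due: Feb 28"],
)
print(doc)
```

Filling a template rather than free-writing keeps every postmortem comparable and makes missing sections (e.g. no action items) immediately visible.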
Security Audit
Safe. All 11 static findings are false positives. The skill is a legitimate incident response workflow guide (markdown documentation). The 'external_commands' detection refers to markdown backticks for file paths, not shell execution. The 'weak cryptographic algorithm' and 'system/network reconnaissance' detections are scanner misinterpretations of incident response terminology (severity levels, observability analysis, root cause analysis). No actual security risks are present.
What You Can Build
SRE Team Lead managing production outage
Use the full workflow to coordinate team response, maintain incident command structure, and ensure proper communication during a sev-1 incident.
DevOps Engineer conducting post-incident review
Use Phase 5 (Postmortem & Prevention) to document incident timeline, identify root causes, and create action items for monitoring improvements.
On-call engineer performing initial triage
Use Phase 1 (Detection & Triage) to quickly classify incident severity, assess impact, and determine initial mitigation steps.
Try These Prompts
Use the incident-response skill to triage this alert: [DESCRIBE ALERT]. Determine severity level (P0-P3), identify affected services, assess user impact, and recommend initial mitigation actions.
Use the incident-response skill to investigate this incident: [INCIDENT DESCRIPTION]. Conduct deep debugging, security assessment, and performance analysis to identify root cause.
Use the incident-response skill to coordinate this emergency fix: [INCIDENT AND FIX DESCRIPTION]. Execute deployment with validation, monitoring, and rollback readiness.
Use the incident-response skill to conduct a blameless postmortem for: [INCIDENT SUMMARY]. Document timeline, root cause, what went well, what could improve, and create action items.
Best Practices
- Assign clear incident commander and roles within the first 5 minutes of any P0/P1 incident
- Update stakeholder communication every 15-30 minutes during active incidents
- Complete blameless postmortem within 48 hours with specific, assignable action items
Avoid
- Skipping severity classification and jumping straight to debugging without understanding impact
- Blaming individuals in postmortems rather than focusing on system improvements
- Delaying communication to stakeholders until full resolution is achieved