incident-response-incident-response
Orchestrate Incident Response Workflows
This skill provides a structured multi-phase incident response workflow for AI agents, enabling rapid detection, investigation, resolution, and postmortem documentation following modern SRE principles.
Download the skill ZIP
Upload it in Claude
Go to Settings → Capabilities → Skills → Upload skill
Turn on the toggle and start using it
Try it out
Using "incident-response-incident-response". Use the incident-response skill to triage: API service returning 500 errors for 30% of requests
Expected result:
Severity: P1/SEV-2 (Major degradation)
Affected Services: API Gateway, User Service
User Impact: 30% failed requests, primarily authenticated users
Initial Mitigation:
1. Enable circuit breaker for User Service
2. Check for recent deployments to roll back
3. Scale up API Gateway capacity
Incident Commander: [Assign]
Technical Lead: [Assign]
Communications Lead: [Assign]
Using "incident-response-incident-response". Use the incident-response skill to create a postmortem for yesterday's database outage
Expected result:
## Blameless Postmortem - Database Outage
### Timeline
- 14:00 - Alert fired: Database CPU at 99%
- 14:05 - Incident declared P1
- 14:15 - Rollback attempted
- 14:30 - Root cause: Connection pool exhaustion
- 14:45 - Fix deployed
- 15:00 - Incident resolved
### Root Cause
Migration script created 10x normal connections
### What Went Well
- Fast detection (2 min)
- Clear communication
### Action Items
1. Add connection pool monitoring - Owner: Jane - Due: Feb 28
2. Update runbook for migrations - Owner: Bob - Due: Mar 1
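A timeline like the one in this example postmortem lends itself to simple duration metrics. The following is a minimal sketch, assuming all events fall on the same day; the helper names (`minutes_between`, `find_time`, `summarize`) are hypothetical and not part of the skill, and the timeline entries are copied from the example.

```python
from datetime import datetime

# Timeline entries copied from the example postmortem.
TIMELINE = [
    ("14:00", "Alert fired: Database CPU at 99%"),
    ("14:05", "Incident declared P1"),
    ("14:15", "Rollback attempted"),
    ("14:30", "Root cause: Connection pool exhaustion"),
    ("14:45", "Fix deployed"),
    ("15:00", "Incident resolved"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two HH:MM stamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def find_time(timeline, prefix: str) -> str:
    """Return the stamp of the first event whose text starts with prefix."""
    for stamp, event in timeline:
        if event.startswith(prefix):
            return stamp
    raise KeyError(prefix)


def summarize(timeline) -> dict:
    """Compute durations measured from the first alert (assumed metric names)."""
    alert = find_time(timeline, "Alert fired")
    return {
        "time_to_declare_min": minutes_between(
            alert, find_time(timeline, "Incident declared")
        ),
        "time_to_root_cause_min": minutes_between(
            alert, find_time(timeline, "Root cause")
        ),
        "time_to_resolve_min": minutes_between(
            alert, find_time(timeline, "Incident resolved")
        ),
    }


if __name__ == "__main__":
    print(summarize(TIMELINE))
```

Measuring everything from the alert is one possible convention; a team that tracks the actual start of user impact separately would measure from that instead.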
Security audit
Safe. All 11 static findings are false positives. The skill is a legitimate incident response workflow guide (Markdown documentation). The 'external_commands' detection refers to Markdown backticks around file paths, not shell execution. The 'weak cryptographic algorithm' and 'system/network reconnaissance' detections are scanner misinterpretations of incident response terminology (severity levels, observability analysis, root cause analysis). No actual security risks are present.
What you can build
SRE Team Lead managing production outage
Use the full workflow to coordinate the team response, maintain the incident command structure, and ensure proper communication during a SEV-1 incident.
DevOps Engineer conducting post-incident review
Use Phase 5 (Postmortem & Prevention) to document incident timeline, identify root causes, and create action items for monitoring improvements.
On-call engineer performing initial triage
Use Phase 1 (Detection & Triage) to quickly classify incident severity, assess impact, and determine initial mitigation steps.
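The severity classification step of Phase 1 can be illustrated with a short Python sketch. This is not part of the skill itself: the names `classify_severity`, `triage`, and `TriageResult` are hypothetical, and the failure-ratio thresholds are assumptions chosen for illustration. Real cut-offs should come from your own SLOs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TriageResult:
    severity: str
    failed_request_ratio: float
    affected_services: list[str]


def classify_severity(failed_request_ratio: float) -> str:
    """Map a request failure ratio (0.0 to 1.0) to a P0-P3 label."""
    if not 0.0 <= failed_request_ratio <= 1.0:
        raise ValueError("failed_request_ratio must be between 0 and 1")
    # Assumed thresholds for illustration only; tune them to your SLOs.
    if failed_request_ratio >= 0.5:
        return "P0"
    if failed_request_ratio >= 0.1:
        return "P1"
    if failed_request_ratio >= 0.01:
        return "P2"
    return "P3"


def triage(failed_request_ratio: float, affected_services: list[str]) -> TriageResult:
    """Bundle the classification with the inputs it was derived from."""
    return TriageResult(
        severity=classify_severity(failed_request_ratio),
        failed_request_ratio=failed_request_ratio,
        affected_services=list(affected_services),
    )


if __name__ == "__main__":
    result = triage(0.30, ["API Gateway", "User Service"])
    print(result.severity)
```

With these assumed thresholds, a 30% failure rate such as the one in the triage example maps to P1. A failure ratio alone is a crude signal; in practice the classification would also weigh which users and services are affected.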
Try these prompts
Use the incident-response skill to triage this alert: [DESCRIBE ALERT]. Determine severity level (P0-P3), identify affected services, assess user impact, and recommend initial mitigation actions.
Use the incident-response skill to investigate this incident: [INCIDENT DESCRIPTION]. Conduct deep debugging, security assessment, and performance analysis to identify root cause.
Use the incident-response skill to coordinate this emergency fix: [INCIDENT AND FIX DESCRIPTION]. Execute deployment with validation, monitoring, and rollback readiness.
Use the incident-response skill to conduct a blameless postmortem for: [INCIDENT SUMMARY]. Document timeline, root cause, what went well, what could improve, and create action items.
Best practices
- Assign an incident commander and clear roles within the first 5 minutes of any P0/P1 incident
- Update stakeholder communication every 15-30 minutes during active incidents
- Complete a blameless postmortem within 48 hours with specific, assignable action items
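The time limits in these practices can be expressed as simple checks. The sketch below is illustrative only: the function and constant names are hypothetical, the 30-minute value takes the upper bound of the 15-30 minute range, and the 48-hour window is assumed to start at resolution, which the guidance does not specify.

```python
from datetime import datetime, timedelta

ROLE_ASSIGNMENT_WINDOW = timedelta(minutes=5)
MAX_UPDATE_INTERVAL = timedelta(minutes=30)  # upper bound of the 15-30 min range
POSTMORTEM_WINDOW = timedelta(hours=48)      # assumed to start at resolution


def roles_assigned_on_time(declared_at: datetime, roles_assigned_at: datetime) -> bool:
    """True if roles were assigned within 5 minutes of the declaration."""
    return roles_assigned_at - declared_at <= ROLE_ASSIGNMENT_WINDOW


def stakeholder_update_overdue(last_update_at: datetime, now: datetime) -> bool:
    """True if more than 30 minutes have passed since the last update."""
    return now - last_update_at > MAX_UPDATE_INTERVAL


def postmortem_due_by(resolved_at: datetime) -> datetime:
    """Return the latest time the postmortem should be completed."""
    return resolved_at + POSTMORTEM_WINDOW


if __name__ == "__main__":
    declared = datetime(2024, 1, 1, 14, 5)
    print(roles_assigned_on_time(declared, datetime(2024, 1, 1, 14, 9)))
    print(postmortem_due_by(datetime(2024, 1, 1, 15, 0)))
```

The dates in the usage lines are placeholders, not data from a real incident.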
Avoid
- Skipping severity classification and jumping straight to debugging without understanding impact
- Blaming individuals in postmortems rather than focusing on system improvements
- Delaying communication to stakeholders until full resolution is achieved
Frequently asked questions
Does this skill execute actual incident response actions?
Can this skill replace my incident management platform?
What severity levels does this skill support?
How does this skill handle security incidents?
Can junior engineers use this skill effectively?
What makes this skill different from general debugging guides?
Developer details
Author
sickn33
License
MIT
Repository
https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/incident-response-incident-response
Ref
main
File structure
SKILL.md