Skill: observability-monitoring-slo-implement

Name: observability-monitoring-slo-implement
Author: sickn33

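Safe

Implement SLOs and Error Budgets

Design and implement service level objectives (SLOs), together with SLIs and error budgets, to measure and improve system reliability while balancing it against feature development velocity.

Supports: Claude Codex Code(CC)

⚠️ 68 Poor

Download skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn on the toggle and start using it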

Try it out

Using "observability-monitoring-slo-implement". Design SLOs for a new e-commerce checkout service

Expected result:

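A comprehensive SLO framework covering tier classification (critical), an availability target (99.95%), a latency SLI (p95 < 500 ms), an error-rate SLI (< 0.1%), an error budget calculation (about 4.38 hours per year, or roughly 21.6 minutes per 30-day window), and burn-rate alert thresholds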
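The error budget figure follows directly from the availability target and the measurement window. A minimal sketch of that arithmetic, assuming the budget is expressed as allowed downtime; the function name `error_budget_minutes` is hypothetical and not taken from the skill:

```python
def error_budget_minutes(slo_target: float, window_days: float) -> float:
    """Return the allowed downtime in minutes for an availability target.

    slo_target is a fraction (0.9995 for 99.95%); window_days is the
    length of the measurement window in days.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


if __name__ == "__main__":
    target = 0.9995  # the 99.95% availability target from the example
    # 30-day window: prints "21.6 minutes"
    print(f"{error_budget_minutes(target, 30):.1f} minutes")
    # 365-day window: prints "4.38 hours"
    print(f"{error_budget_minutes(target, 365) / 60:.2f} hours")
```

The same target therefore yields very different absolute budgets depending on the window, which is why the window should always be stated next to the figure.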

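Using "observability-monitoring-slo-implement". Create Prometheus recording rules for SLO tracking

Expected result:

A YAML configuration containing recording rules for request rate, success rate over multiple time windows (5 minutes, 30 minutes, 1 hour), latency percentiles (p50, p95, p99), and error budget burn-rate calculation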
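As a rough illustration of what such a rule set might contain, the sketch below assembles the rule group as a Python dictionary and prints it as JSON for inspection. Everything named here is an assumption rather than the skill's actual output: the metrics `http_requests_total` (with a `code` label) and `http_request_duration_seconds_bucket`, the `service` label, and all recording rule names are hypothetical and would need to match your own instrumentation.

```python
import json

WINDOWS = ["5m", "30m", "1h"]
QUANTILES = {"p50": 0.5, "p95": 0.95, "p99": 0.99}


def build_recording_rules(service: str, slo_target: float) -> dict:
    """Build a Prometheus-style rule group as a plain dictionary."""
    sel = f'service="{service}"'                 # assumed label selector
    budget = format(1.0 - slo_target, ".6g")     # allowed error ratio
    rules = [{
        "record": "service:http_requests:rate5m",
        "expr": f"sum(rate(http_requests_total{{{sel}}}[5m]))",
    }]
    for w in WINDOWS:
        ratio = f"service:http_requests:success_ratio_{w}"
        # Success ratio: non-5xx requests divided by all requests.
        rules.append({
            "record": ratio,
            "expr": (
                f'sum(rate(http_requests_total{{{sel},code!~"5.."}}[{w}]))'
                f" / sum(rate(http_requests_total{{{sel}}}[{w}]))"
            ),
        })
        # Burn rate: observed error ratio divided by the allowed error ratio.
        rules.append({
            "record": f"service:error_budget:burn_rate_{w}",
            "expr": f"(1 - {ratio}) / {budget}",
        })
    for name, q in QUANTILES.items():
        rules.append({
            "record": f"service:http_request_duration_seconds:{name}_5m",
            "expr": (
                f"histogram_quantile({q}, sum by (le) "
                f"(rate(http_request_duration_seconds_bucket{{{sel}}}[5m])))"
            ),
        })
    return {"groups": [{"name": f"{service}_slo_rules", "rules": rules}]}


if __name__ == "__main__":
    print(json.dumps(build_recording_rules("checkout", 0.999), indent=2))
```

Generating the rules from one function keeps the windows and the SLO target in a single place, so the success-ratio and burn-rate rules cannot drift apart.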

Security audit

Safe

v1 • 2/24/2026

Static analysis detected 57 potential issues, but manual review confirms all findings are false positives. The skill contains documentation with Python code examples for SLO implementation - no actual executable code, no network calls, and no cryptographic operations. The placeholder URLs use example.com domain. This is a legitimate DevOps reliability skill.

Files scanned

1,124

Lines analyzed

Findings

Total audits

Medium-risk issues (2)

resources/implementation-playbook.md:40 resources/implementation-playbook.md:154-161 SKILL.md:36 SKILL.md:45

External Commands Detection in Documentation

Static scanner detected 'external_commands' pattern in markdown documentation. This is a false positive - the skill contains Python code examples in markdown blocks, not executable shell commands. The backtick syntax detected is part of Python f-strings and dictionary literals in documentation examples.

resources/implementation-playbook.md:969 resources/implementation-playbook.md:970

Hardcoded URLs in Example Configuration

Static scanner detected placeholder URLs in YAML configuration examples. These are example.com domain URLs used as placeholders in documentation, not actual network endpoints.

Low-risk issues (3)

resources/implementation-playbook.md:7 resources/implementation-playbook.md:39 SKILL.md:3

Numeric Pattern False Positives

Static scanner detected 'weak cryptographic algorithm' patterns at multiple locations. These are false positives - the numeric values detected (99.9%, 0.001, 14.4) are SLO availability targets and burn rate multipliers, not cryptographic algorithms.

resources/implementation-playbook.md:24 SKILL.md:40

Documentation Language False Positive

Static scanner detected 'system reconnaissance' patterns. This is a false positive - words like 'analyze', 'assess', 'identify' are used in the legitimate context of service analysis for SLO design, not reconnaissance.

resources/implementation-playbook.md:1

Code Block Bracket Pattern

Static scanner detected 'obfuscation' pattern with multiple bracket chains. This is a false positive - the pattern detected is legitimate markdown code block formatting with Python dictionary and f-string syntax.

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

Security

Spec compliance

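What you can build

Define SLOs for a new API service

Create availability, latency, and error-rate SLOs with targets appropriate to the service's criticality

Set up error budget alerts

Configure multi-window burn-rate alerts to detect both fast and slow error budget burn

Establish an SLO review process

Create a weekly SLO review template and governance process for engineering teams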

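Try these prompts

Basic SLO design

Help me design SLOs for a payment processing service. It handles 10,000 requests per minute and needs high reliability. What availability target should I set, and how should I define the SLIs?

SLI implementation

I need to implement SLIs for a REST API service that uses Prometheus. Show me how to write availability and latency SLI queries that track the percentage of successful requests and the percentage of requests completing in under 500 ms.

Error budget alerts

I need to configure error budget burn-rate alerts for a service with a 99.9% SLO target. I need both fast-burn (page immediately) and slow-burn (create a ticket) alert rules.

SLO governance

Establish an SLO governance framework for my team, including roles and responsibilities, a weekly review template, and a stakeholder communication process.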
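The two SLIs named in the SLI implementation prompt can be made concrete without any query language. The sketch below computes them from in-memory request samples; it assumes that any non-5xx response counts as successful and that the latency threshold is 500 ms. The `Request` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Request:
    status: int          # HTTP status code
    duration_ms: float   # end-to-end latency in milliseconds


def availability_sli(requests: Sequence[Request]) -> Optional[float]:
    """Fraction of requests that did not fail with a 5xx status."""
    if not requests:
        return None  # no traffic, so the SLI is undefined
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: Sequence[Request],
                threshold_ms: float = 500.0) -> Optional[float]:
    """Fraction of requests that completed in under threshold_ms."""
    if not requests:
        return None
    fast = sum(1 for r in requests if r.duration_ms < threshold_ms)
    return fast / len(requests)


if __name__ == "__main__":
    sample = [
        Request(200, 120), Request(200, 650), Request(503, 30),
        Request(200, 900), Request(404, 50),
    ]
    print(availability_sli(sample))  # 4 of 5 requests are non-5xx
    print(latency_sli(sample))       # 3 of 5 requests finish under 500 ms
```

Both SLIs are ratios of good events to total events, which is the shape that error budget and burn-rate calculations expect.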

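Best practices

Start with conservative SLO targets and tighten them based on real service performance data
Use multi-window burn-rate alerts to detect both fast and slow budget consumption
Align SLO targets with business priorities and user expectations rather than technical convenience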
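The multi-window burn-rate practice can be sketched as a small decision function. Burn rate is the observed error ratio divided by the error ratio the SLO allows, and an alert fires only when both a long and a short window exceed the threshold. The thresholds and window pairs below (14.4 over 1 hour and 5 minutes for a page, 1.0 over 3 days and 6 hours for a ticket) are commonly cited defaults for a 30-day SLO, assumed here rather than taken from the skill; `POLICIES` and `evaluate` are hypothetical names.

```python
from typing import Dict, Optional


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_ratio / (1.0 - slo_target)


# (long window, short window, burn-rate threshold, action), most urgent first.
POLICIES = [
    ("1h", "5m", 14.4, "page"),    # fast burn: page immediately
    ("3d", "6h", 1.0, "ticket"),   # slow burn: create a ticket
]


def evaluate(error_ratios: Dict[str, float],
             slo_target: float) -> Optional[str]:
    """Return the most urgent action whose two windows both exceed the threshold."""
    for long_w, short_w, threshold, action in POLICIES:
        long_burn = burn_rate(error_ratios[long_w], slo_target)
        short_burn = burn_rate(error_ratios[short_w], slo_target)
        # Requiring both windows avoids paging on a spike that already ended.
        if long_burn > threshold and short_burn > threshold:
            return action
    return None


if __name__ == "__main__":
    observed = {"1h": 0.02, "5m": 0.03, "3d": 0.002, "6h": 0.01}
    print(evaluate(observed, slo_target=0.999))
```

The short window acts as a reset condition: once the errors stop, the short-window burn rate drops quickly and the alert clears even though the long window is still elevated.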

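Avoid

Setting overly tight SLO targets from the start, which causes constant alerts and alert fatigue
Relying only on an availability SLI without considering latency or quality metrics
Creating SLOs without stakeholder alignment or business context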

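Frequently asked questions

What is the difference between an SLO and an SLA?

An SLO (service level objective) is an internal target that an engineering team commits to. An SLA (service level agreement) is a contractual commitment to customers, with financial consequences if it is breached.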

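How do I choose the right SLO availability target?

Start by analyzing historical availability, understanding user expectations, and considering business impact. Critical services typically need 99.95% or higher, while standard services can target 99.5%.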

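Which time window should I use for SLO measurement?

Common windows are a rolling 30 days for availability and a calendar month for billing periods. Longer windows give more stable measurements but slower feedback on problems.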

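How do I handle scheduled maintenance in SLO calculations?

Either use an availability formula that accounts for expected downtime, or exclude planned maintenance windows from the SLO measurement. Document the approach clearly.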
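The two approaches give noticeably different numbers for the same month, which is why the choice needs to be documented. A minimal sketch, assuming downtime is tracked in minutes and that planned and unplanned downtime do not overlap; the function and parameter names are hypothetical:

```python
def availability(window_minutes: float,
                 unplanned_downtime_minutes: float,
                 planned_maintenance_minutes: float = 0.0,
                 exclude_planned: bool = True) -> float:
    """Availability as a fraction, with or without planned maintenance.

    exclude_planned=True removes the maintenance window from the measured
    period entirely; exclude_planned=False counts it as downtime.
    """
    if exclude_planned:
        measured = window_minutes - planned_maintenance_minutes
        down = unplanned_downtime_minutes
    else:
        measured = window_minutes
        down = unplanned_downtime_minutes + planned_maintenance_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return (measured - down) / measured


if __name__ == "__main__":
    window = 30 * 24 * 60  # 30-day window in minutes
    for exclude in (True, False):
        value = availability(window, unplanned_downtime_minutes=20,
                             planned_maintenance_minutes=120,
                             exclude_planned=exclude)
        print(f"exclude_planned={exclude}: {value:.4%}")
```

With 20 minutes of unplanned downtime and 120 minutes of maintenance, excluding maintenance keeps the service above 99.95%, while counting it drops the result below 99.7%.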

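What should I do when the error budget is exhausted?

Pause feature development, focus on reliability improvements, and communicate the status to stakeholders. Use an error budget policy to guide release decisions.

How many SLOs should a service have?

Start with 2-4 SLOs that cover the most important user-facing aspects, including availability, latency, and error rate. Add more as needed, but avoid alert fatigue.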

Developer details

Author

sickn33

License

MIT

Repository

https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/observability-monitoring-slo-implement

Ref

main

File structure

📁 resources/

📄 implementation-playbook.md

📄 SKILL.md