observability-monitoring-monitor-setup
Set up comprehensive monitoring and observability
Implementing monitoring from scratch is complex and error-prone. This skill provides proven patterns for metrics, tracing, and logging that reduce MTTR and provide full system visibility.
์คํฌ ZIP ๋ค์ด๋ก๋
Claude์์ ์ ๋ก๋
์ค์ โ ๊ธฐ๋ฅ โ ์คํฌ โ ์คํฌ ์ ๋ก๋๋ก ์ด๋
ํ ๊ธ์ ์ผ๊ณ ์ฌ์ฉ ์์
ํ ์คํธํด ๋ณด๊ธฐ
"observability-monitoring-monitor-setup" ์ฌ์ฉ ์ค์ ๋๋ค. Set up Prometheus scraping for a Kubernetes cluster with automatic pod discovery
์์ ๊ฒฐ๊ณผ:
- Prometheus configuration with kubernetes_sd_configs for auto-discovery
- Pod annotations required for scrape targeting
- Relabel rules to filter and tag discovered targets
- Verification steps to confirm scraping is working
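The expected output above can be sketched as a minimal Prometheus scrape job using `kubernetes_sd_configs` with the common `prometheus.io/*` pod-annotation convention. The annotation names and label choices are conventions, not requirements; adjust them to match your cluster.

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Let pods override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the target address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Tag each target with its namespace and pod name for filtering
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

To verify, check Status → Targets in the Prometheus UI: annotated pods should appear under the `kubernetes-pods` job with state `UP`.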
"observability-monitoring-monitor-setup" ์ฌ์ฉ ์ค์ ๋๋ค. Create an alert for memory usage exceeding 90%
์์ ๊ฒฐ๊ณผ:
- PromQL expression using container_memory_working_set_bytes
- Alert rule with appropriate thresholds and duration
- Runbook steps for investigating memory pressure
- Grafana panel query to visualize memory trends
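A minimal sketch of such an alert rule, using cAdvisor's `container_memory_working_set_bytes` against the container's configured limit. Note the caveat in the comment: containers without a memory limit report a limit of 0, which makes the ratio `+Inf`, so in practice you should exclude unlimited containers.

```yaml
groups:
  - name: memory
    rules:
      - alert: ContainerMemoryHigh
        # Working-set memory as a fraction of the container's memory limit.
        # Caveat: limit 0 (no limit set) yields +Inf; exclude such containers.
        expr: |
          (container_memory_working_set_bytes{container!=""}
            / container_spec_memory_limit_bytes{container!=""}) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} memory above 90% of limit"
          description: "Working set has exceeded 90% of the memory limit for 10 minutes."
```

The same ratio expression works directly as a Grafana panel query for visualizing memory trends.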
๋ณด์ ๊ฐ์ฌ
์์ This skill contains documentation and code samples for monitoring setup. All static analysis findings are false positives - backticks are markdown code block delimiters, not shell execution. URLs are internal service endpoints. Environment variable usage follows standard configuration patterns. No malicious patterns detected.
ํ์ง ์ ์
๋ง๋ค ์ ์๋ ๊ฒ
Greenfield Service Monitoring
Set up complete observability stack for a new microservice from day one with metrics, tracing, and logging.
Production Incident Response
Create actionable dashboards and alerts to reduce MTTR and enable proactive issue detection.
SLO Definition and Tracking
Define service level objectives with error budgets and implement burn rate monitoring for reliability engineering.
์ด ํ๋กฌํํธ๋ฅผ ์ฌ์ฉํด ๋ณด์ธ์
Help me add Prometheus metrics to my Node.js API. I need request count, error rate, and latency tracking. Show me the prom-client setup and how to expose a /metrics endpoint.
Create a Grafana dashboard JSON for my payment service showing the four golden signals. Include panels for request rate, error rate, p95/p99 latency, and saturation metrics.
I need alerting rules for high error rate (>5% for 5 minutes) and slow response time (p95 >1s for 10 minutes). Configure Alertmanager to route critical alerts to PagerDuty and warnings to Slack.
Define SLOs for my API with 99.9% availability target over 30 days. Show me how to calculate error budget, set up multi-window burn rate alerts, and create Grafana panels for SLO tracking.
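The alert-routing prompt above can be sketched as an Alertmanager config that sends critical alerts to PagerDuty and everything else to Slack. The `matchers` syntax assumes Alertmanager v0.22 or later; the placeholder keys and channel name are illustrative only.

```yaml
route:
  receiver: slack-warnings          # default route for anything unmatched
  routes:
    - matchers:
        - severity = "critical"
      receiver: pagerduty-critical

receivers:
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: <pagerduty-integration-key>   # placeholder
  - name: slack-warnings
    slack_configs:
      - api_url: <slack-webhook-url>               # placeholder
        channel: "#alerts"
```

Routing is first-match by default, so the critical route must appear before any broader fallback routes.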
๋ชจ๋ฒ ์ฌ๋ก
- Use histogram buckets aligned with your SLO targets for accurate percentile calculation
- Add consistent labels (service, environment, version) to all metrics for effective filtering
- Test alerts against historical data to minimize false positives before enabling notifications
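The first practice above (buckets aligned with SLO targets) can be illustrated with a recording rule. The metric name `http_request_duration_seconds_bucket` is a common instrumentation convention, not a guarantee; `histogram_quantile` interpolates within buckets, so the p95 estimate is only accurate near the SLO threshold if a bucket boundary (e.g. `le="1.0"`) sits at or near it.

```yaml
groups:
  - name: latency
    rules:
      # p95 latency per service; accuracy near the SLO depends on a bucket
      # boundary existing at or near the target (e.g. le="1.0" for a 1s SLO)
      - record: service:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(
            0.95,
            sum by (service, le) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )
```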
ํผํ๊ธฐ
- Monitoring everything without clear ownership leads to alert fatigue and ignored pages
- Using average latency instead of percentiles hides tail latency problems affecting users
- Setting up dashboards before defining what questions they need to answer wastes effort
์์ฃผ ๋ฌป๋ ์ง๋ฌธ
How do I choose the right scrape interval for my metrics?
Should I trace every request or sample?
What is the difference between RED and USE monitoring?
How do I set meaningful SLO targets?
Do I need all three pillars (metrics, logs, traces) from day one?
How long should I retain monitoring data?
๊ฐ๋ฐ์ ์ธ๋ถ ์ ๋ณด
์์ฑ์
sickn33๋ผ์ด์ ์ค
MIT
๋ฆฌํฌ์งํ ๋ฆฌ
https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/observability-monitoring-monitor-setup์ฐธ์กฐ
main
ํ์ผ ๊ตฌ์กฐ