observability-monitoring-monitor-setup
Set up comprehensive monitoring and observability
Implementing monitoring from scratch is complex and error-prone. This skill provides proven patterns for metrics, tracing, and logging that reduce MTTR and give full system visibility.
Télécharger le ZIP du skill
Importer dans Claude
Allez dans ParamĂštres â CapacitĂ©s â Skills â Importer un skill
Activez et commencez Ă utiliser
Tester
Utilisation de "observability-monitoring-monitor-setup". Set up Prometheus scraping for a Kubernetes cluster with automatic pod discovery
Résultat attendu:
- Prometheus configuration with kubernetes_sd_configs for auto-discovery
- Pod annotations required for scrape targeting
- Relabel rules to filter and tag discovered targets
- Verification steps to confirm scraping is working
Utilisation de "observability-monitoring-monitor-setup". Create an alert for memory usage exceeding 90%
Résultat attendu:
- PromQL expression using container_memory_working_set_bytes
- Alert rule with appropriate thresholds and duration
- Runbook steps for investigating memory pressure
- Grafana panel query to visualize memory trends
Audit de sécurité
SûrThis skill contains documentation and code samples for monitoring setup. All static analysis findings are false positives - backticks are markdown code block delimiters, not shell execution. URLs are internal service endpoints. Environment variable usage follows standard configuration patterns. No malicious patterns detected.
Score de qualité
Ce que vous pouvez construire
Greenfield Service Monitoring
Set up complete observability stack for a new microservice from day one with metrics, tracing, and logging.
Production Incident Response
Create actionable dashboards and alerts to reduce MTTR and enable proactive issue detection.
SLO Definition and Tracking
Define service level objectives with error budgets and implement burn rate monitoring for reliability engineering.
Essayez ces prompts
Help me add Prometheus metrics to my Node.js API. I need request count, error rate, and latency tracking. Show me the prom-client setup and how to expose a /metrics endpoint.
Create a Grafana dashboard JSON for my payment service showing the four golden signals. Include panels for request rate, error rate, p95/p99 latency, and saturation metrics.
I need alerting rules for high error rate (>5% for 5 minutes) and slow response time (p95 >1s for 10 minutes). Configure Alertmanager to route critical alerts to PagerDuty and warnings to Slack.
Define SLOs for my API with 99.9% availability target over 30 days. Show me how to calculate error budget, set up multi-window burn rate alerts, and create Grafana panels for SLO tracking.
Bonnes pratiques
- Use histogram buckets aligned with your SLO targets for accurate percentile calculation
- Add consistent labels (service, environment, version) to all metrics for effective filtering
- Test alerts against historical data to minimize false positives before enabling notifications
Ăviter
- Monitoring everything without clear ownership leads to alert fatigue and ignored pages
- Using average latency instead of percentiles hides tail latency problems affecting users
- Setting up dashboards before defining what questions they need to answer wastes effort