service-mesh-observability
Set up service mesh observability fast
Service mesh telemetry is hard to wire across traces, metrics, and dashboards. This skill provides ready templates and queries for Istio and Linkerd observability.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "service-mesh-observability". Give me a concise checklist to enable Istio observability with Prometheus and Jaeger.
Expected outcome:
- Deploy Prometheus with the Istio scrape config and ServiceMonitor.
- Enable tracing in Istio and point its Zipkin-compatible tracer address at the Jaeger collector.
- Install Jaeger all-in-one and expose the UI port.
- Add PromQL panels for request rate, errors, and P99 latency.
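The tracing step in the checklist above could look roughly like the fragment below. This is a minimal sketch, not the skill's own template: the resource name, the `istio-system` namespace, and the `jaeger-collector` service name are assumptions, and the field layout can differ between Istio versions. It also assumes the Jaeger collector has its Zipkin-compatible endpoint enabled on port 9411.

```yaml
# Sketch only: enables tracing mesh-wide and sends spans to Jaeger
# through its Zipkin-compatible endpoint.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: tracing-sketch          # hypothetical name
  namespace: istio-system       # assumed install namespace
spec:
  meshConfig:
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 100.0         # percent of requests traced; suitable for dev only
        zipkin:
          # Assumed service name and namespace for the Jaeger collector
          address: jaeger-collector.istio-system:9411
```

Sampling at 100 percent is convenient for a first smoke test; lower it before production traffic arrives.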
Using "service-mesh-observability". How do I check which services are making the most requests?
Expected outcome:
- Use the Linkerd viz top command: linkerd viz top deploy/my-app
- Or query Prometheus: sum(rate(istio_requests_total[5m])) by (destination_service_name). This ranks services by requests received; group by source_workload instead to rank callers.
- Check the Grafana Istio dashboards for visualized request rates by service.
Using "service-mesh-observability". Set up alerting for high error rates.
Expected outcome:
- Create a PrometheusRule with expression: sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service_name) / sum(rate(istio_requests_total[5m])) by (destination_service_name) > 0.05
- Set for: 5m so the alert fires only after the condition has held for five minutes, which avoids flapping.
- Label the alert with severity: critical and include the service name in the summary annotation.
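Put together, the three points above might be expressed as a single PrometheusRule like the one below. Treat it as a sketch under assumptions: the resource name, the `monitoring` namespace, the `release: prometheus` selector label, and the alert name are all hypothetical and must match how your Prometheus Operator discovers rules.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mesh-error-rate            # hypothetical name
  namespace: monitoring            # assumed namespace
  labels:
    release: prometheus            # assumed rule selector label
spec:
  groups:
    - name: mesh.errors
      rules:
        - alert: MeshHighErrorRate # hypothetical alert name
          # Ratio of 5xx responses to all responses, per destination service
          expr: |
            sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service_name)
              /
            sum(rate(istio_requests_total[5m])) by (destination_service_name)
              > 0.05
          for: 5m                  # condition must hold for 5 minutes before firing
          labels:
            severity: critical
          annotations:
            summary: "High 5xx rate on {{ $labels.destination_service_name }}"
```

Both sides of the division are grouped by the same label, so each service's error count is divided by its own total.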
Security Audit
Safe
Pure documentation skill containing YAML templates, PromQL queries, and CLI examples for service mesh observability. All static findings are false positives: the scanner misinterpreted PromQL metric names (containing 'md5', 'sha' substrings) as weak crypto, flagged documentation links as network IOCs, and misidentified YAML field names as path traversal. The content is static documentation that matches its stated purpose exactly.
Risk Factors
🌐 Network access (12)
⚙️ External commands (17)
📁 Filesystem access (1)
What You Can Build
Stand up mesh monitoring
Use templates to wire Prometheus, Grafana, and tracing for a new service mesh.
Investigate latency spikes
Apply PromQL queries and tracing setup to locate high-latency services.
Define mesh SLOs
Use golden signal guidance to frame SLOs and alert rules for services.
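As one possible starting point for the SLO and latency use cases above, the golden signals can be precomputed as Prometheus recording rules. The sketch below uses the standard Istio metrics `istio_requests_total` and `istio_request_duration_milliseconds_bucket`; the group name and the recorded series names are hypothetical, and the 5m window is an assumption to tune.

```yaml
# Plain Prometheus rule file (not a PrometheusRule resource)
groups:
  - name: mesh.golden-signals      # hypothetical group name
    rules:
      # Traffic: requests per second received by each service
      - record: mesh:request_rate:5m
        expr: sum(rate(istio_requests_total[5m])) by (destination_service_name)
      # Errors: fraction of responses with a 5xx code
      - record: mesh:error_ratio:5m
        expr: |
          sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service_name)
            /
          sum(rate(istio_requests_total[5m])) by (destination_service_name)
      # Latency: 99th percentile request duration in milliseconds
      - record: mesh:latency_p99_ms:5m
        expr: |
          histogram_quantile(
            0.99,
            sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service_name)
          )
```

Dashboards and alert rules can then reference the recorded series instead of re-evaluating the raw expressions.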
Try These Prompts
List the minimal steps and templates to enable Istio metrics and tracing in a new cluster.
Provide the key PromQL queries for request rate, error rate, and P99 latency by service.
Give an IstioOperator and Jaeger deployment example for distributed tracing.
Combine Prometheus, Grafana, Jaeger, Kiali, and OTel templates into a staged rollout plan.
Best Practices
- Sample traces at a high rate in dev and a lower rate in production to control cost.
- Use consistent trace context propagation across all services.
- Alert on golden signals with clear thresholds defined in PrometheusRule.
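For the sampling practice above, one way to set a mesh-wide rate on Istio versions that support the Telemetry API is sketched below. It assumes `istio-system` is the mesh root namespace and that a tracing provider is already configured; the 1 percent value is only an illustrative production setting.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system            # assumed root namespace, so this applies mesh-wide
spec:
  tracing:
    - randomSamplingPercentage: 1.0  # illustrative production rate; raise it in dev
```

A dev cluster could apply the same resource with a much higher percentage.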
Avoid
- Collecting high-cardinality labels in Prometheus without limits.
- Tracing 100 percent of requests in production by default.
- Operating without dashboards for service dependencies and topology.
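To guard against the cardinality problem in the first item above, Prometheus scrape jobs accept per-scrape limits. The fragment below is a sketch with a hypothetical job name and illustrative limit values; a real job for sidecar metrics would also need relabeling rules to select the right pods and ports.

```yaml
scrape_configs:
  - job_name: istio-proxies      # hypothetical job name
    sample_limit: 50000          # illustrative: scrape fails if a target exposes more samples
    label_limit: 40              # illustrative: scrape fails if a series carries more labels
    kubernetes_sd_configs:
      - role: pod
```

A scrape that exceeds a limit is treated as failed, so high-cardinality regressions show up as missing data instead of silent memory growth.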