error-diagnostics-error-analysis

Safe

Diagnose Production Errors

This skill helps developers quickly diagnose and resolve production errors using systematic debugging techniques, distributed tracing analysis, and comprehensive observability patterns.

Supports: Claude Codex Code(CC)
🥉 74 Bronze
1. Download the skill ZIP.
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill.
3. Open it and start using it.

Test it

Using "error-diagnostics-error-analysis". Error: Database connection timeout after 30s in order-service

Expected result:

  • Analysis: Connection pool exhaustion detected
  • Root cause: Long-running queries holding all connections
  • Recommended fix: Implement query timeout and optimize N+1 patterns
  • Prevention: Add circuit breaker and connection pool monitoring

Using "error-diagnostics-error-analysis". Intermittent 502 errors in API gateway

Expected result:

  • Pattern: Errors occur during peak traffic windows
  • Correlation: New autoscaling policy deployed yesterday
  • Root cause: Backend services scaling slower than load balancer
  • Recommendation: Adjust scaling parameters and add health check validation

Security audit

Safe
v1 • 2/24/2026

All 108 static findings were evaluated and determined to be false positives. The scanner misinterpreted markdown documentation patterns as security issues: backticks in code blocks were flagged as shell execution, example URLs were flagged as network exfiltration, and environment variable reads in example error-tracking code were flagged as credential access. The sensitive data deletion code (lines 751-752) follows a security best practice: it removes cookies and authorization headers before error reports are sent. This is a legitimate error diagnostics skill providing observability documentation.

Files scanned: 2
Lines analyzed: 1,194
Findings: 0
Total audits: 1
No security issues found
Audited by: claude

Quality score

Architecture: 38
Maintainability: 100
Content: 87
Community: 50
Security: 100
Spec compliance: 91

What you can build

Investigate Production Incidents

Analyze production errors, correlate with deployments, and identify root cause using distributed tracing and log analysis.

Debug Application Errors

Examine stack traces, identify error patterns, and implement fixes for application-level errors.

Improve System Observability

Design and implement comprehensive error tracking, monitoring, and alerting solutions for better incident detection.

Try these prompts

Basic Error Analysis
Analyze this error message and stack trace. Identify the likely cause and suggest a fix: $ERROR_MESSAGE
Distributed System Debugging
Debug this distributed system error. The error occurred in service $SERVICE_NAME with trace ID $TRACE_ID. Examine the distributed trace and identify which upstream service caused the failure.
Post-Incident Review
Conduct a post-incident review for this outage. Error pattern: $ERROR_PATTERN. Timeline: $TIMELINE. What were the contributing factors and what preventive measures would you recommend?
Observability Implementation
Design an observability implementation for a Node.js/Express application. Include error tracking setup with Sentry, distributed tracing with OpenTelemetry, and alerting rules for critical errors.

Best practices

  • Always correlate errors with deployments, configuration changes, and external events
  • Implement structured logging with correlation IDs for distributed tracing
  • Create retry logic with exponential backoff for transient failures
  • Establish error budgets and alerting thresholds based on user impact
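
The retry practice above can be sketched as follows. This is a minimal illustration, not the skill's own playbook code: `TransientError`, `retry_with_backoff`, and all parameter defaults are hypothetical names and values chosen for the example. It assumes the caller can tell transient failures (timeouts, dropped connections) apart from permanent ones.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles per attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random delay below the ceiling keeps many clients
            # from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

Only errors marked as transient are retried; anything else propagates immediately, so permanent failures are not hidden behind repeated attempts. The `sleep` parameter exists so the delay can be replaced in tests.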

Avoid

  • Ignoring intermittent errors - they often indicate systemic issues
  • Implementing generic error handling without context-specific recovery
  • Sending raw error data to external systems without scrubbing sensitive information
  • Setting alerting thresholds so low that routine noise triggers alerts, causing alert fatigue
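
To illustrate the point about scrubbing, here is a minimal sketch of redacting sensitive fields from an error event before it leaves the process. The event shape, the `scrub_event` name, and the `SENSITIVE_KEYS` deny-list are assumptions made for this example; they are not taken from the skill or from any particular SDK.

```python
# Assumed deny-list; extend it to match the headers and fields your services use.
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "password", "token", "api_key"}


def scrub_event(event):
    """Return a new event with the values of sensitive keys replaced by a marker."""
    if isinstance(event, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else scrub_event(value)
            for key, value in event.items()
        }
    if isinstance(event, list):
        return [scrub_event(item) for item in event]
    return event  # scalars pass through unchanged


report = scrub_event({
    "message": "Database connection timeout after 30s",
    "request": {"headers": {"Authorization": "Bearer abc", "Accept": "application/json"}},
})
```

Matching is done on key names only, so a secret embedded in a free-text message would still get through; a real setup would likely combine this with pattern-based filtering.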

Frequently asked questions

What information should I provide for effective error analysis?
Provide the complete error message, full stack trace, timestamps, affected service names, recent deployment history, and any relevant log excerpts.
How do I debug errors in a distributed system?
Use correlation IDs to trace requests across services, examine distributed tracing data, identify the failure point, and trace backward to find the root cause.
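
As a rough sketch of the correlation-ID approach, the following uses only the Python standard library to stamp every log line with the ID of the request being handled. The names `correlation_id`, `JsonFormatter`, and `handle_request` are hypothetical, and how the incoming ID is read (for example, from a request header) is left as an assumption.

```python
import contextvars
import json
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that carries the correlation ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(logger, incoming_id=None):
    """Reuse the caller's ID when one was passed along, otherwise start a new one."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.error("database connection timeout")
        return correlation_id.get()  # forward this ID on outgoing calls
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

If every service logs the same ID and forwards it downstream, searching the logs for that one value reconstructs the request's path across services.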
What observability tools does this skill support?
The skill covers Sentry, DataDog, OpenTelemetry, Prometheus, Grafana, Jaeger, and other major observability platforms.
Can this skill help implement error tracking?
Yes, the implementation playbook includes code samples for Sentry, DataDog, and other error tracking SDK integration.
How do I prioritize errors for fixing?
Prioritize by user impact, error frequency, severity, and whether the error indicates a systemic issue or single-point failure.
What is the recommended error handling strategy?
Implement defense in depth: retry logic with backoff, circuit breakers, proper error propagation, comprehensive logging, and automated alerting.
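
One of the layers named above, the circuit breaker, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production implementation: `CircuitBreaker`, `CircuitOpenError`, and the threshold and timeout defaults are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already struggling, which gives that dependency time to recover.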