
prompt-caching

Secure

Optimize LLM Costs with Prompt Caching

Reduce LLM API costs by up to 90% through strategic prompt caching implementation including Anthropic native caching, response caching, and CAG patterns.

Supports: Claude Code (CC)
🥉 75 Bronze
1. Download the skill ZIP
2. Upload it in Claude: go to Settings → Features → Skills → Upload skill
3. Open it and start using it

Test it

Using "prompt-caching". How do I reduce my LLM API costs?

Expected result:

You can reduce LLM API costs by implementing prompt caching. The main strategies are: 1) Use Anthropic's native prompt caching for repeated system prompts - this caches the prompt prefix so cached tokens are billed at a discount. 2) Implement response caching for identical queries - store a hash of the prompt and return the cached response. 3) Use Cache Augmented Generation (CAG) to pre-cache documents instead of doing RAG retrieval each time.
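Strategy 1 can be sketched by building a Messages API payload whose stable system prefix carries a `cache_control` breakpoint. This is a minimal sketch: the model id and prompt text are placeholders, and the payload is constructed without calling the API.

```python
# Sketch of Anthropic native prompt caching: mark a stable system-prompt
# prefix as cacheable. Model id and prompt text below are placeholders.
STATIC_SYSTEM_PROMPT = "You are a support assistant. " + "Reference material... " * 100

def build_cached_request(user_message: str) -> dict:
    """Build a Messages API payload whose system prefix is marked cacheable."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,  # identical prefix across requests
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_cached_request("How do I reduce my LLM API costs?")
```

With the `anthropic` SDK you would then pass this as `client.messages.create(**payload)`; only the varying user message is billed at the full rate once the prefix is cached.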

Using "prompt-caching". Why is my cache not working?

Expected result:

Common reasons caching fails: 1) Caching responses generated with non-zero temperature - outputs vary between calls. 2) Changing prompt prefixes frequently - prompt caching depends on an exact prefix match. 3) Not setting the cache_control parameter properly. 4) Exceeding the cache TTL (typically 5 minutes for Anthropic).
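Two of these failure modes, exact-prefix dependence and TTL expiry, show up directly in a minimal response cache (a sketch; class and method names are illustrative): the key is a hash of the exact prompt, so any change is a miss, and entries older than the TTL are treated as misses.

```python
import hashlib
import time

class ResponseCache:
    """Exact-match response cache with a time-to-live (minimal sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, response)

    def _key(self, prompt: str) -> str:
        # Any change to the prompt, even whitespace, yields a different key.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        entry = self._store.get(self._key(prompt))
        if entry is None:
            return None  # miss: never cached
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl:
            return None  # miss: entry exceeded the TTL
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[self._key(prompt)] = (time.monotonic(), response)
```

A lookup for `"question "` misses even when `"question"` is cached, which is the same exact-match behavior that makes frequently changing prefixes defeat prompt caching.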

Security Audit

Secure
v1 • 2/24/2026

All static findings are false positives. The skill is a markdown documentation file with no executable code. The external_commands detections are markdown backticks used for code formatting. The weak cryptographic algorithm detections are misinterpretations of YAML frontmatter content. No actual security risks present.

Files scanned: 1
Lines analyzed: 66
Findings: 0
Total audits: 1
No security issues found
Auditor: claude

Quality score

Architecture: 38
Maintainability: 100
Content: 87
Community: 50
Security: 100
Spec compliance: 100

What you can build

Reduce API Costs for Production Applications

Implement prompt caching to dramatically reduce LLM API costs in production systems with repeated context

Optimize Long-Running Conversations

Use caching to maintain conversation context without incurring full context costs on each message

Improve Response Latency

Leverage cached responses to achieve faster response times for repeated queries

Try these prompts

Basic Prompt Caching Setup
How do I set up prompt caching with Claude API? Show me the basic implementation steps.
Response Caching Strategy
Design a response caching strategy for a Q&A system that handles similar user queries. Include cache key design and invalidation logic.
CAG Implementation Guide
Explain Cache Augmented Generation (CAG) and provide a Python implementation pattern for pre-caching documents.
Cache Invalidation Best Practices
What are the best practices for cache invalidation in LLM applications? Include time-based and event-based strategies.
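The CAG prompt above asks for an implementation pattern; the core idea can be sketched as follows (names are illustrative): build the document context once, keep it as a stable prefix, and reuse it for every question instead of retrieving per query.

```python
def build_cag_context(documents: list[str]) -> str:
    """Join all documents into one stable context block, built once."""
    return "\n\n".join(f"[Doc {i}] {doc}" for i, doc in enumerate(documents, 1))

class CAGSession:
    """Pre-caches document context instead of retrieving per query (RAG)."""

    def __init__(self, documents: list[str]):
        self.context = build_cag_context(documents)  # identical for every request

    def make_messages(self, question: str) -> list[dict]:
        # The stable context comes first so the provider can cache the prefix.
        return [
            {"role": "system", "content": self.context},
            {"role": "user", "content": question},
        ]
```

Because the context string is identical across requests, it pairs naturally with native prompt caching: only the question tokens are new on each call.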

Best practices

  • Structure prompts with static prefixes that remain consistent across requests
  • Use zero temperature when caching responses for exact matches
  • Implement proper cache invalidation with time-based or event-based triggers
  • Monitor cache hit rates and optimize prefix structure accordingly
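The last practice, monitoring hit rates, only needs a small counter around cache lookups. A sketch under assumed names:

```python
class CacheStats:
    """Track cache hits and misses to guide prefix-structure tuning."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for was_hit in (True, True, False, True):  # example lookup outcomes
    stats.record(was_hit)
```

A persistently low hit rate usually means the prompt prefix varies between requests and should be restructured so the static part comes first.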

Avoid

  • Caching responses with high temperature settings - outputs will vary and cached data becomes useless
  • Caching without any invalidation strategy - stale data leads to incorrect responses
  • Caching everything indiscriminately - increases latency on cache misses without proportional benefit

Frequently asked questions

What is prompt caching?
Prompt caching is a technique that stores the computed state of a prompt prefix so it can be reused across multiple requests, reducing the number of tokens processed and lowering costs.
How much can I save with prompt caching?
Users report cost reductions of 50-90% depending on how much of your prompts can be cached as stable prefixes.
Does prompt caching work with all Claude models?
Prompt caching is supported by Claude models that support the cache_control parameter. Check the Anthropic API documentation for model compatibility.
What is the difference between prompt caching and response caching?
Prompt caching uses the model's native ability to cache computed prefixes. Response caching is implemented by you - storing full responses for identical queries in your own storage.
How long does the cache last?
Anthropic's prompt cache typically lasts 5 minutes, but this varies by API version. Response caching TTL is determined by your implementation.
Can I cache responses with temperature greater than 0?
You should not cache responses with non-zero temperature because the outputs will vary, making cached data unreliable.

Developer details

File structure

📄 SKILL.md