prompt-caching
通过提示缓存优化LLM成本
通过战略性提示缓存实施(包括Anthropic原生缓存、响应缓存和CAG模式)降低LLM API成本高达90%。
下载技能 ZIP
在 Claude 中上传
前往 设置 → 功能 → 技能 → 上传技能
开启并开始使用
测试它
正在使用“prompt-caching”。 How do I reduce my LLM API costs?
预期结果:
You can reduce LLM API costs by implementing prompt caching. The main strategies are: 1) Use Anthropic's native prompt caching for repeated system prompts - this caches the prompt prefix so you only pay for new tokens. 2) Implement response caching for identical queries - store hash of prompt and return cached response. 3) Use Cache Augmented Generation (CAG) to pre-cache documents instead of doing RAG retrieval each time.
正在使用“prompt-caching”。 Why is my cache not working?
预期结果:
Common reasons prompt caching fails: 1) Using non-zero temperature - cached responses vary. 2) Changing prompt prefixes frequently - cache depends on exact prefix match. 3) Not using the cache-control header properly. 4) Exceeding the cache TTL (typically 5 minutes for Anthropic).
安全审计
安全All static findings are false positives. The skill is a markdown documentation file with no executable code. The external_commands detections are markdown backticks used for code formatting. The weak cryptographic algorithm detections are misinterpretations of YAML frontmatter content. No actual security risks present.
质量评分
你能构建什么
降低生产应用的API成本
实施提示缓存以显著降低具有重复上下文的LLM API成本
优化长时间运行的对话
使用缓存维护对话上下文,而无需在每条消息上支付完整上下文成本
改善响应延迟
利用缓存响应为重复查询实现更快的响应时间
试试这些提示
How do I set up prompt caching with Claude API? Show me the basic implementation steps.
Design a response caching strategy for a Q&A system that handles similar user queries. Include cache key design and invalidation logic.
Explain Cache Augmented Generation (CAG) and provide a Python implementation pattern for pre-caching documents.
What are the best practices for cache invalidation in LLM applications? Include time-based and event-based strategies.
最佳实践
- 构建具有跨请求保持一致的静态前缀的提示
- 缓存精确匹配的响应时使用零温度
- 使用基于时间或基于事件的触发器实施适当的缓存失效
- 监控缓存命中率并相应优化前缀结构
避免
- 缓存具有高温设置的响应 - 输出会发生变化,缓存数据变得无用
- 没有任何失效策略的缓存 - 陈旧数据导致错误响应
- 无差别地缓存所有内容 - 在缓存未命中时增加延迟而没有相应收益