技能 prompt-caching

📦

prompt-caching

Name: prompt-caching
Author: sickn33

安全

通过提示缓存优化LLM成本

通过战略性提示缓存实施（包括Anthropic原生缓存、响应缓存和CAG模式）降低LLM API成本高达90%。

支持: Claude Codex Code(CC)

🥉 72 青铜

下载技能 ZIP

在 Claude 中上传

前往设置 → 功能 → 技能 → 上传技能

开启并开始使用

测试它

正在使用“prompt-caching”。 How do I reduce my LLM API costs?

预期结果:

You can reduce LLM API costs by implementing prompt caching. The main strategies are: 1) Use Anthropic's native prompt caching for repeated system prompts - this caches the prompt prefix so you only pay for new tokens. 2) Implement response caching for identical queries - store hash of prompt and return cached response. 3) Use Cache Augmented Generation (CAG) to pre-cache documents instead of doing RAG retrieval each time.

正在使用“prompt-caching”。 Why is my cache not working?

预期结果:

Common reasons prompt caching fails: 1) Using non-zero temperature - cached responses vary. 2) Changing prompt prefixes frequently - cache depends on exact prefix match. 3) Not using the cache-control header properly. 4) Exceeding the cache TTL (typically 5 minutes for Anthropic).

安全审计

安全

v1 • 2/24/2026

All static findings are false positives. The skill is a markdown documentation file with no executable code. The external_commands detections are markdown backticks used for code formatting. The weak cryptographic algorithm detections are misinterpretations of YAML frontmatter content. No actual security risks present.

已扫描文件

分析行数

发现项

审计总数

未发现安全问题

审计者: claude

质量评分

架构

100

可维护性

内容

社区

100

安全

100

规范符合性

你能构建什么

降低生产应用的API成本

实施提示缓存以显著降低具有重复上下文的LLM API成本

优化长时间运行的对话

使用缓存维护对话上下文，而无需在每条消息上支付完整上下文成本

改善响应延迟

利用缓存响应为重复查询实现更快的响应时间

试试这些提示

基础提示缓存设置

How do I set up prompt caching with Claude API? Show me the basic implementation steps.

响应缓存策略

Design a response caching strategy for a Q&A system that handles similar user queries. Include cache key design and invalidation logic.

CAG实现指南

Explain Cache Augmented Generation (CAG) and provide a Python implementation pattern for pre-caching documents.

缓存失效最佳实践

What are the best practices for cache invalidation in LLM applications? Include time-based and event-based strategies.

最佳实践

构建具有跨请求保持一致的静态前缀的提示
缓存精确匹配的响应时使用零温度
使用基于时间或基于事件的触发器实施适当的缓存失效
监控缓存命中率并相应优化前缀结构

避免

缓存具有高温设置的响应 - 输出会发生变化，缓存数据变得无用
没有任何失效策略的缓存 - 陈旧数据导致错误响应
无差别地缓存所有内容 - 在缓存未命中时增加延迟而没有相应收益

常见问题

什么是提示缓存？

提示缓存是一种存储提示前缀的计算状态的技术，以便在多个请求中重复使用，减少处理的令牌数量并降低成本。

通过提示缓存能节省多少成本？

用户报告的成本降低幅度为50-90%，具体取决于有多少提示可以作为稳定前缀被缓存。

提示缓存是否适用于所有Claude模型？

提示缓存由支持cache_control参数的Claude模型支持。请查看Anthropic API文档了解模型兼容性。

提示缓存和响应缓存有什么区别？

提示缓存使用模型缓存计算前缀的原生能力。响应缓存由您自己实施 - 在您自己的存储中存储相同查询的完整响应。

缓存持续多长时间？

Anthropic的提示缓存通常持续5分钟，但这因API版本而异。响应缓存TTL由您的实现决定。

可以缓存温度大于0的响应吗？

您不应该缓存非零温度的响应，因为输出会发生变化，使缓存数据不可靠。

开发者详情

作者

sickn33

许可证

MIT

仓库

https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/prompt-caching

引用

main

文件结构

📄 SKILL.md