prompt-caching

Safe

Optimize LLM Costs with Prompt Caching

Reduce LLM API costs by up to 90% through strategic prompt caching implementation including Anthropic native caching, response caching, and CAG patterns.

Supports: Claude Code (CC), Codex
🥉 75 Bronze
1. Download the skill ZIP

2. Upload to Claude: go to Settings → Capabilities → Skills → Upload skill

3. Enable it and start using it

Test

A utilizar "prompt-caching". How do I reduce my LLM API costs?

Resultado esperado:

You can reduce LLM API costs by implementing prompt caching. The main strategies are: 1) Use Anthropic's native prompt caching for repeated system prompts - this caches the prompt prefix so you only pay for new tokens. 2) Implement response caching for identical queries - store hash of prompt and return cached response. 3) Use Cache Augmented Generation (CAG) to pre-cache documents instead of doing RAG retrieval each time.
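A minimal sketch of strategy 1, assuming the official anthropic Python SDK and a model that supports prompt caching (the model name below is illustrative):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Large, stable context: instructions, or pre-loaded documents for a CAG-style setup
    STATIC_SYSTEM_PROMPT = "You are a support assistant. <several thousand tokens of stable context>"

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Marks the end of the cacheable prefix; later requests that reuse
                # the exact same prefix read it from cache at a reduced token rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "How do I reduce my LLM API costs?"}],
    )
    print(response.content[0].text)

The same cached system block can hold pre-loaded reference documents, which is the CAG pattern mentioned above: pay the full input price once to build the cache, then read it cheaply on subsequent requests within the cache TTL.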

A utilizar "prompt-caching". Why is my cache not working?

Resultado esperado:

Common reasons prompt caching fails: 1) Using non-zero temperature - cached responses vary. 2) Changing prompt prefixes frequently - cache depends on exact prefix match. 3) Not using the cache-control header properly. 4) Exceeding the cache TTL (typically 5 minutes for Anthropic).
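To see which of these is happening, the Messages API response reports cache activity in its usage block; a small helper, assuming the anthropic Python SDK (getattr guards against SDK versions that omit the fields):

    def report_cache_usage(response) -> None:
        """Print whether a Messages API response hit, created, or missed the prompt cache."""
        usage = response.usage
        # cache_creation_input_tokens > 0 -> the prefix was written to the cache (first call)
        # cache_read_input_tokens > 0     -> the prefix was served from the cache (a hit)
        created = getattr(usage, "cache_creation_input_tokens", 0) or 0
        read = getattr(usage, "cache_read_input_tokens", 0) or 0
        if read:
            print(f"Cache hit: {read} prefix tokens were read from the cache")
        elif created:
            print(f"Cache created: {created} prefix tokens were written to the cache")
        else:
            print("No caching occurred: check that the prefix is unchanged, marked with "
                  "cache_control, and long enough to qualify for caching")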

Security Audit

Safe
v1 • 2/24/2026

All static findings are false positives. The skill is a markdown documentation file with no executable code. The external_commands detections are markdown backticks used for code formatting. The weak cryptographic algorithm detections are misinterpretations of YAML frontmatter content. No actual security risks present.

Files analyzed: 1
Lines analyzed: 66
Findings: 0
Total audits: 1

No security issues found
Audited by: claude

Quality Score

Architecture: 38
Maintainability: 100
Content: 87
Community: 50
Security: 100
Spec compliance: 100

What You Can Build

Reduce API Costs for Production Applications

Implement prompt caching to dramatically reduce LLM API costs in production systems with repeated context

Optimize Long-Running Conversations

Use caching to maintain conversation context without incurring full context costs on each message (see the sketch after this section)

Improve Response Latency

Leverage cached responses to achieve faster response times for repeated queries
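One way to realize the long-running-conversation case above, sketched with the anthropic Python SDK: place the cache_control breakpoint on the latest user turn so the accumulated conversation becomes the cached prefix (function and variable names are illustrative):

    def send_turn(client, history, user_input, model="claude-3-5-sonnet-20241022"):
        """Append a user turn, marking it as the cache breakpoint.

        Everything up to the breakpoint is cached, so each new turn only pays
        the full input rate for tokens added since the previous request.
        """
        messages = history + [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": user_input,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            }
        ]
        response = client.messages.create(model=model, max_tokens=1024, messages=messages)
        # Persist both sides of the turn so the next call reuses the same prefix
        history.append({"role": "user", "content": [{"type": "text", "text": user_input}]})
        history.append({"role": "assistant", "content": response.content[0].text})
        return response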

Try These Prompts

Basic Prompt Caching Setup
How do I set up prompt caching with Claude API? Show me the basic implementation steps.
Response Caching Strategy
Design a response caching strategy for a Q&A system that handles similar user queries. Include cache key design and invalidation logic.
CAG Implementation Guide
Explain Cache Augmented Generation (CAG) and provide a Python implementation pattern for pre-caching documents.
Cache Invalidation Best Practices
What are the best practices for cache invalidation in LLM applications? Include time-based and event-based strategies.

Best Practices

  • Structure prompts with static prefixes that remain consistent across requests
  • Use zero temperature when caching responses for exact matches
  • Implement proper cache invalidation with time-based or event-based triggers (see the sketch after this list)
  • Monitor cache hit rates and optimize prefix structure accordingly
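The response-caching practices above (zero temperature, exact-match keys, explicit invalidation) fit in a small cache layer; a minimal in-memory sketch with illustrative names, assuming deterministic generation:

    import hashlib
    import json
    import time

    class ResponseCache:
        """Hash-keyed response cache with time-based (TTL) and event-based invalidation."""

        def __init__(self, ttl_seconds=3600):
            self.ttl = ttl_seconds
            self._store = {}  # key -> (stored_at, response_text)

        def _key(self, model, system, prompt):
            # Exact-match key: any change to model, system prompt, or user prompt misses
            raw = json.dumps({"model": model, "system": system, "prompt": prompt}, sort_keys=True)
            return hashlib.sha256(raw.encode("utf-8")).hexdigest()

        def get(self, model, system, prompt):
            entry = self._store.get(self._key(model, system, prompt))
            if entry is None:
                return None
            stored_at, text = entry
            if time.time() - stored_at > self.ttl:  # time-based invalidation
                return None
            return text

        def put(self, model, system, prompt, text):
            self._store[self._key(model, system, prompt)] = (time.time(), text)

        def invalidate_all(self):
            # Event-based invalidation: call when the underlying knowledge changes
            self._store.clear()

A caller checks get() before issuing the API request and calls put() with the result afterwards; because the key is an exact hash, this only pays off when identical queries recur.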

Avoid

  • Caching responses with high temperature settings - outputs will vary and cached data becomes useless
  • Caching without any invalidation strategy - stale data leads to incorrect responses
  • Caching everything indiscriminately - increases latency on cache misses without proportional benefit

Frequently Asked Questions

What is prompt caching?
Prompt caching is a technique that stores the computed state of a prompt prefix so it can be reused across multiple requests, reducing the number of tokens processed and lowering costs.
How much can I save with prompt caching?
Users report cost reductions of 50-90% depending on how much of your prompts can be cached as stable prefixes.
Does prompt caching work with all Claude models?
Prompt caching is supported by Claude models that support the cache_control parameter. Check the Anthropic API documentation for model compatibility.
What is the difference between prompt caching and response caching?
Prompt caching uses the model's native ability to cache computed prefixes. Response caching is implemented by you - storing full responses for identical queries in your own storage.
How long does the cache last?
Anthropic's prompt cache typically lasts 5 minutes, but this varies by API version. Response caching TTL is determined by your implementation.
Can I cache responses with temperature greater than 0?
You should not cache responses with non-zero temperature because the outputs will vary, making cached data unreliable.

Developer Details

File structure

📄 SKILL.md