
context-degradation

Safe

Detect Context Degradation in LLMs

Also available from: muratcankoylan, ChakshuGautam, Asmayaseen

Language models exhibit predictable performance degradation as context length increases. This skill helps diagnose lost-in-middle, poisoning, distraction, and clash patterns to build more reliable AI systems.

Supports: Claude Code (CC), Codex
⚠️ Score: 65 (Poor)
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "context-degradation". Conversation has 60000 tokens. Agent started producing incorrect summaries after turn 20.

Expected outcome:

Analysis: Context degradation detected. The lost-in-middle phenomenon is likely causing the agent to miss key information from the middle of context. Recommendation: Apply compaction to summarize earlier context, or restructure to place critical info at edges.
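The edge-placement recommendation above can be sketched as a small reordering routine. All function and variable names here are illustrative assumptions, not part of the skill:

```python
def restructure_context(messages, is_critical):
    """Reorder messages so critical ones sit at the edges of the context,
    where the lost-in-middle effect predicts attention is highest.

    `messages` is a list of strings; `is_critical` is a predicate.
    Illustrative only: real systems must also preserve any ordering
    constraints that matter to the task.
    """
    critical = [m for m in messages if is_critical(m)]
    filler = [m for m in messages if not is_critical(m)]
    # Split critical items between the head and tail of the context.
    half = len(critical) // 2 + len(critical) % 2
    return critical[:half] + filler + critical[half:]

msgs = ["SPEC: must use UTF-8", "chat 1", "chat 2", "SPEC: output JSON"]
reordered = restructure_context(msgs, lambda m: m.startswith("SPEC:"))
```

Here the two `SPEC:` lines end up first and last, with ordinary chat turns in the middle.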

Using "context-degradation". User asks about code from turn 1, but agent refers to wrong implementation from turn 15.

Expected outcome:

Analysis: Context clash detected. Multiple implementations exist in context with conflicting details. Recommendation: Use explicit versioning and mark conflicts for clarification before proceeding.
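A minimal sketch of the explicit-versioning recommendation, assuming hypothetical helper names (this is not an API defined by the skill):

```python
def tag_versions(snippets):
    """Attach [impl v1], [impl v2], ... markers to code snippets in the
    order they appeared in the conversation, so a prompt can reference
    one implementation unambiguously instead of clashing versions."""
    return [f"[impl v{i}] {s}" for i, s in enumerate(snippets, start=1)]

tagged = tag_versions(["def f(): return 1", "def f(): return 2"])
```

A follow-up prompt can then say "use [impl v1]" rather than leaving the model to pick between conflicting definitions.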

Security Audit

Safe
v1 • 2/24/2026

Static analysis flagged 20 potential issues including external_commands, network, and weak cryptographic algorithms. All findings are FALSE POSITIVES: the 'external_commands' detections are YAML token count examples with backtick formatting; 'network' is a legitimate GitHub URL in metadata; 'weak cryptographic algorithm' is a pattern matching error triggered by the word 'degradation'; 'system reconnaissance' falsely triggers on 'multi-source retrieval'. This skill is pure educational documentation about LLM context degradation with no executable code.

  • Files scanned: 1
  • Lines analyzed: 239
  • Findings: 4
  • Total audits: 1

High Risk Issues (4)

False Positive: External Commands Detection
Static scanner detected 'Ruby/shell backtick execution' at lines 169, 176, 179. These are YAML token count examples (turn_20: 60000 tokens) used as documentation, not actual shell commands.
False Positive: Network Security Detection
Static scanner detected 'Hardcoded URL' at line 4. This is a legitimate GitHub source URL in the skill metadata, not a security vulnerability.
False Positive: Weak Cryptographic Algorithm
Static scanner incorrectly flagged 'weak cryptographic algorithm' at 16 locations. Pattern matcher triggers on the word 'degradation' (appears as 'deg' in scanning patterns). No cryptographic code exists in this skill.
False Positive: System Reconnaissance
Static scanner flagged 'System reconnaissance' at line 92. Content discusses 'multi-source retrieval' in the context of information retrieval research, not system reconnaissance.
Audited by: claude

Quality Score

  • Architecture: 38
  • Maintainability: 100
  • Content: 87
  • Community: 31
  • Security: 65
  • Spec Compliance: 91

What You Can Build

Debug Agent Failures

When an AI agent produces incorrect or irrelevant outputs during long conversations, use this skill to identify whether context degradation is the root cause.

Design Resilient Systems

Architect systems that handle large contexts reliably by applying the Four-Bucket Approach and architectural patterns described in this skill.

Evaluate Context Choices

Make informed decisions about context engineering for production systems by understanding degradation thresholds and mitigation strategies.

Try These Prompts

Basic Degradation Check
Analyze this conversation for context degradation patterns. The conversation has grown to over 50000 tokens. Look for signs of lost-in-middle, poisoning, distraction, or clash.
Lost-in-Middle Diagnosis
Review the attached context and identify if critical information is buried in the middle. The task requires information from the middle section but outputs are incorrect.
Context Poisoning Recovery
Analyze this context for signs of poisoning. Symptoms include degraded output quality, tool misalignment, and persistent hallucinations despite corrections. What recovery steps can be taken?
Architectural Pattern Selection
Given a system that processes 200K+ token contexts with multiple independent tasks, recommend which Four-Bucket strategies (Write, Select, Compress, Isolate) to apply and why.

Best Practices

  • Place critical information at the beginning or end of context where attention is highest
  • Monitor context length and performance correlation during development
  • Implement compaction triggers before degradation becomes severe
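The compaction-trigger practice above can be sketched as follows; the 50K-token threshold and all helper names are assumptions for illustration, not values defined by this skill:

```python
def should_compact(token_count, threshold=50_000):
    """Trigger compaction before degradation becomes severe.
    The 50K-token threshold is an assumed placeholder; tune per model."""
    return token_count >= threshold

def compact(messages, summarize, keep_recent=5):
    """Replace older messages with a summary, keeping the most recent
    turns verbatim. `summarize` is any callable mapping a list of
    messages to a single summary string (e.g. a separate LLM call)."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

Checking `should_compact` on every turn keeps the trigger cheap, while the expensive `summarize` call runs only when the threshold is crossed.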

Avoid

  • Assuming longer context always improves performance
  • Loading all retrieved documents without relevance filtering
  • Allowing context to grow indefinitely without segmentation

Frequently Asked Questions

What is the lost-in-middle phenomenon?
The lost-in-middle phenomenon occurs when models exhibit a U-shaped attention curve: information at the beginning and end of the context receives reliable attention, while information in the middle suffers dramatically reduced recall accuracy.
How does context poisoning occur?
Context poisoning occurs when hallucinations, errors, or incorrect information enters the context and compounds through repeated reference. Once poisoned, the context creates feedback loops that reinforce incorrect beliefs.
What is the Four-Bucket Approach?
The Four-Bucket Approach includes: Write (save context outside the window), Select (pull relevant context through retrieval), Compress (reduce tokens through summarization), and Isolate (split context across sub-agents).
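One way to picture the four buckets is a dispatch sketch; the mapping and its parameter names are hypothetical illustrations, not an API defined by the skill:

```python
def apply_bucket(bucket, context, *, store=None, retrieve=None,
                 summarize=None, subagents=None):
    """Route a context through one of the Four-Bucket strategies.
    Each callable is supplied by the caller; this is a sketch, not a
    real framework."""
    if bucket == "write":      # save context outside the window
        store(context)
        return context
    if bucket == "select":     # pull only relevant context via retrieval
        return retrieve(context)
    if bucket == "compress":   # reduce tokens through summarization
        return summarize(context)
    if bucket == "isolate":    # split context across sub-agents
        return [agent(context) for agent in subagents]
    raise ValueError(f"unknown bucket: {bucket}")
```

In practice the buckets compose: a system might Select relevant documents, Compress older turns, and Isolate independent tasks in the same pipeline.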
Do larger context windows always help?
No. Larger contexts can create new problems including performance degradation curves, disproportionate cost increases, and cognitive bottleneck issues where models struggle to maintain quality across many tasks.
How do I know if my context is poisoned?
Watch for symptoms including degraded output quality on tasks that previously succeeded, tool misalignment where agents call wrong tools, and hallucinations that persist despite correction attempts.
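Those symptoms can be turned into a rough monitoring heuristic. The thresholds below are illustrative assumptions, not calibrated values:

```python
def poisoning_signals(recent_scores, baseline, tool_calls, expected_tools,
                      corrections_ignored):
    """Heuristic check for the poisoning symptoms listed above.
    `recent_scores` are quality scores on recent tasks, `baseline` is the
    historical average, `tool_calls` are tools the agent actually invoked,
    and `corrections_ignored` counts corrections the agent disregarded."""
    signals = []
    # Quality drop on tasks that previously succeeded (assumed 20% margin).
    if recent_scores and sum(recent_scores) / len(recent_scores) < 0.8 * baseline:
        signals.append("quality drop on previously passing tasks")
    # Tool misalignment: agent calling tools outside the expected set.
    if any(call not in expected_tools for call in tool_calls):
        signals.append("tool misalignment")
    # Hallucinations persisting despite repeated correction attempts.
    if corrections_ignored >= 2:
        signals.append("hallucinations persisting despite correction")
    return signals
```

An empty result does not prove the context is clean, but multiple simultaneous signals are a strong hint that recovery (e.g. compaction or a fresh context) is needed.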
Which models handle long context best?
According to benchmarks, Claude Opus 4.5 shows degradation around 100K tokens, GPT-5.2 (thinking mode) around 64K, and Gemini 3 Pro around 500K. However, benchmarks vary by task type.

Developer Details

File structure

📄 SKILL.md