
context-degradation

Safe

Detect Context Degradation in LLMs

Also available from: muratcankoylan, ChakshuGautam, Asmayaseen

Language models exhibit predictable performance degradation as context length increases. This skill helps diagnose lost-in-middle, poisoning, distraction, and clash patterns to build more reliable AI systems.

Supports: Claude Code (CC), Codex
⚠️ Score: 65 (Poor)
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "context-degradation". Conversation has 60000 tokens. Agent started producing incorrect summaries after turn 20.

Expected outcome:

Analysis: Context degradation detected. The lost-in-middle phenomenon is likely causing the agent to miss key information from the middle of context. Recommendation: Apply compaction to summarize earlier context, or restructure to place critical info at edges.
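The edge-placement recommendation above can be sketched as a small reordering routine. All function and variable names here are illustrative assumptions, not part of the skill:

```python
def restructure_context(messages, is_critical):
    """Reorder messages so critical ones sit at the edges of the context,
    where the lost-in-middle effect predicts attention is highest.

    `messages` is a list of strings; `is_critical` is a predicate.
    Illustrative only: real systems must also preserve any ordering
    constraints that matter to the task.
    """
    critical = [m for m in messages if is_critical(m)]
    filler = [m for m in messages if not is_critical(m)]
    # Split critical items between the head and tail of the context.
    half = len(critical) // 2 + len(critical) % 2
    return critical[:half] + filler + critical[half:]

msgs = ["SPEC: must use UTF-8", "chat 1", "chat 2", "SPEC: output JSON"]
reordered = restructure_context(msgs, lambda m: m.startswith("SPEC:"))
```

Here the two `SPEC:` lines end up first and last, with ordinary chat turns in the middle.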

Using "context-degradation". User asks about code from turn 1, but agent refers to wrong implementation from turn 15.

Expected outcome:

Analysis: Context clash detected. Multiple implementations exist in context with conflicting details. Recommendation: Use explicit versioning and mark conflicts for clarification before proceeding.
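A minimal sketch of the explicit-versioning recommendation, assuming hypothetical helper names (this is not an API defined by the skill):

```python
def tag_versions(snippets):
    """Attach [impl v1], [impl v2], ... markers to code snippets in the
    order they appeared in the conversation, so a prompt can reference
    one implementation unambiguously instead of clashing versions."""
    return [f"[impl v{i}] {s}" for i, s in enumerate(snippets, start=1)]

tagged = tag_versions(["def f(): return 1", "def f(): return 2"])
```

A follow-up prompt can then say "use [impl v1]" rather than leaving the model to pick between conflicting definitions.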

Security Audit

Safe
v1 • 2/24/2026

Static analysis flagged 20 potential issues including external_commands, network, and weak cryptographic algorithms. All findings are FALSE POSITIVES: the 'external_commands' detections are YAML token count examples with backtick formatting; 'network' is a legitimate GitHub URL in metadata; 'weak cryptographic algorithm' is a pattern matching error triggered by the word 'degradation'; 'system reconnaissance' falsely triggers on 'multi-source retrieval'. This skill is pure educational documentation about LLM context degradation with no executable code.

  • Files scanned: 1
  • Lines analyzed: 239
  • Findings: 4
  • Total audits: 1

High Risk Issues (4)

False Positive: External Commands Detection
Static scanner detected 'Ruby/shell backtick execution' at lines 169, 176, 179. These are YAML token count examples (turn_20: 60000 tokens) used as documentation, not actual shell commands.
False Positive: Network Security Detection
Static scanner detected 'Hardcoded URL' at line 4. This is a legitimate GitHub source URL in the skill metadata, not a security vulnerability.
False Positive: Weak Cryptographic Algorithm
Static scanner incorrectly flagged 'weak cryptographic algorithm' at 16 locations. Pattern matcher triggers on the word 'degradation' (appears as 'deg' in scanning patterns). No cryptographic code exists in this skill.
False Positive: System Reconnaissance
Static scanner flagged 'System reconnaissance' at line 92. Content discusses 'multi-source retrieval' in the context of information retrieval research, not system reconnaissance.
Audited by: claude

Quality Score

  • Architecture: 38
  • Maintainability: 100
  • Content: 87
  • Community: 31
  • Security: 65
  • Spec Compliance: 91

What You Can Build

Debug Agent Failures

When an AI agent produces incorrect or irrelevant outputs during long conversations, use this skill to identify whether context degradation is the root cause.

Design Resilient Systems

Architect systems that handle large contexts reliably by applying the Four-Bucket Approach and architectural patterns described in this skill.

Evaluate Context Choices

Make informed decisions about context engineering for production systems by understanding degradation thresholds and mitigation strategies.

Try These Prompts

Basic Degradation Check
Analyze this conversation for context degradation patterns. The conversation has grown to over 50000 tokens. Look for signs of lost-in-middle, poisoning, distraction, or clash.
Lost-in-Middle Diagnosis
Review the attached context and identify if critical information is buried in the middle. The task requires information from the middle section but outputs are incorrect.
Context Poisoning Recovery
Analyze this context for signs of poisoning. Symptoms include degraded output quality, tool misalignment, and persistent hallucinations despite corrections. What recovery steps can be taken?
Architectural Pattern Selection
Given a system that processes 200K+ token contexts with multiple independent tasks, recommend which Four-Bucket strategies (Write, Select, Compress, Isolate) to apply and why.

Best Practices

  • Place critical information at the beginning or end of context where attention is highest
  • Monitor context length and performance correlation during development
  • Implement compaction triggers before degradation becomes severe
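The compaction-trigger practice above can be sketched as follows; the 50K-token threshold and all helper names are assumptions for illustration, not values defined by this skill:

```python
def should_compact(token_count, threshold=50_000):
    """Trigger compaction before degradation becomes severe.
    The 50K-token threshold is an assumed placeholder; tune per model."""
    return token_count >= threshold

def compact(messages, summarize, keep_recent=5):
    """Replace older messages with a summary, keeping the most recent
    turns verbatim. `summarize` is any callable mapping a list of
    messages to a single summary string (e.g. a separate LLM call)."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

Checking `should_compact` on every turn keeps the trigger cheap, while the expensive `summarize` call runs only when the threshold is crossed.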

Avoid

  • Assuming longer context always improves performance
  • Loading all retrieved documents without relevance filtering
  • Allowing context to grow indefinitely without segmentation

Frequently Asked Questions

What is the lost-in-middle phenomenon?
The lost-in-middle phenomenon occurs when models exhibit a U-shaped attention curve: information at the beginning and end of the context receives reliable attention, while information in the middle suffers dramatically reduced recall accuracy.
How does context poisoning occur?
Context poisoning occurs when hallucinations, errors, or incorrect information enters the context and compounds through repeated reference. Once poisoned, the context creates feedback loops that reinforce incorrect beliefs.
What is the Four-Bucket Approach?
The Four-Bucket Approach includes: Write (save context outside the window), Select (pull relevant context through retrieval), Compress (reduce tokens through summarization), and Isolate (split context across sub-agents).
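One way to picture the four buckets is a dispatch sketch; the mapping and its parameter names are hypothetical illustrations, not an API defined by the skill:

```python
def apply_bucket(bucket, context, *, store=None, retrieve=None,
                 summarize=None, subagents=None):
    """Route a context through one of the Four-Bucket strategies.
    Each callable is supplied by the caller; this is a sketch, not a
    real framework."""
    if bucket == "write":      # save context outside the window
        store(context)
        return context
    if bucket == "select":     # pull only relevant context via retrieval
        return retrieve(context)
    if bucket == "compress":   # reduce tokens through summarization
        return summarize(context)
    if bucket == "isolate":    # split context across sub-agents
        return [agent(context) for agent in subagents]
    raise ValueError(f"unknown bucket: {bucket}")
```

In practice the buckets compose: a system might Select relevant documents, Compress older turns, and Isolate independent tasks in the same pipeline.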
Do larger context windows always help?
No. Larger contexts can create new problems including performance degradation curves, disproportionate cost increases, and cognitive bottleneck issues where models struggle to maintain quality across many tasks.
How do I know if my context is poisoned?
Watch for symptoms including degraded output quality on tasks that previously succeeded, tool misalignment where agents call wrong tools, and hallucinations that persist despite correction attempts.
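Those symptoms can be turned into a rough monitoring heuristic. The thresholds below are illustrative assumptions, not calibrated values:

```python
def poisoning_signals(recent_scores, baseline, tool_calls, expected_tools,
                      corrections_ignored):
    """Heuristic check for the poisoning symptoms listed above.
    `recent_scores` are quality scores on recent tasks, `baseline` is the
    historical average, `tool_calls` are tools the agent actually invoked,
    and `corrections_ignored` counts corrections the agent disregarded."""
    signals = []
    # Quality drop on tasks that previously succeeded (assumed 20% margin).
    if recent_scores and sum(recent_scores) / len(recent_scores) < 0.8 * baseline:
        signals.append("quality drop on previously passing tasks")
    # Tool misalignment: agent calling tools outside the expected set.
    if any(call not in expected_tools for call in tool_calls):
        signals.append("tool misalignment")
    # Hallucinations persisting despite repeated correction attempts.
    if corrections_ignored >= 2:
        signals.append("hallucinations persisting despite correction")
    return signals
```

An empty result does not prove the context is clean, but multiple simultaneous signals are a strong hint that recovery (e.g. compaction or a fresh context) is needed.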
Which models handle long context best?
According to benchmarks, Claude Opus 4.5 shows degradation around 100K tokens, GPT-5.2 (thinking mode) around 64K, and Gemini 3 Pro around 500K. However, benchmarks vary by task type.

Developer Details

File structure

📄 SKILL.md