nowait-reasoning-optimizer
Reduce LLM reasoning tokens by up to 50%
Chain-of-thought reasoning models generate verbose self-reflection tokens that increase costs and latency. This skill implements the NOWAIT technique to suppress unnecessary reflection tokens during inference, reducing token usage by 27-51% while maintaining accuracy on RL-based reasoning models.
Download the skill ZIP
Upload it to Claude
Go to Settings → Capabilities → Skills → Upload skill
Type a message and start using it
Try it out
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Optimize inference for QwQ-32B to reduce thinking tokens
์์ ๊ฒฐ๊ณผ:
- Initialized NOWAIT with 17 reflection keywords
- Suppressing: wait, hmm, but, however, alternatively, check, verify...
- Excluded false positives: ohio, button, checkout, checksum...
- Token set built: approximately N tokens identified for suppression
- Ready for logits_processor integration with 27-51% expected token reduction
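The initialization output above can be sketched in plain Python. The names and matching logic below are illustrative assumptions, not the skill's actual implementation: reflection keywords are matched against vocabulary entries, while a whitelist keeps legitimate words such as "button" (which contains "but") out of the suppression set. A real implementation would map the surviving entries to tokenizer token ids.

```python
# Hypothetical sketch of the keyword -> suppression-set step; the vocabulary is
# modeled as plain strings instead of tokenizer token ids.
REFLECTION_KEYWORDS = {"wait", "hmm", "but", "however", "alternatively", "check", "verify"}
EXCLUDED_FALSE_POSITIVES = {"ohio", "button", "checkout", "checksum"}

def build_suppression_set(vocab):
    """Return vocabulary entries to suppress: words matching a reflection
    keyword, minus whitelisted words that merely contain one as a substring."""
    suppressed = set()
    for word in vocab:
        w = word.strip().lower()
        if w in EXCLUDED_FALSE_POSITIVES:
            continue  # e.g. "button" contains "but" yet is legitimate
        if any(w == kw or w.startswith(kw) for kw in REFLECTION_KEYWORDS):
            suppressed.add(word)
    return suppressed

vocab = ["Wait", "button", "but", "Hmm", "answer", "checkout", "verify"]
print(sorted(build_suppression_set(vocab)))  # ['Hmm', 'Wait', 'but', 'verify']
```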
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Apply NOWAIT to Kimi-VL-A3B for visual QA task
์์ ๊ฒฐ๊ณผ:
- Configured NOWAIT for multimodal model
- Expected token reduction: 40-60% on visual QA tasks
- Applied all default reflection keywords
- Model will skip unnecessary self-reflection while preserving visual reasoning
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Benchmark Qwen3-32B with and without NOWAIT
์์ ๊ฒฐ๊ณผ:
- Baseline: 15000 tokens on AIME math problem
- NOWAIT: 10500 tokens with 30% reduction
- Accuracy maintained at approximately 66-68%
- Significant cost savings for large-scale evaluation
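The 30% figure follows directly from the token counts above; a quick sanity check. The per-million-token price and run count below are made-up placeholders, not real rates:

```python
# Back-of-the-envelope check of the benchmark numbers reported above.
baseline_tokens = 15_000
nowait_tokens = 10_500

reduction = 1 - nowait_tokens / baseline_tokens
print(f"Token reduction: {reduction:.0%}")  # Token reduction: 30%

price_per_million = 8.00  # hypothetical output-token price in USD
runs = 1_000              # e.g. a 1,000-problem evaluation sweep
saved = (baseline_tokens - nowait_tokens) * runs / 1_000_000 * price_per_million
print(f"Saved on {runs} runs: ${saved:.2f}")  # Saved on 1000 runs: $36.00
```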
Security audit
Safe — a legitimate ML optimization utility implementing a technique from a published research paper. Pure Python inference-time token manipulation with no network access, no file I/O beyond tokenizer loading, and no external command execution. All static findings are false positives from markdown code examples and benign ML patterns.
Risk factors
⚙️ External commands (37)
🌐 Network access (1)
Detection definitions
What you can build
Optimize production inference
Deploy efficient reasoning models with reduced compute costs and latency for production systems
Reduce benchmarking costs
Run large-scale reasoning benchmarks with 30-50% fewer tokens while preserving accuracy
Cut token usage fees
Lower API costs when using reasoning models by suppressing verbose reflection patterns
Try these prompts
Use the NOWAIT Reasoning Optimizer to suppress self-reflection tokens during generation. Initialize NOWAITLogitProcessor with the model's tokenizer and apply it during model.generate() with max_new_tokens=32768.
Configure vLLM to use NOWAIT by calling get_nowait_bad_words_ids() with the tokenizer and pass the result to SamplingParams for efficient batch inference.
Create a custom NOWAITConfig with domain-specific keywords to suppress, excluding false positives like "butterfly" or "checkout" that should not be filtered.
Use NOWAITStoppingCriteria instead of full suppression to allow some reflection tokens but stop generation if reflection count exceeds a configurable threshold.
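The two integration styles in the prompts above can be sketched with plain Python lists instead of real tensors. NOWAITLogitProcessor and NOWAITStoppingCriteria are the skill's names; the bodies below are illustrative assumptions about how such components typically work, not the skill's actual code.

```python
import math

def suppress_logits(logits, suppressed_ids):
    """Logits-processor style: force suppressed token ids to -inf so they
    can never be sampled (full suppression)."""
    return [-math.inf if i in suppressed_ids else x for i, x in enumerate(logits)]

class ReflectionBudgetStopper:
    """Stopping-criteria style: allow reflection tokens, but signal that
    generation should stop once their count exceeds a threshold."""
    def __init__(self, suppressed_ids, max_reflections=3):
        self.suppressed_ids = suppressed_ids
        self.max_reflections = max_reflections
        self.count = 0

    def __call__(self, token_id):
        if token_id in self.suppressed_ids:
            self.count += 1
        return self.count > self.max_reflections  # True => stop generation

masked = suppress_logits([1.0, 2.5, 0.3], suppressed_ids={1})
print(masked)  # token 1 can no longer win the argmax

stopper = ReflectionBudgetStopper({1}, max_reflections=2)
print([stopper(t) for t in [1, 0, 1, 1]])  # [False, False, False, True]
```

The stopping-criteria variant is the gentler of the two: reflection is allowed up to a budget rather than removed outright, which can matter on hard problems where some self-checking genuinely helps.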
Best practices
- Test token reduction on your specific model before production deployment
- Monitor accuracy on hard tasks when using NOWAIT on distilled models
- Use the exclusion patterns to prevent false positives on legitimate words
Avoid
- Applying NOWAIT to distilled small models without accuracy validation
- Using NOWAIT on non-reasoning models that do not generate reflection tokens
- Suppressing keywords without checking excluded patterns first
Frequently asked questions
Which models work best with NOWAIT?
What token reduction can I expect?
Does NOWAIT affect answer accuracy?
Can I customize which tokens are suppressed?
Is my data safe when using this skill?
How does NOWAIT compare to other optimization techniques?
Developer details
Author
davila7
License
MIT
Repository
https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/productivity/nowait
Branch
main
File structure