nowait-reasoning-optimizer
Reduce LLM reasoning tokens by up to 50%
Chain-of-thought reasoning models generate verbose self-reflection tokens that increase costs and latency. This skill implements the NOWAIT technique to suppress unnecessary reflection tokens during inference, reducing token usage by 27-51% while maintaining accuracy on RL-based reasoning models.
Download the skill ZIP
Upload it to Claude
Go to Settings → Capabilities → Skills → Upload skill
Type a message and start using it
Try it out
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Optimize inference for QwQ-32B to reduce thinking tokens
์์ ๊ฒฐ๊ณผ:
- Initialized NOWAIT with 17 reflection keywords
- Suppressing: wait, hmm, but, however, alternatively, check, verify...
- Excluded false positives: ohio, button, checkout, checksum...
- Token set built: approximately N tokens identified for suppression
- Ready for logits_processor integration with 27-51% expected token reduction
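The initialization output above can be sketched in plain Python. The names and matching logic below are illustrative assumptions, not the skill's actual implementation: reflection keywords are matched against vocabulary entries, while a whitelist keeps legitimate words such as "button" (which contains "but") out of the suppression set. A real implementation would map the surviving entries to tokenizer token ids.

```python
# Hypothetical sketch of the keyword -> suppression-set step; the vocabulary is
# modeled as plain strings instead of tokenizer token ids.
REFLECTION_KEYWORDS = {"wait", "hmm", "but", "however", "alternatively", "check", "verify"}
EXCLUDED_FALSE_POSITIVES = {"ohio", "button", "checkout", "checksum"}

def build_suppression_set(vocab):
    """Return vocabulary entries to suppress: words matching a reflection
    keyword, minus whitelisted words that merely contain one as a substring."""
    suppressed = set()
    for word in vocab:
        w = word.strip().lower()
        if w in EXCLUDED_FALSE_POSITIVES:
            continue  # e.g. "button" contains "but" yet is legitimate
        if any(w == kw or w.startswith(kw) for kw in REFLECTION_KEYWORDS):
            suppressed.add(word)
    return suppressed

vocab = ["Wait", "button", "but", "Hmm", "answer", "checkout", "verify"]
print(sorted(build_suppression_set(vocab)))  # ['Hmm', 'Wait', 'but', 'verify']
```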
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Apply NOWAIT to Kimi-VL-A3B for visual QA task
์์ ๊ฒฐ๊ณผ:
- Configured NOWAIT for multimodal model
- Expected token reduction: 40-60% on visual QA tasks
- Applied all default reflection keywords
- Model will skip unnecessary self-reflection while preserving visual reasoning
"nowait-reasoning-optimizer" ์ฌ์ฉ ์ค์ ๋๋ค. Benchmark Qwen3-32B with and without NOWAIT
์์ ๊ฒฐ๊ณผ:
- Baseline: 15000 tokens on AIME math problem
- NOWAIT: 10500 tokens with 30% reduction
- Accuracy maintained at approximately 66-68%
- Significant cost savings for large-scale evaluation
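The 30% figure follows directly from the token counts above; a quick sanity check. The per-million-token price and run count below are made-up placeholders, not real rates:

```python
# Back-of-the-envelope check of the benchmark numbers reported above.
baseline_tokens = 15_000
nowait_tokens = 10_500

reduction = 1 - nowait_tokens / baseline_tokens
print(f"Token reduction: {reduction:.0%}")  # Token reduction: 30%

price_per_million = 8.00  # hypothetical output-token price in USD
runs = 1_000              # e.g. a 1,000-problem evaluation sweep
saved = (baseline_tokens - nowait_tokens) * runs / 1_000_000 * price_per_million
print(f"Saved on {runs} runs: ${saved:.2f}")  # Saved on 1000 runs: $36.00
```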
Security audit
Safe — a legitimate ML optimization utility implementing a technique from a published research paper. Pure Python inference-time token manipulation with no network access, no file I/O beyond tokenizer loading, and no external command execution. All static findings are false positives from markdown code examples and benign ML patterns.
Risk factors
⚙️ External commands (37)
🌐 Network access (1)
Detection definitions
What you can build
Optimize production inference
Deploy efficient reasoning models with reduced compute costs and latency for production systems
Reduce benchmarking costs
Run large-scale reasoning benchmarks with 30-50% fewer tokens while preserving accuracy
Cut token usage fees
Lower API costs when using reasoning models by suppressing verbose reflection patterns
Try these prompts
Use the NOWAIT Reasoning Optimizer to suppress self-reflection tokens during generation. Initialize NOWAITLogitProcessor with the model's tokenizer and apply it during model.generate() with max_new_tokens=32768.
Configure vLLM to use NOWAIT by calling get_nowait_bad_words_ids() with the tokenizer and pass the result to SamplingParams for efficient batch inference.
Create a custom NOWAITConfig with domain-specific keywords to suppress, excluding false positives like "butterfly" or "checkout" that should not be filtered.
Use NOWAITStoppingCriteria instead of full suppression to allow some reflection tokens but stop generation if reflection count exceeds a configurable threshold.
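The two integration styles in the prompts above can be sketched with plain Python lists instead of real tensors. NOWAITLogitProcessor and NOWAITStoppingCriteria are the skill's names; the bodies below are illustrative assumptions about how such components typically work, not the skill's actual code.

```python
import math

def suppress_logits(logits, suppressed_ids):
    """Logits-processor style: force suppressed token ids to -inf so they
    can never be sampled (full suppression)."""
    return [-math.inf if i in suppressed_ids else x for i, x in enumerate(logits)]

class ReflectionBudgetStopper:
    """Stopping-criteria style: allow reflection tokens, but signal that
    generation should stop once their count exceeds a threshold."""
    def __init__(self, suppressed_ids, max_reflections=3):
        self.suppressed_ids = suppressed_ids
        self.max_reflections = max_reflections
        self.count = 0

    def __call__(self, token_id):
        if token_id in self.suppressed_ids:
            self.count += 1
        return self.count > self.max_reflections  # True => stop generation

masked = suppress_logits([1.0, 2.5, 0.3], suppressed_ids={1})
print(masked)  # token 1 can no longer win the argmax

stopper = ReflectionBudgetStopper({1}, max_reflections=2)
print([stopper(t) for t in [1, 0, 1, 1]])  # [False, False, False, True]
```

The stopping-criteria variant is the gentler of the two: reflection is allowed up to a budget rather than removed outright, which can matter on hard problems where some self-checking genuinely helps.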
Best practices
- Test token reduction on your specific model before production deployment
- Monitor accuracy on hard tasks when using NOWAIT on distilled models
- Use the exclusion patterns to prevent false positives on legitimate words
Avoid
- Applying NOWAIT to distilled small models without accuracy validation
- Using NOWAIT on non-reasoning models that do not generate reflection tokens
- Suppressing keywords without checking excluded patterns first
Frequently asked questions
Which models work best with NOWAIT?
What token reduction can I expect?
Does NOWAIT affect answer accuracy?
Can I customize which tokens are suppressed?
Is my data safe when using this skill?
How does NOWAIT compare to other optimization techniques?
Developer details
Author
davila7
License
MIT
Repository
https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/productivity/nowait
Branch
main
File structure