pufferlib
Train reinforcement learning agents fast
Also available from: davila7
Training RL agents requires high-performance parallel environments and efficient algorithms. PufferLib provides optimized PPO+LSTM training with 2-10x speedups through vectorization, shared memory buffers, and multi-agent support.
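The shared-memory buffers mentioned above are a key part of the speedup: workers write observations into one preallocated buffer in place instead of pickling arrays between processes. A minimal stdlib sketch of that idea (buffer layout and function names here are illustrative, not PufferLib's actual API):

```python
import struct
from multiprocessing import shared_memory

OBS_SIZE = 4   # CartPole-style observation: 4 floats
NUM_ENVS = 8   # illustrative; real vectorized runs use far more

def create_obs_buffer():
    """Preallocate one flat buffer that all worker processes write into."""
    return shared_memory.SharedMemory(create=True, size=NUM_ENVS * OBS_SIZE * 8)

def write_obs(shm, env_id, obs):
    """A worker writes its slice in place -- no pickling, no copies."""
    offset = env_id * OBS_SIZE * 8
    shm.buf[offset:offset + OBS_SIZE * 8] = struct.pack(f"{OBS_SIZE}d", *obs)

def read_obs(shm, env_id):
    """The trainer reads any env's slice out of the same buffer."""
    offset = env_id * OBS_SIZE * 8
    return list(struct.unpack(f"{OBS_SIZE}d", shm.buf[offset:offset + OBS_SIZE * 8]))

shm = create_obs_buffer()
write_obs(shm, 3, [0.1, -0.2, 0.05, 1.0])
print(read_obs(shm, 3))  # [0.1, -0.2, 0.05, 1.0]
shm.close()
shm.unlink()
```

The point of the design: the trainer sees every environment's latest observation without a single serialization step, which is where naive subprocess vectorization loses most of its throughput.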
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "pufferlib". Train PPO on CartPole with pufferlib
Expected outcome:
- Environment: gym-CartPole-v1 with 256 parallel envs
- Policy: 2-layer MLP (256 hidden units) with layer_init
- Training: 10,000 iterations, batch size 32768
- Checkpoint: Saved to checkpoints/checkpoint_1000.pt
- Final throughput: 1.2M steps/second on GPU
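The PPO update behind a run like this clips the policy ratio so each minibatch can be reused for several epochs without the policy drifting too far. A dependency-free sketch of the clipped surrogate objective (clip_eps=0.2 is the common default, not necessarily PufferLib's setting):

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probs for numerical
    stability. The outer min() keeps the update pessimistic whenever the
    ratio leaves [1 - clip_eps, 1 + clip_eps].
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage and a log-prob gain of 0.5 (ratio ≈ 1.65), the objective is capped at 1.2 × advantage, so the gradient stops rewarding further movement in that direction within the epoch.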
Using "pufferlib". Create multi-agent environment
Expected outcome:
- Multi-agent setup: 4 agents in cooperative navigation task
- Observation space: Dict with position, goal, and other agent positions
- Action space: 5 discrete actions (4 directions + stay)
- Shared policy backbone for efficient learning
- Training with PuffeRL at 800K steps/second
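The environment side of a cooperative-navigation task like the one above can be sketched in plain Python. The dict observation and 5-action layout below mirror the outcome description, but the class itself is an illustration, not PufferLib's PufferEnv API:

```python
import random

MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0), 4: (0, 0)}  # N, S, E, W, stay

class CoopNavEnv:
    """4 agents on a grid; each is rewarded for standing on its own goal."""

    def __init__(self, size=8, num_agents=4, seed=0):
        self.size, self.num_agents = size, num_agents
        self.rng = random.Random(seed)

    def _obs(self, i):
        # Dict observation: own position, own goal, other agents' positions.
        return {
            "position": self.positions[i],
            "goal": self.goals[i],
            "others": [p for j, p in enumerate(self.positions) if j != i],
        }

    def reset(self):
        rand_cell = lambda: (self.rng.randrange(self.size), self.rng.randrange(self.size))
        self.positions = [rand_cell() for _ in range(self.num_agents)]
        self.goals = [rand_cell() for _ in range(self.num_agents)]
        return [self._obs(i) for i in range(self.num_agents)]

    def step(self, actions):
        rewards = []
        for i, a in enumerate(actions):
            dx, dy = MOVES[a]
            x, y = self.positions[i]
            self.positions[i] = (min(max(x + dx, 0), self.size - 1),
                                 min(max(y + dy, 0), self.size - 1))
            rewards.append(1.0 if self.positions[i] == self.goals[i] else 0.0)
        done = all(p == g for p, g in zip(self.positions, self.goals))
        return [self._obs(i) for i in range(self.num_agents)], rewards, done
```

A shared policy backbone works here because every agent sees the same observation structure; only the contents of "position" and "goal" differ per agent.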
Security Audit
Safe
All 331 static findings are FALSE POSITIVES. This is a legitimate open-source reinforcement learning library. The static analyzer incorrectly flagged bash command examples in the markdown documentation (SKILL.md, references/*.md) as dangerous backtick execution. No actual command injection, credential exfiltration, or malicious patterns exist in the codebase. Verified via grep: no hashlib, subprocess, or other dangerous execution patterns were found.
What You Can Build
Fast benchmarking
Quickly benchmark new algorithms on Ocean environments with millions of steps per second throughput
Game environment training
Train agents on Atari, Procgen, or NetHack with optimized vectorization and efficient PPO
Cooperative agent teams
Build and train multi-agent systems with PettingZoo integration and shared policy options
Try These Prompts
Use pufferlib to train a PPO agent on the procgen-coinrun environment with 256 parallel envs. Show the training loop and how to save checkpoints.
Help me create a custom PufferEnv for a grid world task with 4 discrete actions. Show the reset, step, and observation space definitions.
Use pufferlib to train multiple agents on a PettingZoo environment. Show how to handle dict observations and shared policies.
Optimize my pufferlib training setup for maximum throughput. What vectorization settings and hyperparameters should I use for 4 GPUs?
Best Practices
- Start with Ocean environments or Gymnasium integration before building custom environments
- Profile steps per second early to identify bottlenecks before scaling
- Use torch.compile and CUDA for maximum training throughput
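Profiling steps per second (the second bullet above) needs nothing more than a timed rollout loop; here is a stdlib sketch with a dummy step function standing in for the real vectorized env call:

```python
import time

def steps_per_second(step_fn, num_envs, num_iters=1000):
    """Time num_iters synchronous batched steps and report env steps/sec."""
    start = time.perf_counter()
    for _ in range(num_iters):
        step_fn()  # in a real run: vecenv.step(actions)
    elapsed = time.perf_counter() - start
    return num_iters * num_envs / elapsed

# Stand-in workload so the sketch runs without an env installed.
dummy_step = lambda: sum(range(100))
print(f"{steps_per_second(dummy_step, num_envs=256):,.0f} steps/s")
```

Run this once with a no-op step function and once with the real env: the gap tells you whether your bottleneck is environment simulation or the vectorization harness itself.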
Avoid
- Avoid CPU-only large-scale training; use a GPU with sufficient VRAM
- Do not skip environment validation before scaling to many parallel envs
- Avoid hardcoding hyperparameters; pass them as CLI arguments for reproducibility
Frequently Asked Questions
What environments does pufferlib support?
How fast is pufferlib compared to standard implementations?
Can I use pufferlib with custom environments?
Does pufferlib support multi-GPU training?
What logging frameworks integrate with pufferlib?
How do I save and resume training?
Developer Details
Author
K-Dense-AI
License
MIT license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/pufferlib
Ref
main
File structure