pufferlib
Train reinforcement learning agents fast
Also available from: davila7
Training RL agents requires high-performance parallel environments and efficient algorithms. PufferLib provides optimized PPO+LSTM training with 2-10x speedups through vectorization, shared memory buffers, and multi-agent support.
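The shared-memory buffers mentioned above are a key part of the speedup: workers write observations into one preallocated buffer in place instead of pickling arrays between processes. A minimal stdlib sketch of that idea (buffer layout and function names here are illustrative, not PufferLib's actual API):

```python
import struct
from multiprocessing import shared_memory

OBS_SIZE = 4   # CartPole-style observation: 4 floats
NUM_ENVS = 8   # illustrative; real vectorized runs use far more

def create_obs_buffer():
    """Preallocate one flat buffer that all worker processes write into."""
    return shared_memory.SharedMemory(create=True, size=NUM_ENVS * OBS_SIZE * 8)

def write_obs(shm, env_id, obs):
    """A worker writes its slice in place -- no pickling, no copies."""
    offset = env_id * OBS_SIZE * 8
    shm.buf[offset:offset + OBS_SIZE * 8] = struct.pack(f"{OBS_SIZE}d", *obs)

def read_obs(shm, env_id):
    """The trainer reads any env's slice out of the same buffer."""
    offset = env_id * OBS_SIZE * 8
    return list(struct.unpack(f"{OBS_SIZE}d", shm.buf[offset:offset + OBS_SIZE * 8]))

shm = create_obs_buffer()
write_obs(shm, 3, [0.1, -0.2, 0.05, 1.0])
print(read_obs(shm, 3))  # [0.1, -0.2, 0.05, 1.0]
shm.close()
shm.unlink()
```

The point of the design: the trainer sees every environment's latest observation without a single serialization step, which is where naive subprocess vectorization loses most of its throughput.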
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "pufferlib". Train PPO on CartPole with pufferlib
Expected outcome:
- Environment: gym-CartPole-v1 with 256 parallel envs
- Policy: 2-layer MLP (256 hidden units) with layer_init
- Training: 10,000 iterations, batch size 32768
- Checkpoint: Saved to checkpoints/checkpoint_1000.pt
- Final throughput: 1.2M steps/second on GPU
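The PPO update behind a run like this clips the policy ratio so each minibatch can be reused for several epochs without the policy drifting too far. A dependency-free sketch of the clipped surrogate objective (clip_eps=0.2 is the common default, not necessarily PufferLib's setting):

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probs for numerical
    stability. The outer min() keeps the update pessimistic whenever the
    ratio leaves [1 - clip_eps, 1 + clip_eps].
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage and a log-prob gain of 0.5 (ratio ≈ 1.65), the objective is capped at 1.2 × advantage, so the gradient stops rewarding further movement in that direction within the epoch.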
Using "pufferlib". Create multi-agent environment
Expected outcome:
- Multi-agent setup: 4 agents in cooperative navigation task
- Observation space: Dict with position, goal, and other agent positions
- Action space: 5 discrete actions (4 directions + stay)
- Shared policy backbone for efficient learning
- Training with PuffeRL at 800K steps/second
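The environment side of a cooperative-navigation task like the one above can be sketched in plain Python. The dict observation and 5-action layout below mirror the outcome description, but the class itself is an illustration, not PufferLib's PufferEnv API:

```python
import random

MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0), 4: (0, 0)}  # N, S, E, W, stay

class CoopNavEnv:
    """4 agents on a grid; each is rewarded for standing on its own goal."""

    def __init__(self, size=8, num_agents=4, seed=0):
        self.size, self.num_agents = size, num_agents
        self.rng = random.Random(seed)

    def _obs(self, i):
        # Dict observation: own position, own goal, other agents' positions.
        return {
            "position": self.positions[i],
            "goal": self.goals[i],
            "others": [p for j, p in enumerate(self.positions) if j != i],
        }

    def reset(self):
        rand_cell = lambda: (self.rng.randrange(self.size), self.rng.randrange(self.size))
        self.positions = [rand_cell() for _ in range(self.num_agents)]
        self.goals = [rand_cell() for _ in range(self.num_agents)]
        return [self._obs(i) for i in range(self.num_agents)]

    def step(self, actions):
        rewards = []
        for i, a in enumerate(actions):
            dx, dy = MOVES[a]
            x, y = self.positions[i]
            self.positions[i] = (min(max(x + dx, 0), self.size - 1),
                                 min(max(y + dy, 0), self.size - 1))
            rewards.append(1.0 if self.positions[i] == self.goals[i] else 0.0)
        done = all(p == g for p, g in zip(self.positions, self.goals))
        return [self._obs(i) for i in range(self.num_agents)], rewards, done
```

A shared policy backbone works here because every agent sees the same observation structure; only the contents of "position" and "goal" differ per agent.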
Security Audit
Safe
All 331 static findings are FALSE POSITIVES. This is a legitimate open-source reinforcement learning library. The static analyzer incorrectly flagged bash command examples in the markdown documentation (SKILL.md, references/*.md) as dangerous backtick execution. No actual command injection, credential exfiltration, or malicious patterns exist in the codebase. Verified via grep: no hashlib, subprocess, or other dangerous execution patterns were found.
What You Can Build
Fast benchmarking
Quickly benchmark new algorithms on Ocean environments with millions of steps per second throughput
Game environment training
Train agents on Atari, Procgen, or NetHack with optimized vectorization and efficient PPO
Cooperative agent teams
Build and train multi-agent systems with PettingZoo integration and shared policy options
Try These Prompts
Use pufferlib to train a PPO agent on the procgen-coinrun environment with 256 parallel envs. Show the training loop and how to save checkpoints.
Help me create a custom PufferEnv for a grid world task with 4 discrete actions. Show the reset, step, and observation space definitions.
Use pufferlib to train multiple agents on a PettingZoo environment. Show how to handle dict observations and shared policies.
Optimize my pufferlib training setup for maximum throughput. What vectorization settings and hyperparameters should I use for 4 GPUs?
Best Practices
- Start with Ocean environments or Gymnasium integration before building custom environments
- Profile steps per second early to identify bottlenecks before scaling
- Use torch.compile and CUDA for maximum training throughput
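Profiling steps per second (the second bullet above) needs nothing more than a timed rollout loop; here is a stdlib sketch with a dummy step function standing in for the real vectorized env call:

```python
import time

def steps_per_second(step_fn, num_envs, num_iters=1000):
    """Time num_iters synchronous batched steps and report env steps/sec."""
    start = time.perf_counter()
    for _ in range(num_iters):
        step_fn()  # in a real run: vecenv.step(actions)
    elapsed = time.perf_counter() - start
    return num_iters * num_envs / elapsed

# Stand-in workload so the sketch runs without an env installed.
dummy_step = lambda: sum(range(100))
print(f"{steps_per_second(dummy_step, num_envs=256):,.0f} steps/s")
```

Run this once with a no-op step function and once with the real env: the gap tells you whether your bottleneck is environment simulation or the vectorization harness itself.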
Avoid
- Avoid CPU-only large-scale training; use a GPU with sufficient VRAM
- Do not skip environment validation before scaling to many parallel envs
- Avoid hardcoding hyperparameters; pass them as CLI arguments for reproducibility
Frequently Asked Questions
What environments does pufferlib support?
How fast is pufferlib compared to standard implementations?
Can I use pufferlib with custom environments?
Does pufferlib support multi-GPU training?
What logging frameworks integrate with pufferlib?
How do I save and resume training?
Developer Details
Author
K-Dense-AI
License
MIT license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/pufferlib
Ref
main
File structure