trlX v0.3.0 is here! We've been working hard on this for the last month and we're super excited :) Highlights: support for hyperparameter sweeps, refactored losses and models for better extensibility, and optimizations to PPO and ILQL.
github.com/CarperAI/trlx 1/5
1) We've added support for Ray Tune, and worked with to create programmatic reports from your hparam sweeps 📊
2) We've added PPO normalization, among other regularization methods, which improves convergence and stability. ⏳ 2/5

