2025

Optimizer Architecture in Stable-Baselines3 for Safe Reinforcement Learning

You're building a safe reinforcement learning (RL) algorithm that involves both rewards and costs, and a question comes up: to train the policy, the reward critic, and the cost critic, should you use a single optimizer, as Stable-Baselines3 (SB3) does [1]? Should you use separate optimizers, as many safe RL libraries such as OmniSafe and SafePO do [2, 3]? Or is there another option?
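To make the first two choices concrete, here is a minimal PyTorch sketch. The three networks, their sizes, and the learning rates are hypothetical placeholders chosen for illustration. They are not taken from SB3, OmniSafe, or SafePO, and the sketch only mirrors the single-optimizer versus separate-optimizer distinction described above.

```python
import torch
from torch import nn

# Hypothetical toy networks; the dimensions are arbitrary placeholders.
obs_dim, act_dim = 4, 2
policy = nn.Linear(obs_dim, act_dim)
reward_critic = nn.Linear(obs_dim, 1)
cost_critic = nn.Linear(obs_dim, 1)

# Option A: a single optimizer that owns every parameter.
# All three networks share one learning rate and one optimizer state.
all_params = (
    list(policy.parameters())
    + list(reward_critic.parameters())
    + list(cost_critic.parameters())
)
shared_optimizer = torch.optim.Adam(all_params, lr=3e-4)

# Option B: one optimizer per network.
# Each network can have its own learning rate and is stepped independently.
policy_optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
reward_critic_optimizer = torch.optim.Adam(reward_critic.parameters(), lr=1e-3)
cost_critic_optimizer = torch.optim.Adam(cost_critic.parameters(), lr=1e-3)
```

With option A, a single `shared_optimizer.step()` updates all three networks from one combined loss. With option B, each loss is backpropagated and stepped through its own optimizer.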