Blog posts

Optimizer Architecture in Stable-Baselines3 for Safe Reinforcement Learning

5 minute read

Published: July 26, 2025

You’re building a safe reinforcement learning (RL) algorithm that involves both rewards and costs. Now you face a question: to train the policy, the reward critic, and the cost critic, should you use a single optimizer, as Stable-Baselines3 (SB3) does [1]? Or should you use separate optimizers, as in many safe RL libraries such as OmniSafe or SafePO [2, 3]? Or is there another option?
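To make the choice concrete, here is a minimal PyTorch sketch of the two setups named above, plus one possible "other option" (a single optimizer with per-network parameter groups). The network sizes, learning rates, and the helper `make_net` are hypothetical choices for illustration only; they are not taken from SB3, OmniSafe, or SafePO, and the third option is an assumption rather than the approach the post necessarily recommends.

```python
import torch
from torch import nn


def make_net(out_dim: int) -> nn.Module:
    # Hypothetical tiny MLP; the 4-dim input and 16 hidden units are arbitrary.
    return nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, out_dim))


policy = make_net(2)
reward_critic = make_net(1)
cost_critic = make_net(1)

# Option A: one optimizer over all parameters (a single shared learning rate).
single_opt = torch.optim.Adam(
    [
        *policy.parameters(),
        *reward_critic.parameters(),
        *cost_critic.parameters(),
    ],
    lr=3e-4,
)

# Option B: one optimizer per network, each stepped independently.
separate_opts = {
    "policy": torch.optim.Adam(policy.parameters(), lr=3e-4),
    "reward_critic": torch.optim.Adam(reward_critic.parameters(), lr=1e-3),
    "cost_critic": torch.optim.Adam(cost_critic.parameters(), lr=1e-3),
}

# Option C (assumed alternative): one optimizer with a parameter group per
# network, so each network keeps its own learning rate under a single step().
grouped_opt = torch.optim.Adam(
    [
        {"params": policy.parameters(), "lr": 3e-4},
        {"params": reward_critic.parameters(), "lr": 1e-3},
        {"params": cost_critic.parameters(), "lr": 1e-3},
    ]
)
```

With Option A, one `step()` updates all three networks at the same learning rate; Options B and C allow different learning rates, and B additionally lets each network be stepped on its own schedule.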

Minjae Kwon
