Organizing samples by trajectory can improve the learning efficiency of offline RL algorithms.
May 1, 2024
An offline-to-online reinforcement learning method that improves transition efficiency with Q-ensembles.
Jan 1, 2024