Organizing samples in a trajectory-wise manner can improve the learning efficiency of offline RL algorithms.
May 1, 2024
Jan 1, 2024