In DQN, a CNN is used as the function approximator; the idea here seems to be to make that CNN equivariant in order to improve sample efficiency.
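For intuition, here is a minimal hypothetical PyTorch sketch of the idea: a lifting convolution that shares one kernel across the four rotations of the group C4, followed by pooling over the group so that the Q-values are invariant to 90° rotations of the input. This is not the paper's actual architecture (which uses steerable convolutions, and in Equivariant DQN the actions themselves also transform under the group); the names `C4LiftingConv` and `InvariantQHead` are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4LiftingConv(nn.Module):
    """Lifting convolution: one kernel is shared across the 4 rotations of C4,
    so rotating the input image permutes (and spatially rotates) the group channels."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.01)

    def forward(self, x):                           # x: (B, C_in, H, W)
        outs = []
        for k in range(4):                          # apply the kernel rotated by k * 90 degrees
            w = torch.rot90(self.weight, k, dims=(2, 3))
            outs.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        return torch.stack(outs, dim=2)             # (B, C_out, |C4| = 4, H, W)

class InvariantQHead(nn.Module):
    """Pool over group and spatial dims so the Q-values do not change when the
    observation is rotated by a multiple of 90 degrees (assumes square inputs, H == W)."""
    def __init__(self, in_channels, hidden, num_actions):
        super().__init__()
        self.lift = C4LiftingConv(in_channels, hidden, kernel_size=5)
        self.fc = nn.Linear(hidden, num_actions)

    def forward(self, obs):
        h = F.relu(self.lift(obs))
        h = h.mean(dim=(2, 3, 4))                   # average over group + spatial dimensions
        return self.fc(h)                           # one Q-value per discrete action
```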
These approaches implicitly assume that the transition and reward dynamics of the environment are invariant to affine transformations of the visual state. In fact, some approaches explicitly use a contrastive loss term to induce the agent to learn translation-invariant feature representations.
Since with data augmentation, the model must learn equivariance in addition to the task itself, more training time and greater model capacity are often required. Even then, data augmentation results only in approximate equivariance, whereas equivariant neural networks guarantee it and often have stronger generalization as well.
→ Using an equivariant neural network instead of data augmentation gives a lighter model and better generalization.
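For contrast, the data-augmentation route (roughly what methods like RAD/DrQ do with random crops and shifts) looks like the sketch below: the network has no built-in symmetry, so it has to learn it from the augmented samples. The `q_network` here is a hypothetical stand-in, not an API from any specific library.

```python
import torch

def augment_rotations(obs: torch.Tensor) -> torch.Tensor:
    """Randomly rotate a batch of observations by a multiple of 90 degrees.
    The Q-network only sees rotated copies, so any invariance it acquires is approximate."""
    k = int(torch.randint(0, 4, ()))          # one rotation for the whole batch, for simplicity
    return torch.rot90(obs, k, dims=(2, 3))   # obs: (B, C, H, W)

# Inside a DQN update (sketch):
# obs_aug = augment_rotations(obs)
# q_values = q_network(obs_aug)               # hypothetical network; symmetry must be learned
```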
Contribution
First, we define and analyze an important class of MDPs that we call group-invariant MDPs.
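As I understand the definition, a group-invariant MDP is one whose transition and reward functions are unchanged when the group acts jointly on states and actions (my paraphrase of the condition, not the paper's exact notation):

```latex
% For all g \in G, states s, s', and actions a:
T(gs' \mid gs,\, ga) = T(s' \mid s,\, a), \qquad R(gs,\, ga) = R(s,\, a)
```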
Second, we introduce a new variation of the Equivariant DQN (Mondal et al., 2020), and we further introduce equivariant variations of SAC (Haarnoja et al., 2018) and learning from demonstration.
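The symmetry constraints these variants encode, as I read the general pattern (again my paraphrase): the critic is invariant under the joint group action, while the actor is equivariant.

```latex
% Critic / Q-function: invariant under the joint action on states and actions
Q(gs,\, ga) = Q(s,\, a)
% Actor / policy: equivariant, i.e. rotating the state rotates the action
\pi(gs) = g\,\pi(s)
```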
Finally, we show that our methods convincingly outperform recent competitive data augmentation approaches.