Contribution 1. First, we define and analyze an important class of MDPs that we call group-invariant MDPs.

Contribution 2. Second, we introduce a new variation of the Equivariant DQN (Mondal et al., 2020), and we further introduce equivariant variations of SAC (Haarnoja et al., 2018), and learning from demonstration (LfD).

Contribution 3. Finally, we show that our methods convincingly outperform recent competitive data augmentation approaches.

observation: the end-effector's dx, dy, dz and quaternion

Group Invariant MDPs

Definition 1. A G-invariant MDP MG = (S, A, T, R, G) is an MDP M = (S, A, T, R) that satisfies the following conditions:

  1. Reward Invariance: R(s, a) = R(gs, ga).
  2. Transition Invariance: T(s, a, s′) = T(gs, ga, gs′).
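The two conditions above can be checked numerically on a toy problem. The sketch below (my own illustration, not from the paper) uses a small grid world with a goal at the origin and the C4 group of 90-degree rotations, and verifies reward and transition invariance over all state-action pairs:

```python
# Toy deterministic MDP: states are integer grid positions relative to a
# goal at the origin; actions are unit moves. The C4 rotation group acts
# on both states and actions.

def g(v):
    """90-degree rotation about the origin (a generator of C4)."""
    return (-v[1], v[0])

def reward(s, a):
    """+1 when the action reaches the goal, 0 otherwise."""
    return 1.0 if (s[0] + a[0], s[1] + a[1]) == (0, 0) else 0.0

def transition(s, a):
    """Deterministic next state."""
    return (s[0] + a[0], s[1] + a[1])

states = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

for s in states:
    for a in actions:
        # Reward invariance: R(s, a) = R(gs, ga)
        assert reward(s, a) == reward(g(s), g(a))
        # Transition invariance (deterministic case): next(gs, ga) = g(next(s, a))
        assert transition(g(s), g(a)) == g(transition(s, a))
print("C4-invariance holds for the toy MDP")
```

Invariance holds because rotating both the state and the action rotates the outcome: g(s) + g(a) = g(s + a), and the goal at the origin is a fixed point of the rotation.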

G-invariant MDP: a special case of an MDP homomorphism

An 'MDP homomorphism' is a form of MDP abstraction: it maps an MDP to a simpler one while preserving its structural properties and dynamics.

Advantage 1. Reduces the state and action spaces of a complex MDP, improving computational efficiency.

Advantage 2. Learning over the abstracted structure improves generalization on the original problem.

Advantage 3. A policy learned in the abstract MDP can easily be lifted back to the original MDP.

Proposition 1. Let MG be a group-invariant MDP. Then its optimal Q-function is group-invariant, Q∗(s, a) = Q∗(gs, ga), and its optimal policy is group-equivariant, π∗(gs) = gπ∗(s).
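Proposition 1 can also be verified numerically. The sketch below (my own illustration, under the same toy C4-invariant grid MDP as above, with an absorbing goal and boundary bounces) runs tabular value iteration and then checks Q∗(s, a) = Q∗(gs, ga) and the equivariance of the greedy policy:

```python
# Tabular value iteration on a toy C4-invariant grid MDP (goal at origin),
# followed by a numerical check of Proposition 1.
GAMMA = 0.9
states = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def g(v):
    """90-degree rotation about the origin (a generator of C4)."""
    return (-v[1], v[0])

def step(s, a):
    """Deterministic transition; reaching the goal (0,0) gives reward 1 and ends the episode."""
    nxt = (s[0] + a[0], s[1] + a[1])
    if nxt not in states:
        nxt = s  # bounce off the grid boundary
    return nxt, (1.0 if nxt == (0, 0) else 0.0)

# Value iteration for Q*.
Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(200):
    Q = {
        (s, a): r + (0.0 if nxt == (0, 0) else GAMMA * max(Q[nxt, b] for b in actions))
        for s in states for a in actions
        for nxt, r in [step(s, a)]
    }

for s in states:
    for a in actions:
        # Q* invariance: Q*(s, a) = Q*(gs, ga)
        assert abs(Q[s, a] - Q[g(s), g(a)]) < 1e-9
    # Policy equivariance holds up to ties among optimal actions,
    # so compare the Q-values of g(argmax) and the argmax at gs.
    best = max(actions, key=lambda b: Q[s, b])
    best_at_gs = max(actions, key=lambda b: Q[g(s), b])
    assert abs(Q[g(s), g(best)] - Q[g(s), best_at_gs]) < 1e-9
print("Q* invariance and policy equivariance verified")
```

Since the toy MDP is exactly C4-invariant, every value-iteration iterate is invariant as well, so the checks pass to numerical precision; the policy check is stated on Q-values because argmax tie-breaking is arbitrary.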