Keywords: equivariance, diffusion policy (closed-loop, with a receding horizon)

Limitations of related work

Most of them either focus only on simple pick-and-place-style tasks, do not support closed-loop policies, or do not support scale equivariance.

Contribution: Our method combines SIM(3)-equivariant neural network architectures with diffusion policy, so it can be trained robustly and generalizes to unseen object appearances, initial states, scales, and poses.


Preliminaries

SIM(3) means SE(3) plus uniform scale, i.e., the group of 3D similarity transformations.

A transformation is written T := (R, t, s) ∈ SIM(3), where R, t, and s denote rotation, translation, and scale, respectively.
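A minimal sketch of how such a transform acts on points, assuming the standard convention x' = s·R·x + t (the helper name apply_sim3 is illustrative, not from the paper):

```python
import numpy as np

def apply_sim3(points: np.ndarray, R: np.ndarray, t: np.ndarray, s: float) -> np.ndarray:
    """Apply T = (R, t, s) in SIM(3) to (N, 3) points: rotate, scale uniformly, translate."""
    return s * points @ R.T + t

# Example: 90-degree rotation about z, scale by 2, then shift along x.
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
pts = np.array([[1., 0., 0.]])
print(apply_sim3(pts, R, t=np.array([1., 0., 0.]), s=2.0))  # -> [[1. 2. 0.]]
```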

Model input: a demonstration trajectory dataset {τ^n} (n: number of demonstrations)

Each τ is a sequence of observation-action pairs (O_t, A_t).

O_t = (X_t, S_t) consists of the scene point cloud X_t and the robot proprioception state S_t.
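A minimal sketch of this data layout (class names are illustrative, not the paper's code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    X_t: np.ndarray        # scene point cloud, shape (N, 3)
    S_t: dict              # proprioception state, keys "x", "d", "s" (see below)

@dataclass
class Step:
    obs: Observation       # O_t
    action: dict           # A_t, keys "v", "d", "s" (detailed further below)

Trajectory = list[Step]    # one demonstration tau
Dataset = list[Trajectory] # {tau^1, ..., tau^n}
```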

Description of the robot proprioception state S_t:

We represent robot proprioception information with S_t = (S^(x)_t, S^(d)_t, S^(s)_t), with 3D positions in S^(x)_t, normalized directions in S^(d)_t, and scalars in S^(s)_t.

Robot proprioception can be converted into such a format that uses positions, velocities, offsets, and scalars in most cases, e.g., end-effector positions go to S^(x)_t; end-effector velocities can be converted to position targets and go to S^(x)_t; end-effector orientations can be converted into rotation matrices (whose columns are unit direction vectors) and placed in S^(d)_t; gripper open-close states go to S^(s)_t. A sketch of such a conversion follows.
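A minimal sketch of the conversion (the helper pack_proprio and its signature are hypothetical), packing end-effector position, orientation, and gripper state into the three groups:

```python
import numpy as np

def pack_proprio(ee_pos: np.ndarray, ee_rot: np.ndarray, gripper_open: float):
    """ee_pos: (3,) end-effector position; ee_rot: (3, 3) rotation matrix;
    gripper_open: scalar in [0, 1]."""
    S_x = ee_pos.reshape(1, 3)      # positions S^(x): end-effector position
    S_d = ee_rot.T                  # directions S^(d): rows are the unit columns of ee_rot
    S_s = np.array([gripper_open])  # scalars S^(s): gripper open/close state
    return S_x, S_d, S_s

S_x, S_d, S_s = pack_proprio(np.zeros(3), np.eye(3), gripper_open=1.0)
```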

Similarly, our representation for actions A_t = (A^(v)_t, A^(d)_t, A^(s)_t) consists of 3D offsets or velocities A^(v)_t, normalized directions A^(d)_t, and scalars A^(s)_t.
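This split matters for equivariance: under T = (R, t, s), each component type transforms differently. A minimal sketch, assuming the conventions above (positions take the full action; offsets/velocities rotate and scale but do not translate; unit directions only rotate; scalars are invariant):

```python
import numpy as np

def transform_components(T, S_x, A_v, A_d, A_s):
    """Apply T = (R, t, s) to each component type of the state/action format."""
    R, t, s = T
    S_x2 = s * S_x @ R.T + t  # positions: full SIM(3) action
    A_v2 = s * A_v @ R.T      # offsets/velocities: rotation + scale, no translation
    A_d2 = A_d @ R.T          # normalized directions: rotation only (stay unit-length)
    A_s2 = A_s                # scalars: invariant under SIM(3)
    return S_x2, A_v2, A_d2, A_s2
```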