keywords: equivariance, diffusion policy (closed-loop, with a receding horizon)
Limitations of related work
Most prior methods either focus only on simple pick-and-place-style tasks, do not support closed-loop policies, or do not support scale equivariance.
Contribution: Our method combines SIM(3)-equivariant neural network architectures with diffusion policy, so it can be trained robustly and generalizes to unseen object appearances, initial states, scales, and poses.
SIM(3) = SE(3) + scale
A transformation T := (R, t, s) ∈ SIM(3), where R, t, and s denote rotation, translation, and scale, respectively.
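As a concrete illustration (my own sketch, not code from the paper), a SIM(3) transform acts on a 3D point x as x' = s·R·x + t:

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): applying a
# SIM(3) transform T = (R, t, s) to 3D points via x' = s * R @ x + t.

def apply_sim3(points, R, t, s):
    """Apply T = (R, t, s) to an (N, 3) array of points, row-wise."""
    return s * points @ R.T + t

# Example: 90-degree rotation about z, scale by 2, translate along x.
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t = np.array([1., 0., 0.])
pts = np.array([[1., 0., 0.]])
apply_sim3(pts, R, t, 2.0)  # → [[1., 2., 0.]]
```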
Model input: a demonstration trajectory dataset τ^n (n: number of demonstrations)
Each τ consists of sequences of observation-action pairs (Ot, At).
Ot = (Xt, St) consists of the scene point cloud Xt and the robot proprioception state St.
Robot proprioception state St
We represent robot proprioception information as St = (S^(x)t, S^(d)t, S^(s)t), with 3D positions in S^(x)t, normalized directions in S^(d)t, and scalars in S^(s)t.
Robot proprioception can in most cases be converted into such a format of positions, directions, and scalars: e.g. end-effector positions go into S^(x)t; end-effector velocities can be converted to position targets and also go into S^(x)t; end-effector orientations can be converted into rotation matrices, whose unit column vectors go into S^(d)t; gripper open-close states go into S^(s)t.
Similarly, our action representation At = (A^(v)t, A^(d)t, A^(s)t) consists of 3D offsets or velocities A^(v)t, normalized directions A^(d)t, and scalars A^(s)t.
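The point of this grouping is that each component transforms differently under T = (R, t, s). A hedged sketch (my own illustration; the function names and the treatment of scalars as invariant are assumptions, not the paper's code): positions transform fully, offsets/velocities pick up rotation and scale but no translation, normalized directions only rotate, and scalars stay unchanged:

```python
import numpy as np

def transform_obs_state(S_x, S_d, S_s, R, t, s):
    """Transform a proprioception state St = (S_x, S_d, S_s) by T = (R, t, s)."""
    return (s * S_x @ R.T + t,  # positions: x' = s R x + t
            S_d @ R.T,          # unit directions: d' = R d
            S_s)                # scalars (e.g. gripper state): assumed invariant

def transform_action(A_v, A_d, A_s, R, t, s):
    """Transform an action At = (A_v, A_d, A_s); offsets get no translation."""
    return (s * A_v @ R.T,      # offsets/velocities: v' = s R v
            A_d @ R.T,          # unit directions: d' = R d
            A_s)                # scalars: assumed invariant

# Example: 90-degree rotation about z, scale 2, translation along z.
R = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
t, s = np.array([0., 0., 1.]), 2.0
S_x = np.array([[1., 0., 0.]])  # end-effector position
S_d = np.array([[1., 0., 0.]])  # end-effector axis (unit vector)
x_new, d_new, _ = transform_obs_state(S_x, S_d, np.array([0.]), R, t, s)
# x_new = [[0., 2., 1.]]: picks up scale and translation.
# d_new = [[0., 1., 0.]]: rotated only, still unit-length.
```

Keeping positions, directions, and scalars separate like this is what lets an equivariant backbone guarantee that transforming the input observation transforms the predicted action consistently.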