R+X: Retrieval and Execution from Everyday Human Videos

Georgios Papagiannis, Norman Di Palo, Pietro Vitiello and Edward Johns

Published at ICRA 2025

Mini abstract. R+X enables robots to learn skills from long, unlabelled first-person videos of humans performing everyday tasks. Given a language command from a human, R+X first retrieves short video clips containing relevant behaviour, and then conditions an in-context imitation learning technique, Keypoint Action Tokens (KAT), on this behaviour to execute the skill.
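
The two-stage flow described above (retrieve relevant clips, then condition an in-context learner on them) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Clip`, `word_overlap`, `retrieve`, `retrieve_and_execute`, `echo_policy`) is hypothetical, and two simplifications are assumed so that the example runs on its own: each clip carries a text `description` as a toy proxy for its visual content (the real videos are unlabelled), and the in-context imitation learner is replaced by a placeholder that only reports which clips it was conditioned on.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Clip:
    """A short video segment. `description` is a toy proxy for visual content."""
    clip_id: str
    description: str


# Hypothetical type aliases, introduced only for this sketch.
Relevance = Callable[[str, Clip], float]
Policy = Callable[[str, List[Clip]], List[str]]


def word_overlap(command: str, clip: Clip) -> float:
    """Toy relevance: fraction of command words found in the clip description."""
    words = set(command.lower().split())
    if not words:
        return 0.0
    return len(words & set(clip.description.lower().split())) / len(words)


def retrieve(command: str, clips: Sequence[Clip],
             relevance: Relevance, k: int = 2) -> List[Clip]:
    """Return up to k clips with non-zero relevance, most relevant first."""
    scored = [(relevance(command, c), c) for c in clips]
    relevant = [(s, c) for s, c in scored if s > 0.0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in relevant[:k]]


def retrieve_and_execute(command: str, clips: Sequence[Clip],
                         relevance: Relevance, policy: Policy,
                         k: int = 2) -> List[str]:
    """Stage 1: retrieve clips for the command. Stage 2: condition the policy on them."""
    examples = retrieve(command, clips, relevance, k)
    return policy(command, examples)


def echo_policy(command: str, examples: List[Clip]) -> List[str]:
    """Placeholder for an in-context imitation learner (assumption, not KAT)."""
    return [f"{command} <- {c.clip_id}" for c in examples]


clips = [
    Clip("clip_a", "pick up the cup"),
    Clip("clip_b", "open the drawer"),
    Clip("clip_c", "wipe the table"),
]
print(retrieve_and_execute("open drawer", clips, word_overlap, echo_policy, k=1))
# prints ['open drawer <- clip_b']
```

Passing the relevance function and the policy in as arguments keeps the two stages separable, so either placeholder could be swapped for a real retrieval model or a real in-context learner without changing the surrounding flow.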