\[ \def\N{\mathbb{N}} \def\Z{\mathbb{Z}} \def\I{\mathbb{I}} \def\Q{\mathbb{Q}} \def\R{\mathbb{R}} \def\V{\mathbb{V}} %\def\C{\mathbb{C}} \def\A{\mathcal{A}} \def\D{\mathcal{D}} \def\Cset{\mathcal{C}} \def\E{\mathcal{E}} \def\F{\mathcal{F}} \def\cI{\mathcal{I}} \def\L{\mathcal{L}} \def\M{\mathcal{M}} \def\N{\mathcal{N}} \def\O{\mathcal{O}} \def\P{\mathcal{P}} \def\S{\mathcal{S}} \def\T{\mathcal{T}} \def\p{\mathcal{p}} \def\P{\mathcal{P}} \def\rneg{\tilde{\neg}} \def\rle{\prec} \def\rge{\succ} \def\rand{\curlywedge} \def\ror{\curlyvee} \newcommand{\var}[1]{V\left({#1}\right)} \newcommand{\gvar}[1]{K\left({#1}\right)} \newcommand{\app}[2]{{#1}(#2)} \newcommand{\gnd}[1]{\underline{#1}} \newcommand{\appf}[2]{\gnd{#1(#2)}} \renewcommand{\vec}[1]{#1} \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\fd}[1]{\dot{#1}} \newcommand{\sd}[1]{\ddot{#1}} \newcommand{\td}[1]{\dddot{#1}} \newcommand{\fourthd}[1]{\ddddot{#1}} \newcommand{\diff}[2]{\frac{\partial#1}{\partial#2}} \newcommand{\tdiff}[2]{d {#1}_{#2}} \newcommand{\prob}[1]{p\left(#1\right)} \newcommand{\probc}[2]{\prob{#1 \;\middle\vert\; #2}} \newcommand{\probdist}[2]{p_{#1}\left(#2\right)} \newcommand{\probcdist}[3]{\probdist{#1}{#2 \;\middle\vert\; #3}} \newcommand{\KL}[2]{D_{KL}\left(#1 \;\delimsize\|\; #2 \right)} \newcommand{\vecM}[1]{\begin{bmatrix} #1 \end{bmatrix}} \newcommand{\set}[1]{\left\{#1\right\}} \newcommand{\fset}[2]{\set{#1 \;\middle\vert\; #2}} \newcommand{\noindex}{\hspace*{-0.8em}} \newcommand{\xmark}{\ding{55}} \newcommand{\ce}[2]{{#1}_{#2}} \newcommand{\lb}[1]{\ce{\iota}{#1}} \newcommand{\ub}[1]{\ce{\upsilon}{#1}} \newcommand{\rot}[1]{\rlap{\rotatebox{45}{#1}~}} \newcommand{\tf}[3]{^{#1}\mat{#2}_{#3}} \newcommand{\Csset}[1]{\Cset \left(#1 \right)} \DeclareMathOperator{\acos}{\text{acos}} \DeclareMathOperator{\asin}{\text{asin}} \DeclareMathOperator{\sgn}{\text{sgn}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} 
\newcommand{\code}[1]{\texttt{#1}} \]

Abstract

Teaser image

Learning long-horizon manipulation tasks efficiently is a central challenge in robot learning from demonstration. Unlike recent endeavors that learn the task directly in the action domain, we focus on inferring what the robot should achieve in the task, rather than how to do so. To this end, we represent evolving scene states using a series of graphical object relationships. We propose a demonstration segmentation and pooling approach that extracts a series of manipulation graphs and estimates distributions over object states across task phases. In contrast to prior graph-based methods that capture only partial interactions or short temporal windows, our approach captures complete object interactions spanning from the onset of control to the end of the manipulation. To improve robustness when learning from multiple demonstrations, we additionally perform object matching using pre-trained visual features. In extensive experiments, we evaluate our method's demonstration segmentation accuracy and the utility of learning from multiple demonstrations for finding a desired minimal task model. Finally, we deploy the fitted models both in simulation and on a real robot, demonstrating that the resulting task representations support reliable execution across environments.

Technical Approach

Our approach, presented in the figure above, consists of five steps:

Demonstration Segmentation: Demonstrations of object trajectories are processed with a probabilistic model that identifies temporary connectedness between object pairs. The model first measures the mutual information of the trajectories within small time windows and estimates a distribution of pairwise object distances while the objects move together. Using this distribution, it assesses the probability that the connection persists before and after the observed motion, akin to extracting kinematic graphs that persist during a motion. As relationships form and break, a sparse series of graphs emerges over time.
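As a minimal illustration of the segmentation idea, the sketch below flags time windows in which two objects plausibly move as one rigid unit. It substitutes a velocity-correlation test plus a distance-stationarity check for the full mutual-information model in the paper; the function name, thresholds, and window size are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def moving_together(traj_a, traj_b, window=10, corr_thresh=0.9, dist_std_thresh=0.01):
    """Flag windows in which two objects plausibly move as one rigid unit.

    traj_a, traj_b: (T, 3) position trajectories.
    Returns a boolean array of length T - window.
    """
    flags = []
    for t in range(len(traj_a) - window):
        va = np.diff(traj_a[t:t + window], axis=0).ravel()
        vb = np.diff(traj_b[t:t + window], axis=0).ravel()
        d = np.linalg.norm(traj_a[t:t + window] - traj_b[t:t + window], axis=1)
        # High velocity correlation together with a near-constant pairwise
        # distance stands in for the mutual-information test of the paper.
        denom = np.linalg.norm(va) * np.linalg.norm(vb)
        corr = float(va @ vb / denom) if denom > 1e-9 else 0.0
        flags.append(corr > corr_thresh and d.std() < dist_std_thresh)
    return np.array(flags)
```

Runs of consecutive `True` windows then correspond to intervals in which the pair is treated as connected in the extracted graph series.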

Event Generation: Instead of working with the full graphs, we generate events from the graph sequence. An event is emitted whenever the reachability between two objects changes; in particular, we emit an event when an object is connected to or disconnected from the subgraph of a manipulator. Each event captures the objects that changed subgraph membership, the objects contained in their new subgraph, and their world-space poses at the moment the event is emitted.
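The event-generation step can be sketched as a diff over the manipulator's connected component between consecutive graphs. The representation below (edge lists, a `"gripper"` manipulator node, tuple-valued events) is a simplifying assumption for illustration; poses are omitted for brevity.

```python
def component(edges, node):
    """Connected component of `node` in an undirected graph given as edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {node}, [node]
    while stack:
        for n in adj.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen

def emit_events(graph_seq, manipulator="gripper"):
    """Emit (time, kind, object, subgraph) whenever an object joins or
    leaves the manipulator's subgraph between consecutive graphs."""
    prev = component(graph_seq[0], manipulator)
    events = []
    for t, edges in enumerate(graph_seq[1:], start=1):
        cur = component(edges, manipulator)
        for obj in cur - prev:
            events.append((t, "connect", obj, frozenset(cur)))
        for obj in prev - cur:
            events.append((t, "disconnect", obj, frozenset(cur)))
        prev = cur
    return events
```

For a sequence where a cup briefly joins the gripper's subgraph, this yields one `connect` and one `disconnect` event, mirroring the sparse event stream described above.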

Object Matching: To associate events across demonstrations, we first match the objects observed in different demonstrations using pre-trained VLM features. We encode this as a \(k\)-association problem, scoring pairwise associations by the log-similarity of object features. The matching reduces the number of objects considered for the task to the smallest number observed across the demonstrations.
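A toy version of the feature-based matching can be written as an exhaustive search over one-to-one assignments scored by log cosine similarity. This brute force stands in for the \(k\)-association solver and is only feasible for a handful of objects; the similarity mapping into \((0, 1]\) is an illustrative assumption.

```python
import itertools
import math

def match_objects(feats_a, feats_b):
    """Best one-to-one matching between two demos' object feature vectors,
    maximising summed log cosine similarity (brute force, few objects only).

    feats_a, feats_b: dicts mapping object name -> feature vector,
    with len(feats_a) <= len(feats_b).
    """
    def logsim(f, g):
        cos = sum(x * y for x, y in zip(f, g)) / (
            math.sqrt(sum(x * x for x in f)) * math.sqrt(sum(y * y for y in g)))
        # Map cosine from [-1, 1] into (0, 1] so the log is defined.
        return math.log(max((cos + 1) / 2, 1e-9))

    names_a, names_b = list(feats_a), list(feats_b)
    best, best_score = None, -math.inf
    for perm in itertools.permutations(names_b, len(names_a)):
        score = sum(logsim(feats_a[a], feats_b[b]) for a, b in zip(names_a, perm))
        if score > best_score:
            best, best_score = dict(zip(names_a, perm)), score
    return best
```

For larger object sets, a polynomial-time assignment solver (e.g. the Hungarian algorithm) would replace the permutation loop.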

Event Matching: Given the object matching, we extract the variance of the task by matching equivalent events across demonstrations. Each event can have multiple structurally compatible matches. For each candidate set of associations, we measure the entropy of the relative poses between the objects switching subgraphs and the objects already in that subgraph, and select the match with the lowest entropy.
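The lowest-entropy selection can be illustrated by fitting a Gaussian to the relative poses each candidate association induces and comparing differential entropies. Representing poses as position vectors and the diagonal jitter that keeps the covariance non-singular for few demonstrations are simplifying assumptions of this sketch.

```python
import numpy as np

def pose_entropy(rel_poses):
    """Differential entropy of a Gaussian fitted to relative poses (N, d)."""
    x = np.asarray(rel_poses, float)
    d = x.shape[1]
    # Small jitter keeps the covariance invertible with few samples.
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def best_association(candidates):
    """candidates: {association_name: (N, d) relative poses across demos}.
    Returns the association whose pose distribution has the lowest entropy."""
    return min(candidates, key=lambda k: pose_entropy(candidates[k]))
```

An association that places the same physical relation in correspondence across demonstrations yields tightly clustered relative poses and hence low entropy, which is exactly why it is preferred.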

Task Skeleton Extraction: We extract a simple, linear task skeleton from the matched events. Every transition of an object into the subgraph of a manipulator is interpreted as an activation, and its removal from all manipulator subgraphs as a deactivation. For both step types, we capture distributions of relative object poses, as in our prior work OPLICT.
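Assembling the skeleton then reduces to mapping matched events onto activation/deactivation steps and attaching a pose distribution to each. The sketch below summarises each distribution by its mean only; the event tuple format and step dictionaries are illustrative assumptions, not the OPLICT representation.

```python
def extract_skeleton(events):
    """Build a linear task skeleton from matched events.

    events: list of (kind, object, rel_poses), where rel_poses collects the
    relative pose of the object at this event across all demonstrations.
    'connect' events to a manipulator subgraph become 'activate' steps,
    'disconnect' events become 'deactivate' steps.
    """
    skeleton = []
    for kind, obj, rel_poses in events:
        step = "activate" if kind == "connect" else "deactivate"
        # Simplified pose distribution: per-dimension mean across demos.
        mean = [sum(c) / len(c) for c in zip(*rel_poses)]
        skeleton.append({"step": step, "object": obj, "mean_rel_pose": mean})
    return skeleton
```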

Evaluation

We evaluate our approach's performance in demonstration segmentation and task skeleton extraction on two datasets. We deploy the extracted models using magic actions in RoboCasa and on a real robot using a simple QP-based control scheme.


Move two cups to a tray.
Place the pot on the cooker and place the lid on the pot.
Weigh a profile and place it with the others.

Video - Teaser & Real-World Experiments

Code

The code for this work will be released upon acceptance.

Publications

Adrian Röfer, Nick Heppert, and Abhinav Valada
SparTa: Sparse Graphical Task Models from a Handful of Demonstrations
Under review, 2026.

(PDF) (BibTeX)

Authors

Adrian Röfer

University of Freiburg

Nick Heppert

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the Carl Zeiss Foundation with the ReScaLe project and the BrainLinks-BrainTools center of the University of Freiburg. Nick Heppert is supported by the Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD programme Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the Federal Ministry of Education and Research.


The authors would like to thank Karla Štěpánová for taking the time to discuss the method and giving feedback on the manuscript. We would also like to thank Maurice Funk for his consultation on \(k\)-AP matching.