JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation

1 Kognia Sports Intelligence
2 Visual Intelligence for Transportation (EPFL)
3 Institut de Robòtica i Informàtica Industrial (CSIC-UPC)

ICLR 2026

JointDiff simultaneously generates multi-agent trajectories and synchronous discrete events, enabling high-level semantic control through natural language and player possession sequences.

Abstract

Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where they interact synchronously. To bridge this gap, we introduce JointDiff, a novel diffusion framework designed to unify these two processes by simultaneously generating continuous spatio-temporal data and synchronous discrete events. We demonstrate its efficacy in the sports domain by jointly modeling multi-agent trajectories and key possession events. This joint modeling is validated with non-controllable generation and two novel controllable generation scenarios: weak-possessor-guidance, which offers flexible semantic control over game dynamics through a simple list of intended ball possessors, and text-guidance, which enables fine-grained, language-driven generation. To enable conditioning on these guidance signals, we introduce CrossGuid, an effective conditioning operation for multi-agent domains.
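
The sketch below illustrates, under assumed names and shapes, the kind of joint denoising step the abstract describes: a single network that predicts continuous trajectory noise and discrete event logits from a shared representation, with an optional guidance embedding (e.g. a possessor list or text encoding) injected through cross-attention as a stand-in for CrossGuid. This is not the authors' implementation; every module, dimension, and argument name is an illustrative assumption.

# Minimal sketch (assumed architecture, not the paper's code) of a joint
# continuous/discrete denoiser with optional guidance cross-attention.
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    def __init__(self, n_agents=22, coord_dim=2, n_event_types=8, d_model=128):
        super().__init__()
        self.traj_in = nn.Linear(n_agents * coord_dim, d_model)
        self.event_in = nn.Embedding(n_event_types + 1, d_model)  # +1 for a mask/absorbing token
        self.time_in = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention over the guidance tokens (illustrative stand-in for CrossGuid).
        self.guid_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.traj_out = nn.Linear(d_model, n_agents * coord_dim)  # continuous noise prediction
        self.event_out = nn.Linear(d_model, n_event_types)        # discrete event logits

    def forward(self, noisy_traj, noisy_events, t, guidance=None):
        # noisy_traj: (B, T, n_agents * coord_dim); noisy_events: (B, T) integer tokens;
        # t: (B, 1) diffusion timestep; guidance: (B, L, d_model) or None.
        h = self.traj_in(noisy_traj) + self.event_in(noisy_events) + self.time_in(t).unsqueeze(1)
        h = self.backbone(h)
        if guidance is not None:
            # Inject the semantic condition by attending from the shared sequence
            # representation to the guidance embedding.
            attn_out, _ = self.guid_attn(h, guidance, guidance)
            h = h + attn_out
        return self.traj_out(h), self.event_out(h)

# Usage: one denoising call on a batch of 4 sequences of 50 frames.
model = JointDenoiser()
traj = torch.randn(4, 50, 44)
events = torch.randint(0, 9, (4, 50))
t = torch.rand(4, 1)
eps_pred, event_logits = model(traj, events, t)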

BibTeX

@inproceedings{capellera2026jointdiff,
  title={JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation},
  author={Capellera, Guillem and Ferraz, Luis and Rubio, Antonio and Alahi, Alexandre and Agudo, Antonio},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}