Current approaches in computer vision and machine learning primarily rely on identifying statistical correlations within massive datasets. This reliance limits their efficacy in areas that necessitate generalization through higher-order cognition, such as domain generalization and planning. A foundational approach to overcoming these limitations involves incorporating principles of causality into the processing of large datasets. Like classic AI methodologies, causal inference usually assumes that the causal variables of interest are provided externally. However, real-world data often comes as high-dimensional, low-level observations (e.g., RGB pixels in a video) and is generally not organized into meaningful causal units.

  • Causal Representation Learning proposes a promising approach by integrating principles of causality, enabling models to discern cause-and-effect relationships and thereby generate controllable representations.
  • From another perspective, Object-centric Representation Learning focuses on decomposing sensory inputs, such as images and videos, into set-based representations where distinct vectors represent different objects.
  • Thus, Robotics and Embodied AI, which require compositional and controllable scene representations, can benefit greatly from object-centric and causal representations.

This workshop aims to bring together researchers from structured (object-centric and causal) representation learning and robotics-oriented computer vision. To help integrate ideas from these areas, we invite researchers from Embodied AI, Causality and Representation Learning. We hope that this creates opportunities for discussion, presenting cutting-edge research, establishing new collaborations and identifying future research directions.

Topics of Interest

Object-centric and causal representation learning methods aim to overcome the challenges posed by conventional models that rely solely on correlations, offering a new pathway for advancing computer vision, particularly in dynamic and complex environments like robotics. Importantly, with current vision and embodied AI systems, it is not obvious how to model interventions, counterfactuals, and hypotheticals without resorting to extensive manual engineering. While much can be done with significant supervision, ideally, robots and embodied agents should learn autonomously from simulated environments. With those concepts in mind, we welcome contributions in the direction of:

  • Causal representation learning: how to learn representations with deep networks that conform to cause-and-effect transformations in the pixel space.
  • Object-centric learning: how to learn representations that are object-specific without requiring closed-world manual annotations.
  • Scaling structured representations: how to learn object-centric and causal representations on real-world image and video data, such as MS COCO images or videos from YouTube.
  • Downstream applications of structured representations: how to use causal and object-centric representations for tasks such as reinforcement learning, planning, and decision-making.
  • Learning of interventions: how a robotic agent can transform and control different components of its environment to achieve specific goals.
  • Causal Reinforcement Learning for Embodied AI: how to best learn RL policies to achieve goals if cause-and-effect relations are known.
  • Benchmarks that quantify the benefits of causal and object-centric representations (e.g., systematic generalization, OOD performance, robustness to interventions, etc.).
  • Relations and possible synergies with foundation models.
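
To make the notion of an intervention concrete, here is a minimal, purely illustrative sketch of a structural causal model with a do-operator. The variables (`switch`, `light`, `noise`) and the function name are hypothetical and not part of any method discussed above; the point is only that an intervention replaces a variable's causal mechanism with a constant, and its effects propagate downstream while exogenous noise is unaffected.

```python
import random

# Toy structural causal model (illustrative only): a "light" whose state
# is caused by a "switch", plus independent pixel-level noise.
# An intervention do(switch=s) replaces the switch's mechanism with the
# constant s; downstream variables then follow the modified model.
def sample_scene(do_switch=None, rng=None):
    rng = rng or random.Random()
    # Observational mechanism for the switch, unless intervened on:
    switch = do_switch if do_switch is not None else rng.choice([0, 1])
    light = switch            # causal mechanism: light follows switch
    noise = rng.random()      # exogenous noise, unaffected by the intervention
    return {"switch": switch, "light": light, "noise": noise}

# Observational sample vs. interventional sample:
obs = sample_scene()
interv = sample_scene(do_switch=1)
```

Under an intervention, the effect variable is guaranteed to reflect the forced cause (`interv["light"] == 1`), which is exactly the kind of behavior that purely correlational models cannot express without such structural assumptions.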

Important Dates and Links

Submission site opens April 01 '24 12:00 AM UTC
Submission deadline April 28 '24 12:00 PM UTC
Decisions announced April 30th
Camera-ready due April 30th


Thomas Kipf

Google DeepMind

Animesh Garg

University of Toronto

Sara Magliacane

University of Amsterdam

João Henriques

University of Oxford

Rares Ambrus

Toyota Research Institute


Efstratios Gavves

University of Amsterdam

Andrii Zadaianchuk

University of Amsterdam

Ruta Desai

FAIR, Meta

Phillip Lippe

University of Amsterdam

For questions / comments, reach out to:

Website template adapted from the OSC/ORLR workshops, originally based on the template of the BAICS workshop.