Research Project

GREAT: Beyond Graph Neural Networks: Joint graph topology learning and graph-based inference for computer vision


National Project

Project Description

Project PID2019-110977GA-I00 funded by MCIN/AEI/10.13039/501100011033

Graphs are a ubiquitous data structure, employed extensively within computer science and related fields, including pattern recognition, social networks, transportation networks, and biological systems, to name but a few. Recently, Graph Neural Networks (GNNs) have emerged as a deep learning framework able to operate on graph domains to perform inference tasks in an end-to-end fashion. When the graph is explicit, GNNs have proven to be an extremely powerful modeling paradigm. However, in a variety of data domains, including point clouds, text corpora and untrimmed video, the graph structure underlying the data is unknown and must be assumed or inferred.
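When no graph is given, a common baseline is simply to assume one, for instance by connecting each point of a point cloud to its k nearest neighbors in Euclidean space. The sketch below is a hypothetical illustration of such a hand-crafted topology, not part of the project: the function name `knn_graph` and the rule of keeping an edge whenever either endpoint selects the other are our own assumptions. It shows the kind of fixed, assumed graph that a learned topology would replace. It uses only the standard library (`math.dist` requires Python 3.8 or later).

```python
import math
from typing import List, Sequence


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Hypothetical helper: symmetric 0/1 k-nearest-neighbor adjacency matrix.

    Assumption: Euclidean distance, and an edge is kept if either endpoint
    picks the other among its k nearest neighbors (symmetrization by union).
    """
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other points by distance to point i (ties broken by index).
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in ranked[:k]:
            adj[i][j] = 1
            adj[j][i] = 1  # make the relation symmetric
    return adj


if __name__ == "__main__":
    # Two well-separated pairs of 2-D points: with k=1 the assumed graph
    # links 0-1 and 2-3 and nothing else.
    cloud = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    for row in knn_graph(cloud, k=1):
        print(row)
```

The choice of k and of the distance is made by hand here; nothing in this construction adapts to the downstream task, which is the limitation the project description points to.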

Generally speaking, inferring graph topologies from observations is an ill-posed problem, and there are many ways of associating a topology with the observed data samples. Modern approaches for graph topology inference adopt a Graph Signal Processing (GSP) perspective, which explicitly models certain properties of the graph signals (e.g., smoothness, sparsity). While this emphasizes the relation between graph topology and the associated graph signals, the approach is tied to strong a priori assumptions on the signals.
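To make the smoothness prior concrete: a standard GSP measure of how smoothly a signal x varies over a graph with symmetric weights W is the Laplacian quadratic form x^T L x = ½ Σ_ij W_ij (x_i − x_j)², where L = D − W and D is the diagonal degree matrix. Smoothness-based topology inference methods favor graphs that make this quantity small for the observed signals, typically together with regularizers that rule out trivial solutions such as the empty graph. The sketch below is a generic illustration under these assumptions, not the project's method; the function names are hypothetical. It computes the quantity both ways and compares two candidate graphs on a toy signal.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def laplacian(w: Sequence[Sequence[float]]) -> Matrix:
    """Combinatorial Laplacian L = D - W (assumes symmetric W with zero diagonal)."""
    n = len(w)
    lap = [[-float(w[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] += float(sum(w[i]))  # node degree on the diagonal
    return lap


def quadratic_form(lap: Matrix, x: Sequence[float]) -> float:
    """Laplacian quadratic form x^T L x."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def pairwise_smoothness(w: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """0.5 * sum_ij W_ij (x_i - x_j)^2, which equals x^T L x for symmetric W."""
    n = len(x)
    return 0.5 * sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))


if __name__ == "__main__":
    x = [0.0, 0.1, 1.0, 1.1]  # toy signal: two groups of similar values
    # Candidate graph A links nodes with similar values, B links dissimilar ones.
    graph_a = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    graph_b = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
    for name, w in (("A", graph_a), ("B", graph_b)):
        print(name, quadratic_form(laplacian(w), x), pairwise_smoothness(w, x))
```

Graph A yields a much smaller value (about 0.02) than graph B (about 2.0), so a smoothness-driven learner would prefer A. This is also where the a priori assumption enters: if the relations in the data are not smooth in this sense, the measure favors the wrong graph.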

The goal of the current project is the theoretical and computational investigation of models, methods and algorithms for a novel framework that learns the graph topology jointly with an inference task on the graph in a data-driven, end-to-end formulation, thereby combining the strengths of GNNs and GSP.

The proposed methodological developments will be tested on real-world data in the challenging computer vision tasks of temporal event/action segmentation and event/action localization. This choice is motivated by recent neuroscientific findings showing that neural event representations in humans arise from temporal community structures akin to graphs. However, the proposed solutions are not limited to this context and may contribute to many other application areas that share similar challenges, e.g., anomaly detection, change detection, or image and motion segmentation.

The work plan of the project is structured around the following specific objectives:
1. Exploring different techniques for graph learning regularization, with special emphasis on nonlocal methods that reveal complex long-range inter-dependencies in the data.
2. Modeling the joint learning of graph topology and graph embedding in an unsupervised fashion.
3. Modeling end-to-end graph topology learning and clustering by incorporating application-driven priors into the learning problem.
4. Modeling end-to-end graph topology learning and weakly supervised node classification, leveraging dynamic GNN formulations.

The research pursued by this project will constitute a significant theoretical advance in the understanding of graph neural networks in unstructured contexts, a field that has so far remained largely unexplored. It will provide a novel set of advanced methodological and operational tools for hierarchical representation, segmentation, clustering and classification, opening the door to a new generation of applications with high social impact in various fields.

Project Publications

Journal Publications

  • A. Dhamanaskar, M. Dimiccoli, E. Corona, A. Pumarola and F. Moreno-Noguer. Enhancing egocentric 3D pose estimation with third person views. Pattern Recognition, 138: 109358, 2023.

  • M. Dimiccoli and H. Wendt. Learning event representations for temporal segmentation of image sequences by dynamic graph embedding. IEEE Transactions on Image Processing, 30: 1476-1486, 2020.

Conference Publications

  • E.B. Bueno Benito, B. Tura and M. Dimiccoli. Leveraging triplet loss for unsupervised action segmentation, 2023 CVPR Workshop on Learning with Limited Labelled Data, 2023, Vancouver, Canada, pp. 4922-4930, IEEE.

  • M. Dimiccoli, L. Garrido, G. Rodríguez and H. Wendt. Graph constrained data representation learning for human motion segmentation, 2021 International Conference on Computer Vision, 2021, Montreal, Canada, pp. 1440-1449.

  • M. Dimiccoli, H. Wendt and P. Batlle. Learning grounded word meaning representations on similarity graphs, 16th Conference on Empirical Methods in Natural Language Processing, 2021, Punta Cana, Dominican Republic, pp. 4760-4769.
