Research Project
GREAT: Beyond Graph Neural Networks: Joint graph topology learning and graph-based inference for computer vision
Type
National Project
Start Date
01/06/2020
End Date
31/05/2024
Project Code
PID2019-110977GA-I00
Staff
-
Gutiérrez, Marc
PhD Student
-
Bueno Benito, Elena Belén
PhD Student
-
Blázquez, Xabier
Master Student
-
Raza, Syed Riaz
Master Student
-
Hernández, Sergi
Support
-
Maté, Alberto
Support
-
Fatan, Mehdi
Member
Project Description
Project PID2019-110977GA-I00 funded by MCIN/AEI/10.13039/501100011033
Graphs are a ubiquitous data structure, employed extensively within computer science and related fields, including pattern recognition, social networks, transportation networks, and biological systems, to name but a few. Recently, Graph Neural Networks (GNNs) have emerged as a deep learning framework able to operate on graph domains to perform inference tasks in an end-to-end fashion. When the graph is explicit, GNNs have proven to be an extremely powerful modeling paradigm. However, in a variety of data domains, including point clouds, text corpora and untrimmed video, the graph structure underlying the data is unknown and must be assumed or inferred.
Generally speaking, inferring graph topologies from observations is an ill-posed problem, and there are many ways of associating a topology with the observed data samples. Modern approaches for graph topology inference adopt a Graph Signal Processing (GSP) perspective, which explicitly models certain properties of the graph signals (e.g., smoothness, sparsity). While this emphasizes the relation between graph topology and the associated graph signals, the approach is tied to strong a priori assumptions on the signals.
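As a minimal illustration of the smoothness prior mentioned above, the sketch below evaluates the Laplacian quadratic form x^T L x, which equals the sum over edges of w_ij (x_i - x_j)^2, for one signal on two candidate topologies. It is not the project's method: the function name `laplacian_smoothness` and the two toy graphs are hypothetical, chosen only to show that the same observations can look smooth on one topology and rough on another.

```python
def laplacian_smoothness(weights, signal):
    """Return x^T L x, i.e. the sum over edges i<j of w_ij * (x_i - x_j)^2.

    `weights` is a symmetric adjacency matrix given as a list of lists;
    `signal` holds one scalar value per node. Lower values mean the signal
    varies less across connected nodes (it is "smoother" on that graph).
    """
    n = len(signal)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += weights[i][j] * (signal[i] - signal[j]) ** 2
    return total


# Toy signal observed on four nodes (hypothetical data).
signal = [0.0, 1.0, 2.0, 3.0]

# Candidate topology A: path graph 0-1-2-3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Candidate topology B: edges 0-2, 0-3 and 1-3.
other_graph = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

smooth_on_path = laplacian_smoothness(path_graph, signal)    # 3.0
smooth_on_other = laplacian_smoothness(other_graph, signal)  # 17.0
```

Under a smoothness assumption, topology A would be preferred for this signal because it yields the smaller quadratic form. This preference depends entirely on the prior, which is the limitation the paragraph above points out.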
The goal of the current project is the theoretical and computational investigation of models, methods and algorithms for defining a novel framework that enables learning the graph topology jointly with an inference task on the graph in a data-driven, end-to-end formulation, hence combining the strengths of GNNs and GSP.
The proposed methodological developments will be put to the test on real-world data in the challenging computer vision tasks of temporal event/action segmentation and event/action localization. This is motivated by recent neuroscientific findings showing that neural event representations in humans arise from temporal community structures akin to graphs. The proposed solutions are, however, not limited to this context and may contribute to many other areas of application that share similar challenges, e.g., anomaly detection, change detection, or image and motion segmentation.
The work plan of the project is structured around the following specific objectives:
1. Exploring the use of different techniques for graph learning regularization, with special emphasis on nonlocal methods to reveal complex long-range data inter-dependencies.
2. Modeling the joint learning of graph topology and graph embedding in an unsupervised fashion.
3. Modeling end-to-end graph topology learning and clustering by incorporating application-driven priors into the learning problem.
4. Modeling end-to-end graph topology learning and weakly supervised node classification, leveraging dynamic GNN formulations.
The research pursued by this project will constitute a significant theoretical advance in the understanding of graph neural networks in unstructured contexts, a field that has so far remained unexplored. It will provide a novel set of advanced methodological and operational tools for hierarchical representation, segmentation, clustering and classification that will open the door to a new generation of applications with high social impact in various fields.
Project Publications
Journal Publications
-
A. Dhamanaskar, M. Dimiccoli, E. Corona, A. Pumarola and F. Moreno-Noguer. Enhancing egocentric 3D pose estimation with third person views. Pattern Recognition, 138(109358), 2023.
-
M. Dimiccoli and H. Wendt. Learning event representations for temporal segmentation of image sequences by dynamic graph embedding. IEEE Transactions on Image Processing, 30: 1476-1486, 2020.
Conference Publications
-
M. Fatan, E. Mincato, D. Pintzou and M. Dimiccoli. 3M-Transformer: A multi-stage multi-stream multimodal transformer for embodied turn-taking prediction, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 2024, Seoul, Korea, pp. 8050-8054.
-
E.B. Bueno Benito, B. Tura and M. Dimiccoli. Leveraging triplet loss for unsupervised action segmentation, 2023 CVPR Workshop on Learning with Limited Labelled Data, 2023, Vancouver, Canada, pp. 4922-4930, IEEE.
-
M. Dimiccoli, L. Garrido, G. Rodríguez and H. Wendt. Graph constrained data representation learning for human motion segmentation, 2021 International Conference on Computer Vision, 2021, Montreal, Canada, pp. 1440-1449.
-
M. Dimiccoli, H. Wendt and P. Batlle. Learning grounded word meaning representations on similarity graphs, 16th Conference on Empirical Methods in Natural Language Processing, 2021, Punta Cana, Dominican Republic, pp. 4760-4769.