Symbolic Regression in Data Science



  • If you are interested in the proposal, please contact the supervisors.


This work can be developed as an undergraduate project. It does not have to be done as a TFG (Trabajo Fin de Grado, the final degree project).

Symbolic regression algorithms differ from deep neural networks, the well-known artificial intelligence algorithms that produce opaque systems (i.e., systems whose outputs are difficult to explain). Symbolic regression also identifies relationships in complicated datasets, but it reports its findings in a format human researchers can understand: a short equation. These algorithms resemble supercharged versions of Excel’s curve-fitting function, except that they search not just for lines or parabolas to fit a set of data points, but through billions of formulas of all sorts.
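As a rough illustration of the idea, the following minimal Python sketch searches a tiny, hand-picked set of candidate formulas and keeps the one that best fits the data. Everything in it is an assumption made for illustration: the candidate set, the helper names (fit_scale, mse, toy_symbolic_regression), and the synthetic data. Real symbolic regression methods explore far larger formula spaces, typically with genetic programming or similar search strategies, so this is only a toy version of the principle.

```python
import math

# Hypothetical candidate formulas (name -> function of x), chosen for illustration.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "x**3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}


def fit_scale(f, xs, ys):
    """Return the least-squares scale a for the model y = a * f(x)."""
    num = sum(f(x) * y for x, y in zip(xs, ys))
    den = sum(f(x) ** 2 for x in xs)
    return num / den if den else 0.0


def mse(f, a, xs, ys):
    """Mean squared error of the model y = a * f(x) on the data."""
    return sum((a * f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def toy_symbolic_regression(xs, ys):
    """Try every candidate formula and return (name, scale, error) of the best fit."""
    best = None
    for name, f in CANDIDATES.items():
        a = fit_scale(f, xs, ys)
        err = mse(f, a, xs, ys)
        if best is None or err < best[2]:
            best = (name, a, err)
    return best


# Synthetic data generated from y = 4.9 * x**2 (illustrative only, not a real dataset).
xs = [0.5 * i for i in range(1, 11)]
ys = [4.9 * x ** 2 for x in xs]

name, a, err = toy_symbolic_regression(xs, ys)
print(f"y = {a:.3f} * {name}  (MSE = {err:.2e})")
```

On this synthetic data the search selects x**2 with a scale close to 4.9, and the result is a short equation that a person can read directly.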

The objective of this project is to explore the existing methods that implement symbolic regression and to select the most suitable ones for different real-world datasets. In some cases, we will need to adapt the existing methods to the particularities of the datasets. For instance, in kinematics, efficient execution of the resulting equation is fundamental, an aspect that symbolic regression does not usually consider.
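One possible way to take execution efficiency into account is to estimate how expensive a candidate equation is to evaluate. The sketch below uses Python's standard ast module to count the operations in an expression. The cost weights (OP_COST, CALL_COST) and the function name expression_cost are assumptions for illustration only; actual costs depend on the target hardware and would need to be measured.

```python
import ast

# Assumed relative costs per arithmetic operation (illustrative, not measured).
OP_COST = {ast.Add: 1, ast.Sub: 1, ast.Mult: 1, ast.Div: 4, ast.Pow: 10}
# Assumed flat cost for any function call such as sin(x) or exp(x).
CALL_COST = 20


def expression_cost(expr: str) -> int:
    """Estimate the evaluation cost of an expression given as Python source."""
    tree = ast.parse(expr, mode="eval")
    cost = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp):
            # Unknown binary operators fall back to a cost of 1.
            cost += OP_COST.get(type(node.op), 1)
        elif isinstance(node, ast.Call):
            cost += CALL_COST
    return cost


for candidate in ["a*x + b", "x**2", "sin(x)/x"]:
    print(candidate, expression_cost(candidate))
```

A score of this kind could be combined with the fitting error, so that of two equations with similar accuracy the cheaper one is preferred.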

To achieve the objective of this project, we will follow these steps:

1. Analysis of the existing symbolic regression methods.
2. Analysis of the available datasets, which mainly concern kinematic problems but also include problems in architecture.
3. Selection of the best method for the available datasets.
4. Adaptation of the selected methods to the particularities of our problems.
5. Application of the adapted methods to the different datasets.
6. Documentation of the results.