Online reinforcement learning using a probability density estimation

Journal Article (2017)


Neural Computation








Function approximation in online, incremental reinforcement learning needs to deal with two fundamental problems: biased sampling and non-stationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The non-stationarity comes from the recursive nature of the estimations typical of temporal difference methods. This non-stationarity has a local profile: it varies not only along the learning process but also across different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To handle the non-stationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it depend on the local density of samples, which we use to estimate the non-stationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the update, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
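The per-component forgetting idea can be illustrated with a minimal sketch: an online Gaussian mixture estimator in which each component's step size is scaled by the responsibility the new sample assigns to it, so components that receive little new information are barely forgotten. This is a simplified, hypothetical illustration of the mechanism described above, not the paper's algorithm; the class name `OnlineGMM` and the parameter `base_forget` are assumptions introduced for the example.

```python
import numpy as np


class OnlineGMM:
    """Minimal online 1-D Gaussian mixture estimator (illustrative sketch).

    Each component is updated with a forgetting factor modulated by the
    responsibility the incoming sample assigns to that component, so
    rarely sampled components keep their parameters instead of being
    forgotten purely as a function of time.
    """

    def __init__(self, means, variances, weights, base_forget=0.05):
        self.mu = np.asarray(means, dtype=float)
        self.var = np.asarray(variances, dtype=float)
        self.w = np.asarray(weights, dtype=float)
        self.base_forget = base_forget  # global cap on the forgetting rate

    def _responsibilities(self, x):
        # Posterior probability of each component given the sample x.
        p = (self.w / np.sqrt(2.0 * np.pi * self.var)
             * np.exp(-0.5 * (x - self.mu) ** 2 / self.var))
        return p / p.sum()

    def update(self, x):
        r = self._responsibilities(x)
        # Responsibility-modulated step size: components far from x
        # (low r) are almost untouched, avoiding distortion in less
        # sampled regions.
        eta = self.base_forget * r
        d = x - self.mu
        self.mu += eta * d
        self.var += eta * (d * d - self.var)
        # Stochastic mixing-weight update; weights keep summing to 1.
        self.w += self.base_forget * (r - self.w)
```

Feeding samples drawn from two well-separated clusters moves each component's mean toward its own cluster while the mixing weights remain a valid distribution; a sample near one cluster leaves the other component's parameters essentially unchanged.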


Index terms

dynamic programming, learning (artificial intelligence)

Author keywords

online reinforcement learning, Gaussian mixture models

Scientific reference

A. Agostini and E. Celaya. Online reinforcement learning using a probability density estimation. Neural Computation, 29(1): 220-246, 2017.