This paper studies the use of temporal consistency to match appearance descriptors and handle complex ambiguities when computing dynamic depth maps from stereo. Previous attempts have designed 3D descriptors over the space-time volume and have been mostly used for monocular action recognition, as they cannot deal with perspective changes. Our approach is based on a state-of-the-art 2D dense appearance descriptor which we extend in time by means of optical flow priors, and can be applied to wide-baseline stereo setups. The basic idea behind our approach is to capture the changes around a feature point in time instead of trying to describe the spatiotemporal volume. We demonstrate its effectiveness on very ambiguous synthetic video sequences with ground truth data, as well as real sequences.


stereo, spatiotemporal, appearance descriptors

E. Trulls Fortuny, A. Sanfeliu and F. Moreno-Noguer. Spatiotemporal descriptor for wide-baseline stereo reconstruction of non-rigid and ambiguous scenes, 12th European Conference on Computer Vision, 2012, Florence, Italy, in Computer Vision, Vol 7574 of Lecture Notes in Computer Science, pp. 441-454, 2012, Springer.