Convolutional neural network-based encoding and decoding of visual object recognition in space and time
Seeliger, K., Fritsche, M., Güçlü, U., Schoenmakers, S., Schoffelen, J.-M., Bosch, S. E., & Van Gerven, M. A. J.
Convolutional neural network-based encoding and decoding of visual object recognition in space and time. NeuroImage, 180, 253-266. doi:10.1016/j.neuroimage.2017.07.018.
Representations learned by deep convolutional neural networks (CNNs) for object recognition are a widely
investigated model of the processing hierarchy in the human visual system. Functional magnetic resonance
imaging (fMRI) studies have previously shown that CNN representations of visual stimuli correspond to processing stages in
the ventral and dorsal streams of the visual system. Whether this correspondence between models and brain
signals also holds for activity acquired at high temporal resolution has been explored less exhaustively. Here, we
addressed this question by combining CNN-based encoding models with magnetoencephalography (MEG).
Human participants passively viewed 1,000 images of objects while MEG signals were acquired. We modelled
their high temporal resolution source-reconstructed cortical activity with CNNs, and observed a feed-forward
sweep across the visual hierarchy between 75 and 200 ms after stimulus onset. This spatiotemporal cascade
was captured by the network layer representations, where the increasingly abstract stimulus representation in the
hierarchical network model was reflected in different parts of the visual cortex, following the visual ventral
stream. We further validated the accuracy of our encoding model by decoding stimulus identity in a left-out
validation set of viewed objects, achieving state-of-the-art decoding accuracy.
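The encoding-then-decoding logic summarized above can be illustrated with a minimal sketch. It does not reproduce the authors' pipeline. It assumes that a linear ridge regression maps CNN layer features of each image to source-space activity at a single time point, and that stimulus identity is decoded by matching each observed response to the most correlated predicted response among the left-out images. The data are synthetic, and `fit_ridge`, `identify`, the regularization strength `alpha`, and all array sizes are hypothetical choices made for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    No intercept is fitted, so X and Y are assumed to be zero-mean
    (true for the synthetic data below; real data would need centering).
    """
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)


def identify(Y_obs, Y_pred):
    """Return, for each observed response (row), the index of the best-matching prediction.

    Matching uses the Pearson correlation of response patterns across sources.
    """
    def zscore_rows(A):
        # Standardize each row so a dot product divided by n_sources is Pearson r.
        return (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

    corr = zscore_rows(Y_obs) @ zscore_rows(Y_pred).T / Y_obs.shape[1]
    return corr.argmax(axis=1)


rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_sources = 800, 50, 100, 200

# Hypothetical stand-ins for the CNN layer activations of each image.
X = rng.standard_normal((n_train + n_test, n_feat))
# Simulated source activity at one time point: linear in the features plus noise.
W_true = rng.standard_normal((n_feat, n_sources))
Y = X @ W_true + 5.0 * rng.standard_normal((n_train + n_test, n_sources))

# Split into a training set and a left-out validation set of images.
X_tr, X_te = X[:n_train], X[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]

# Encoding: learn the feature-to-source mapping, then predict responses to unseen images.
W_hat = fit_ridge(X_tr, Y_tr, alpha=10.0)
Y_pred = X_te @ W_hat

# Decoding: identify which left-out image evoked each observed response.
guesses = identify(Y_te, Y_pred)
accuracy = np.mean(guesses == np.arange(n_test))
print(f"identification accuracy: {accuracy:.2f} (chance = {1 / n_test:.2f})")
```

In a time-resolved setting one would, under the same assumptions, fit a separate model per time point (and per CNN layer) and compare prediction quality across cortical sources. That comparison is what allows a layer-to-region correspondence to be traced over time.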