James S. Magnuson, April 24

A time-invariant connectionist model of spoken word recognition


James S. Magnuson
Department of Psychology, University of Connecticut


Spoken words are unusual objects of perception. Unlike visual words, where all components of the object can be observed in parallel, a spoken word is an event made up of a series of sounds. I will review some of the significant computational challenges this presents: how can we model representational and processing mechanisms that could support spoken word recognition in psychologically plausible ways? Consider a word like "bib"; somehow, the system has to keep track of the fact that the sound /b/ has occurred two times, with a specific intervening sound, and the pattern should be mapped onto the word "bib". Keeping track of which sounds are heard, and when, turns out to be extremely challenging, and most models of spoken word recognition have deferred this problem. One of the best-known examples of a model that has addressed this "temporal extent problem" is the TRACE model of speech perception and spoken word recognition (McClelland & Elman, 1986). TRACE tackles the temporal extent problem by turning the temporal problem into a spatial one: it creates a memory buffer where featural, phonemic, and lexical representations are reduplicated at short intervals. This allows TRACE to detect separate instances of /b/ in "bib" with independent /b/ detectors aligned with different instants of time. Because TRACE also posits rich interconnections among units, as the size of the lexicon and/or the memory trace is increased, there is an explosion in the number of processing units and connections the model requires. My colleagues and I have recently proposed a radically different approach to the temporal extent problem (Hannagan, Magnuson, & Grainger, submitted). Our inspiration comes from models of visual word recognition; while the letters making up visual words can be observed in parallel, they are nonetheless ordered. Hannagan & Grainger (in press) have proposed a model combining position-specific letter units that map to spatially invariant digram and word units.
Applying the same approach to spoken words – with temporally specific input units but temporally invariant diphone and word units – results in a model that is able to account for a similar depth and breadth of spoken word recognition phenomena as the TRACE model. I will review how the model handles classic problems in spoken word recognition and how it compares to TRACE and other models, and I will close with a discussion of phenomena the model does not (yet) simulate and of novel predictions the model generates.
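
As a rough illustration of the idea of time-invariant diphone units, the toy sketch below codes a phoneme sequence as a set of ordered phoneme pairs and matches it against a three-word lexicon. It is not the model presented in the talk: the pair coding (every earlier-later pair, adjacent or not, by analogy with open-bigram coding of letters), the overlap score, the function names, the lexicon, and the phoneme symbols are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy lexicon; phoneme symbols are placeholders, not a real transcription.
LEXICON = {
    "bib": ["b", "I", "b"],
    "bit": ["b", "I", "t"],
    "tib": ["t", "I", "b"],
}


def diphones(phonemes):
    """Return the set of ordered (earlier, later) phoneme pairs in a sequence.

    Assumption: pairs need not be adjacent. The set records which sound
    preceded which, but not the absolute time at which either occurred.
    """
    return set(combinations(phonemes, 2))


def best_match(heard):
    """Return the lexicon word whose diphone set overlaps most with the input.

    The score is the Jaccard overlap between the two diphone sets, a
    stand-in for whatever activation dynamics a real model would use.
    """
    heard_pairs = diphones(heard)
    scores = {}
    for word, phonemes in LEXICON.items():
        word_pairs = diphones(phonemes)
        scores[word] = len(heard_pairs & word_pairs) / len(heard_pairs | word_pairs)
    return max(scores, key=scores.get)
```

In this sketch, the repeated /b/ in "bib" is captured by the pair (b, b) together with (b, I) and (I, b), so no separate /b/ detector is needed for each point in time, and "bib" remains distinct from "bit" and "tib" because their pair sets differ.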

Where and when:
03:45-05:00 Apr 24, 2012
MPI Nijmegen, Rm 163

