From learning a language to riding a bike, most of our experiences are multisensory in nature. The brain's ability to integrate information from different sensory modalities into a coherent, unitary experience is remarkable given that each modality simultaneously receives qualitatively different types of input (e.g., photons, molecules, pressure) and this information is handled, at least in the early stages of processing, by dedicated sensory systems. My program of research examines how infants, children, and adults process and integrate multisensory information and how this ability subserves cognitive tasks such as statistical learning, categorization, word learning, and individuation. Some of the questions guiding my research are: (a) How do people allocate attention to multisensory stimuli? (b) Do sensory modalities have dedicated attentional resources, which would allow multisensory stimuli to be processed in parallel, or do they compete for the same pool of resources? (c) Do the attentional weights assigned to sensory modalities change across development and in the course of processing? My research consists of two interrelated parts, one examining the mechanisms underlying cross-modal processing and another attempting to ground many sophisticated behaviors in the dynamics of cross-modal processing. I take a lifespan approach, so my research includes infants and toddlers (8 to 24 months), children (3 to 5 years), and adults, and I have recently proposed a set of studies examining cross-modal processing in elderly populations (see Future Directions). While most of my research is behavioral in nature, I also incorporate psychophysiological measures such as heart rate and event-related potentials (ERPs) into my studies.
Mechanisms Underlying Cross-Modal Processing
There are many reasons to believe the brain is optimized for integrating across sensory modalities. For example, a sound can help disambiguate the ambiguous spatial position of an object (Richardson & Kirkham, 2004), visual cues such as lip movements can disambiguate speech in a noisy environment (Hollich, Newman, & Jusczyk, 2005; Sumby & Pollack, 1954), and presenting information to multiple sensory modalities can often facilitate learning of amodal relations such as rate, tempo, or rhythm (Bahrick, Flom, & Lickliter, 2002). However, there are also many situations when modalities convey different information or even provide conflicting information. For example, in word and category learning tasks, auditorily presented words are often arbitrarily paired with visual objects and visual categories. Thus, prior to learning the arbitrary word-object pairings, words initially provide no information about the appearance of an object. My research examining processing of arbitrary sound-object and word-object pairings demonstrates that presenting information to multiple sensory modalities often results in asymmetric costs: multisensory presentation often attenuates processing in one modality (compared to a unimodal baseline) while having no negative effect on processing in the second modality. In addition, this asymmetry is often biased in favor of the auditory modality (i.e., auditory dominance), with auditory input disrupting visual processing and visual input having no negative effect on auditory processing (Robinson & Sloutsky, 2010a, in press).
To account for this asymmetry, I have argued that sensory modalities share the same pool of attentional resources and that attention is allocated serially to the auditory and visual modalities (see Robinson, Best, Weng, & Sloutsky, 2012; Robinson & Sloutsky, 2010b, for reviews). Furthermore, because auditory stimuli are often dynamic and transient in nature, it seems adaptive to allocate attention first to auditory stimuli before processing the details of stimuli that are presented for prolonged periods of time (e.g., visual stimuli). Thus, I believe this asymmetry is driven by the serial nature of multisensory processing. For example, as can be seen in the top two accompanying figures, when stimuli are presented unimodally, encoding of the details of auditory and visual stimuli occurs early in the course of processing. However, when auditory and visual stimuli are presented simultaneously (see the bottom figure), the auditory stimulus quickly engages attention, and the latency of encoding the auditory stimulus is comparable to the unimodal condition (i.e., no cost on auditory processing). In contrast, encoding the details of the visual stimulus does not begin until the auditory modality releases attention. Furthermore, auditory stimuli that are slow to release attention (e.g., unfamiliar and/or complex stimuli) should attenuate or delay visual processing more than auditory stimuli that are quick to release attention (e.g., familiar and/or simple stimuli). Finally, auditory dominance effects should be more pronounced in infants and young children because they are typically slower at processing information than adults (e.g., Kail & Salthouse, 1994); thus, early in development, it should take longer for the auditory modality to release attention.
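To make the logic of the serial-allocation account concrete, the sketch below is a minimal toy model of my own construction, not a published implementation; the parameter names and millisecond values are illustrative assumptions. It simply shows that, under serial allocation, cross-modal presentation delays visual encoding by however long the auditory stimulus holds attention, while auditory encoding latency matches the unimodal baseline.

```python
# Toy sketch of the serial-allocation account (illustrative assumptions only).

def encoding_latencies(aud_release_ms, aud_encode_ms, vis_encode_ms):
    """Return encoding completion times (ms) for unimodal vs. cross-modal
    presentation, assuming attention is engaged by the auditory stream first."""
    unimodal = {"auditory": aud_encode_ms, "visual": vis_encode_ms}
    # Cross-modal: visual encoding cannot begin until the auditory
    # modality releases attention, so its latency is shifted by that amount.
    crossmodal = {"auditory": aud_encode_ms,
                  "visual": aud_release_ms + vis_encode_ms}
    return unimodal, crossmodal

# A familiar, simple sound releases attention quickly -> small visual cost.
print(encoding_latencies(aud_release_ms=200, aud_encode_ms=400, vis_encode_ms=500))
# An unfamiliar, complex sound releases attention slowly -> large visual delay.
print(encoding_latencies(aud_release_ms=800, aud_encode_ms=400, vis_encode_ms=500))
```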
The current approach makes several novel predictions regarding how attention is allocated to multisensory stimuli. In particular, it predicted that simultaneously presenting auditory and visual information should attenuate (or delay) visual processing (e.g., Robinson & Sloutsky, 2004, 2007a, 2010a, in press; Sloutsky & Robinson, 2008) while having no cost on auditory processing (Robinson & Sloutsky, 2010a, in press). For example, in Robinson and Sloutsky (2010a), infants were habituated to an auditory-visual compound stimulus. After habituating to the compound stimulus, infants increased looking at test when the auditory stimulus changed but not when the visual stimulus changed. This finding is noteworthy given that infants ably discriminated the same visual stimuli when they were presented unimodally, and there was no evidence that visual stimuli attenuated discrimination of auditory stimuli (compared to a unimodal baseline). Therefore, it was concluded that the auditory stimulus overshadowed the visual stimulus (see Robinson & Sloutsky, 2004, 2007b; Sloutsky & Robinson, 2008, for similar findings using a variety of tasks).
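For readers unfamiliar with the paradigm, the snippet below sketches one common infant-controlled habituation criterion (looking over a sliding window of trials dropping below 50% of initial looking); the exact criterion and window used in the cited studies may differ, so this is purely an illustration.

```python
# Illustrative habituation criterion (a common convention, not necessarily
# the exact criterion used in the cited studies).

def habituated(looking_times, window=3, criterion=0.5):
    """True once summed looking over the last `window` trials falls below
    `criterion` times summed looking over the first `window` trials."""
    if len(looking_times) < 2 * window:
        return False
    first = sum(looking_times[:window])
    last = sum(looking_times[-window:])
    return last < criterion * first

trials = [12.0, 10.5, 9.8, 6.1, 4.9, 4.2]  # seconds of looking per habituation trial
print(habituated(trials))                   # True -> proceed to test trials
```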
Additional evidence for this processing asymmetry comes from studies examining heart rate and ERP responses to unimodal and cross-modal information (Robinson, Ahmar, & Sloutsky, 2010, under review; Robinson & Sloutsky, 2010c, in prep). For example, Robinson, Ahmar, and Sloutsky (2010, under review) presented adults with unimodal and cross-modal oddball tasks, and ERPs were recorded as adults either passively observed or actively responded to infrequent auditory and visual stimuli. In these studies we were primarily interested in how quickly the brain responded to infrequent stimuli (i.e., oddballs) when these stimuli were presented in isolation (i.e., unimodal conditions) and when the same stimuli were presented multimodally. Examination of the P300, a signature pattern of oddball detection (Sutton, Braren, Zubin, & John, 1965), revealed that simultaneously presenting auditory and visual input increased the latency of detecting visual oddballs while having no negative effect on the latency of detecting auditory oddballs. This finding is remarkable given that approximately 40 years of research examining adults’ explicit responses demonstrates that competition between modalities is typically won by the visual modality (Colavita, 1974; Colavita & Weisberg, 1979; Klein, 1977; Posner, Nissen, & Klein, 1976; see also Sinnett, Spence, & Soto-Faraco, 2007; Spence, Shore, & Klein, 2001, for reviews). To follow up on the ERP findings, I manipulated the response component, requiring adults either to make the same response to auditory and visual oddballs or to make separate responses to auditory and visual information (Chandra, Robinson, & Sinnett, 2011). In the former condition it is impossible to develop a modality-specific response bias because auditory and visual stimuli are associated with the same response, whereas in the latter condition adults can bias their responding in favor of one modality. The single-response task replicated the ERP findings, showing evidence of auditory dominance, whereas the two-button task replicated the visual dominance literature. While future research is needed, I believe these findings suggest that intersensory competition occurs at various points in the course of processing, with auditory input dominating processing and visual input dominating the response.
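The latency comparison at the heart of these ERP studies can be illustrated with a simple peak-picking sketch; the waveforms, sampling rate, and search window below are hypothetical stand-ins rather than the published analysis pipeline.

```python
# Hypothetical P300 peak-latency comparison (illustration, not the published pipeline).
import numpy as np

def p300_peak_latency(erp, times, window=(0.25, 0.6)):
    """Latency (s) of the maximum amplitude within a typical P300 search window."""
    mask = (times >= window[0]) & (times <= window[1])
    return float(times[mask][np.argmax(erp[mask])])

# Simulated trial-averaged waveforms for visual oddballs (1 kHz sampling).
times = np.arange(-0.1, 0.8, 0.001)
unimodal_visual = np.exp(-((times - 0.40) ** 2) / (2 * 0.05 ** 2))    # peak ~400 ms
crossmodal_visual = np.exp(-((times - 0.47) ** 2) / (2 * 0.05 ** 2))  # peak delayed

print(p300_peak_latency(unimodal_visual, times))    # ~0.40 s
print(p300_peak_latency(crossmodal_visual, times))  # ~0.47 s
```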
Effects of Auditory Input on Higher-Order Tasks
The current approach also makes interesting predictions concerning the effect of words and sounds on higher-order tasks such as statistical learning, word learning, categorization, and individuation. For example, it is well documented that words and sounds often have different effects on these tasks, with infants being more likely to learn categories when visual stimuli are paired with words than when the same images are paired with sounds (Fulkerson & Waxman, 2007; see also Xu, 2002). While it is often argued that these effects stem from young infants understanding the conceptual importance of words (i.e., words, but not sounds, are symbols that denote categories), my approach predicts that the effect stems from unfamiliar sounds attenuating visual processing more than familiar sounds and human speech do. To distinguish between these two accounts, I compared infants’ abilities to categorize (Robinson & Sloutsky, 2007b) and individuate objects (Robinson & Sloutsky, 2008) when images were paired with words or sounds against a unimodal visual baseline (i.e., visual images were not paired with any auditory stimulus). Consistent with previous research, words and sounds had different effects in these tasks; however, there was no evidence that performance in the word condition exceeded the silent baseline. Rather, both words and sounds attenuated categorization and individuation (Robinson & Sloutsky, 2007b, 2008), especially early in the course of processing (Robinson & Sloutsky, 2008, Experiment 2).
To gain a better understanding of the mechanisms underlying categorization and the effects of linguistic labels on categorization, my colleagues and I designed a series of eye-tracking experiments to monitor moment-to-moment changes in visual fixations while infants, children, and adults learn novel categories (Best, Robinson, & Sloutsky, 2011a, 2011b, under review). These studies are important because they focus on both the outcome and the process of learning. Consistent with previous research in adults (Blair, Watson, & Meier, 2009; Rehder & Hoffman, 2005), adults optimized their attention after learning the novel categories: they selectively focused on features that defined the category and decreased their attention to category-irrelevant features (Best, Robinson, & Sloutsky, under review). In contrast, 4-year-olds showed a different pattern. While children also learned the visual categories, there was no evidence that they selectively attended to the relevant information. These findings are consistent with the view that there are multiple mechanisms underlying categorization: an early-developing “compression-based” system that abstracts redundancy from the input and a developmentally protracted “selection-based” system that requires selective attention and the development of the prefrontal cortex (Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Sloutsky, 2010). Eye-tracking data and explicit responses also provide support for cross-modal interference. In particular, infants and children familiarized to visual categories in silence were more likely to learn the categories, and they also accumulated more looking to category-relevant features than infants who heard speech (Best, Robinson, & Sloutsky, under review).
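A simple way to quantify the attention-optimization pattern described above is the proportion of looking time directed to category-relevant features in each learning block; the function and fixation data below are a hypothetical illustration of that measure rather than the actual analysis code.

```python
# Hypothetical attention-optimization measure: proportion of looking time
# to category-relevant features (illustration only).

def relevant_looking_proportion(fixations, relevant_features):
    """`fixations` is a list of (feature_name, duration_ms) pairs for one block."""
    total = sum(duration for _, duration in fixations)
    relevant = sum(duration for feature, duration in fixations
                   if feature in relevant_features)
    return relevant / total if total else 0.0

early_block = [("relevant_part", 600), ("irrelevant_part", 900), ("irrelevant_part", 500)]
late_block = [("relevant_part", 1400), ("irrelevant_part", 300)]

# Adults typically show an increase across blocks (optimization);
# 4-year-olds in these studies did not.
print(relevant_looking_proportion(early_block, {"relevant_part"}))  # 0.30
print(relevant_looking_proportion(late_block, {"relevant_part"}))   # ~0.82
```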
To examine how sensory modalities interact while people implicitly learn the statistical structure of the input, Robinson and Sloutsky (in press) presented adults with a cross-modal statistical learning task. In this study, adults were trained on sequences of auditory and/or visual stimuli, which were presented either unimodally or cross-modally. After training, they were presented with short unimodal sequences and had to determine whether each sequence was familiar (i.e., presented during training) or novel. Adults ably learned the auditory and visual statistics when the sequences were presented unimodally or when the auditory and visual sequences were correlated during training; however, increasing task demands by breaking the correlation between the auditory and visual sequences resulted in an important asymmetry. In particular, the increased task demands attenuated learning of the visual sequences but had no effect on learning of the auditory sequences. This statistical learning study and the recent ERP findings (Robinson, Ahmar, & Sloutsky, 2010, under review) both examined encoding and learning of multimodal information, and both found novel evidence that processing of auditory input is more robust than processing of visual input, with auditory input delaying or attenuating encoding of visual input.
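To illustrate the design manipulation at issue (correlated vs. decorrelated auditory and visual streams), the sketch below builds toy training streams from statistically defined pairs; the pair inventories and stream lengths are invented for illustration and do not reproduce the published stimuli.

```python
# Toy construction of correlated vs. decorrelated audio-visual training streams
# (invented pair inventories; not the published stimuli).
import random

aud_pairs = [("a1", "a2"), ("a3", "a4"), ("a5", "a6")]  # auditory statistics
vis_pairs = [("v1", "v2"), ("v3", "v4"), ("v5", "v6")]  # visual statistics

def unimodal_stream(pairs, n_reps=10):
    """Concatenate pairs in a random order, preserving within-pair transitions."""
    order = [pair for _ in range(n_reps) for pair in pairs]
    random.shuffle(order)
    return [item for pair in order for item in pair]

def correlated_stream(n_reps=10):
    """Each auditory pair always co-occurs with the same visual pair."""
    order = list(range(len(aud_pairs))) * n_reps
    random.shuffle(order)
    return [(a, v) for i in order for a, v in zip(aud_pairs[i], vis_pairs[i])]

def decorrelated_stream(n_reps=10):
    """Auditory and visual statistics stay intact but are ordered independently,
    so audio-visual co-occurrence is no longer predictive."""
    return list(zip(unimodal_stream(aud_pairs, n_reps),
                    unimodal_stream(vis_pairs, n_reps)))

print(correlated_stream(1)[:4])
print(decorrelated_stream(1)[:4])
```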
Future Directions
One of my goals is to develop a better understanding of the dynamics of cross-modal processing across the entire lifespan, with an emphasis on the later years of development. Currently, very little is known about the processing of multisensory information in elderly populations. Furthermore, the mechanisms underlying cross-modal facilitation and interference effects are debated, and the developmental trajectory is not clear due to differences in tasks and measures across development. In collaboration with Scott Sinnett at the University of Hawaii, I recently outlined a set of studies that will examine the factors accounting for auditory and visual dominance, as well as cross-modal facilitation effects (where presenting information to multiple sensory modalities facilitates performance compared to unimodal conditions). The project will: (a) examine potential mechanisms underlying cross-modal facilitation and interference effects, (b) shed light on how people detect, discriminate, and respond to cross-modal information, and (c) highlight how these abilities change from early infancy to late adulthood. I am also very interested in reaching out to elderly populations directly; I believe many of these individuals would be interested in participating in research and in learning about research on cognitive abilities in older populations, especially research showing plasticity into late adulthood.
Second, while my previous research begins to shed light on the time course of cross-modal processing, it does not provide insight into which dimensions of the auditory and visual stimuli participants rely on when performing these tasks. In future studies I plan to systematically manipulate characteristics of auditory and visual information (e.g., absolute/relative pitch, timbre, hue, brightness) to determine how attention is distributed within a sensory modality. These manipulations will not only quantify attentional weights for the various dimensions within a modality but will also be important for understanding differences between intra-modality competition (e.g., pitch and timbre provide conflicting information) and inter-modality competition (e.g., pitch and hue provide conflicting information). Furthermore, there is considerable evidence that even young infants treat words differently from other, non-linguistic sounds. Manipulating the properties of speech and non-speech input could provide important insights into which dimensions are important for making these judgments.
Finally, I would like to further examine the effects of linguistic input on category learning and the mechanisms underlying categorization more generally. It is often assumed that words facilitate categorization by directing attention to category-relevant features or dimensions (Fulkerson & Waxman, 2007). However, in several as-yet-unpublished eye-tracking studies, I found no evidence that infants or children who heard a common word associated with different exemplars accumulated more looking to category-relevant features (Best, Robinson, & Sloutsky, under review; Robinson & Sloutsky, under review-a). Therefore, it is possible that some of the early effects of words on category learning (when found) stem from words attenuating visual processing. Attenuated visual processing could explain how children embedded in a perceptually rich environment quickly disregard the fine details of a referent in favor of a more abstract, generic representation (e.g., overall shape, whole object). From this perspective, labeling objects may affect category learning not only through a honing-in process (e.g., adults increase looking to category-relevant features) but also through a pruning process in which many of the fine details of a stimulus are not encoded, leaving a generic representation of the referent. While honing-in and pruning processes make similar predictions for category learning tasks, only the latter account predicts that words facilitate category-like responding by attenuating recognition of individual items.

In addition to examining possible mechanisms underlying the effects of labels on categorization, I would also like to further examine mechanisms and models of adult categorization. In particular, there is a growing debate over whether a single system or multiple systems underlie categorization. My preliminary research with children and adults is consistent with the multiple-systems approach, with children and adults showing different attentional patterns while learning novel categories (Best, Robinson, & Sloutsky, under review). One interesting component of this research is that it is possible to make adults perform like children by experimentally suppressing hypothesis-testing/selective-attention processes. For example, adults who performed a word repetition task while viewing the visual images ably learned the visual categories but did not increase looking to category-relevant features (similar to younger participants). This recent finding suggests that some categories can be learned without adjusting attentional weights to category-relevant features (i.e., without selective attention). I would like to continue examining this issue because it has direct implications for models of categorization, as most models include selective attention as an important component of category learning.
References
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442-481.
Bahrick, L. E., Flom, R., & Lickliter, R. (2002). Intersensory redundancy facilitates discrimination of tempo in 3-month-old infants. Developmental Psychobiology, 41, 352–363.
Best, C. A., Robinson, C. W., & Sloutsky, V. M. (2011a). The effect of labels on categorization: Is attention to relevant features a good index of infants’ category learning? In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 2751-2755). Austin, TX: Cognitive Science Society.
Best, C. A., Robinson, C. W., & Sloutsky, V. M. (2011b). The effect of labels on children’s category learning. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 3332-3336). Austin, TX: Cognitive Science Society.
Best, C. A., Robinson, C. W., & Sloutsky, V. M. (under review). Developmental changes in the role of selective attention in category learning: Evidence from eye tracking.
Blair, M. R., Watson, M. R., & Meier, K. M. (2009). Errors, efficiency, and the interplay between attention and category learning. Cognition, 112(2), 330-336.
Chandra, M., Robinson, C. W., & Sinnett, S. (2011). Coexistence of multiple modal dominances. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 2604-2609). Austin, TX: Cognitive Science Society.
Colavita, F. B. (1974). Human sensory dominance. Perception & Psychophysics, 16, 409-412.
Colavita, F. B., & Weisberg, D. (1979). A further investigation of visual dominance. Perception & Psychophysics, 25, 345–347.
Fulkerson, A. L., & Waxman, S. R. (2007). Words (but not tones) facilitate object categorization: Evidence from 6- and 12-month-olds. Cognition, 105, 218-228.
Hollich, G., Newman, R., & Jusczyk, P. (2005). Infants’ use of synchronized visual information to separate streams of speech. Child Development, 76, 598-613.
Klein, R. M. (1977). Attention and visual dominance: A chronometric analysis. Journal of Experimental Psychology: Human Perception & Performance, 3, 365-378.
Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157-171.
Rehder, B., & Hoffman, A. B. (2005). Eye tracking and selective attention in category learning. Cognitive Psychology, 51, 1-41.
Richardson, D. C., & Kirkham, N. Z. (2004). Spatial indexing in adults and six month olds: evidence from eye tracking. Journal of Experimental Psychology: General, 133, 46-62.
Robinson, C. W., Ahmar, N., & Sloutsky, V. M. (2010). Evidence for auditory dominance in a passive oddball task. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2644-2649). Austin, TX: Cognitive Science Society.
Robinson, C. W., Ahmar, N., & Sloutsky, V. M. (under review). Sounds interfere with visual processing: Neurophysiological evidence for auditory dominance.
Robinson, C. W., Best, C. A., Weng, D., & Sloutsky, V. M. (2012). The role of words in cognitive tasks: What, when, and how? Frontiers in Psychology, 3, 1-8.
Robinson, C. W., & Sloutsky, V. M. (2004). Auditory dominance and its change in the course of development. Child Development, 75, 1387-1401.
Robinson, C. W., & Sloutsky, V. M. (2007a). Visual processing speed: Effects of auditory input on visual processing. Developmental Science, 10, 734-740.
Robinson, C. W., & Sloutsky, V. M. (2007b). Linguistic labels and categorization in infancy: Do labels facilitate or hinder? Infancy, 11, 233-253.
Robinson, C. W., & Sloutsky, V. M. (2008). Effects of auditory input in individuation tasks. Developmental Science, 11, 869-881.
Robinson, C. W., & Sloutsky, V. M. (2010a). Effects of multimodal presentation and stimulus familiarity on auditory and visual processing. Journal of Experimental Child Psychology, 107, 351-358.
Robinson, C. W., & Sloutsky, V. M. (2010b). Development of cross-modal processing. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 135-141.
Robinson, C. W., & Sloutsky, V. M. (2010c). Attention and cross-modal processing: Evidence from heart rate analyses. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2639-2643). Austin, TX: Cognitive Science Society.
Robinson, C. W., & Sloutsky, V. M. (in press). When audition dominates vision: Evidence from cross-modal statistical learning. Experimental Psychology.
Robinson, C. W., & Sloutsky, V. M. (in prep). Behavioral and psychophysiological responses to cross-modal information.
Sinnett, S., Spence, C., & Soto-Faraco, S. (2007). Visual dominance and attention: Revisiting the Colavita effect. Perception & Psychophysics, 69, 673–686.
Sloutsky, V. M. (2010). From perceptual categories to concepts: What develops? Cognitive Science, 34, 1244-1286.
Sloutsky, V. M., & Robinson, C. W. (2008). The role of words and sounds in visual processing: From overshadowing to attentional tuning. Cognitive Science, 32, 354-377.
Spence, C., Shore, D. I., & Klein, R. M. (2001). Multisensory prior entry. Journal of Experimental Psychology: General, 130, 799-832.
Sumby, W.H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.
Sutton, S., Braren, M., Zubin, J., & John, E. R. (1965). Evoked potential correlates of stimulus uncertainty. Science, 150, 1187-1188.
Waxman, S. R. (2003). Links between object categorization and naming: origins and emergence in human infants. In D.H. Rakison & L.M. Oakes (Eds.), Early category and concept development: Making sense of the blooming, buzzing confusion (pp. 213–241). London: Oxford University Press.
Xu, F. (2002). The role of language in acquiring object kind concepts in infancy. Cognition, 85, 223–250.