Clippers 1/28: Cory Shain (Stanford) on language in the functional connectome of brains and models

Title: Language in the functional connectome of brains and models

Speaker: Cory Shain, Stanford Linguistics

Abstract: AI has turned into a complex systems science, much like neuroscience has always been. And increasingly, precision functional connectivity techniques in neuroscience are revealing that despite the daunting complexity of the human brain, there are natural “cuts” in the system, not just in terms of physiology, but in terms of cognitive function. In this talk, I will present recent work in the lab showing that one of those cuts is language. I will show evidence from an ongoing large-scale neuroimaging study (1200 participants) that an unsupervised technique for parcellating each participant’s brain into networks reliably discovers a frontotemporal network of interconnected regions that is highly selective for language in that individual. This network is both closely adjacent to multiple functionally distinct networks within individuals and “loosely tethered” (Vázquez-Rodríguez et al., 2019) to anatomy. I will further show that, within the network, three putatively distinct linguistic processes (lexical semantics, syntax, and combinatorial semantics) distribute broadly, rather than localizing to different hubs. Together with a growing body of other research, these results suggest that language is “nearly decomposable” (Simon, 1962) as an integrated network in the brain. I will sketch how the lab is now pursuing the implications of this insight for neuroscience, its possible translations to neurosurgery and neural engineering, and its potential relevance to AI theory and practice.

Clippers 1/21: Vishal Sunder on Advancing End-to-End Speech AI with Knowledge Transfer

Title: Advancing End-to-End Speech AI with Knowledge Transfer

Abstract:

My thesis explores end-to-end (E2E) approaches to improve speech AI by addressing limitations of cascaded systems, such as ASR error propagation and large, misaligned models. The thesis focuses on three key tasks: speech understanding, speech assessment, and joint speech recognition and synthesis, leveraging knowledge transfer (KT) from auxiliary sources like large language models (LLMs), dialog history, and related tasks.

For speech understanding, E2E models integrate semantic knowledge from LLMs for tasks like intent extraction and slot filling using tokenwise contrastive pretraining (TCP). This approach is extended to the RNN transducer (RNN-T) model to enhance ASR and spoken language understanding (SLU). Differentiable cascading of ASR and SLU incorporates intermediate non-autoregressive objectives, improving intent classification and slot filling across datasets. Additionally, dialog history is incorporated through hierarchical and conformer-based conversation models, enhancing dialog act classification.
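As context for TCP, one common way to realize a tokenwise contrastive objective is an InfoNCE-style loss that pulls each speech token's representation toward the corresponding LLM token embedding and pushes it away from the other tokens in the utterance. The sketch below illustrates that general pattern only; the function name, shapes, temperature, and the assumption of a precomputed token-level alignment are illustrative and not taken from the thesis.

```python
# Hedged sketch of a tokenwise contrastive objective for aligning a speech
# encoder's token-level outputs with LLM token embeddings (e.g., from BERT).
# Illustrates the general idea, not the thesis's exact loss.
import torch
import torch.nn.functional as F

def tokenwise_contrastive_loss(speech_tok: torch.Tensor,
                               text_tok: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """speech_tok, text_tok: (num_tokens, dim) representations for the same
    utterance, already aligned token-by-token (e.g., via forced alignment)."""
    s = F.normalize(speech_tok, dim=-1)
    t = F.normalize(text_tok, dim=-1)
    logits = s @ t.T / temperature        # (num_tokens, num_tokens) similarities
    targets = torch.arange(s.size(0))     # each speech token matches its own text token
    # Symmetric InfoNCE: the other tokens in the utterance serve as negatives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```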

In speech assessment, two sub-problems are addressed: E2E disfluency detection/classification and real-time reading tracking for children. A hierarchical detection-classification (HiDeC) method mitigates class imbalance, while pointer-network models, trained on ASR alignment maps, track reading positions effectively.

For joint speech recognition and synthesis, a non-autoregressive multimodal framework processes speech and text inputs, independently or combined, and trains on unpaired datasets. Iterative refinement enhances performance, achieving competitive results in STT and TTS tasks.

These contributions advance robust E2E systems that are compact and resilient to ASR errors, bypassing cascaded approaches for efficient and effective speech AI.

Clippers 1/14: Christian Clark on Linear Recency Bias and Transformers’ Fit to Reading Times

Title:
Linear Recency Bias During Training Improves Transformers’ Fit to Reading Times

Abstract:
Recent psycholinguistic research has compared human reading times to surprisal estimates from language models to study the factors shaping human sentence processing difficulty. Previous studies have shown a strong fit between surprisal values from Transformers and reading times. However, standard Transformers work with a lossless representation of the entire previous linguistic context, unlike models of human language processing that include memory decay. To bridge this gap, this paper evaluates a modification of the Transformer model that uses ALiBi (Press et al., 2022), a recency bias added to attention scores. Surprisal estimates from a Transformer that includes ALiBi during training and inference show an improved fit to human reading times compared to a standard Transformer baseline. A subsequent analysis of attention heads suggests that ALiBi’s mixture of slopes—which determine the rate of memory decay in each attention head—may play a role in the improvement by helping models with ALiBi to track different kinds of linguistic dependencies.
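For readers unfamiliar with ALiBi, the sketch below shows the core mechanism from Press et al. (2022) as implemented in common practice: a fixed, head-specific linear penalty added to pre-softmax attention scores that grows with the distance between query and key, so more distant tokens are downweighted at a rate set by each head's slope. Function names, shapes, and the usage comments are illustrative rather than drawn from the paper's code.

```python
# Minimal sketch of ALiBi-style attention biasing (Press et al., 2022).
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """Geometric sequence of per-head slopes, e.g. 1/2, 1/4, ..., 1/256 for 8 heads."""
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Bias added to pre-softmax attention scores: -slope * distance from query to key."""
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j for keys at or before the query (causal attention)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0).float()
    slopes = alibi_slopes(n_heads)                 # (n_heads,)
    return -slopes[:, None, None] * distance       # (n_heads, seq_len, seq_len)

# Usage inside a causal self-attention layer (scores: n_heads x T x T):
# scores = q @ k.transpose(-2, -1) / d_head ** 0.5
# scores = scores + alibi_bias(n_heads, T)   # linear recency penalty, no learned positions
# attn = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1)
```

Because the slopes form a geometric sequence, different heads decay the context at different rates, which is the "mixture of slopes" the abstract refers to.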