Clippers 1/31: David Palzer on N-Pathic Speaker Diarization

Title: N-Pathic Speaker Diarization
Abstract: Speaker diarization has mainly been studied through clustering of speaker embeddings. However, the clustering approach has two major limitations: it is not optimized to minimize diarization errors directly, and it cannot handle overlapping speakers. To address these problems, End-to-End Neural Diarization (EEND) was introduced, and the Encoder-Decoder-Attractor (EDA) was proposed to handle recordings with an unknown number of speakers. In this paper, we present two improvements: (1) N-Pathic, a base model that chunks the input to reduce the attention mechanism's sequence length and memory usage, and (2) an improved EDA architecture that increases data efficiency through non-sequence-dependent modules. Our proposed method was evaluated on simulated mixtures, real telephone calls, and real dialogue recordings.
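The abstract does not give implementation details, but the chunking idea it mentions is easy to illustrate. Below is a minimal, hypothetical PyTorch sketch of chunk-wise self-attention over frame-level features: the recording is split into fixed-length chunks so that attention cost scales with the chunk length rather than the full recording length. All names (ChunkedDiarizationEncoder, chunk_len, and so on) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class ChunkedDiarizationEncoder(nn.Module):
    """Illustrative only: self-attention is applied within fixed-size
    chunks, so attention memory grows with chunk_len, not recording length."""

    def __init__(self, feat_dim=80, d_model=256, n_heads=4, chunk_len=200):
        super().__init__()
        self.chunk_len = chunk_len
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, feats):
        # feats: (batch, T, feat_dim); pad T up to a multiple of chunk_len
        b, t, _ = feats.shape
        pad = (-t) % self.chunk_len
        if pad:
            feats = nn.functional.pad(feats, (0, 0, 0, pad))
        x = self.proj(feats)
        # fold chunks into the batch dimension: (b * n_chunks, chunk_len, d)
        x = x.reshape(-1, self.chunk_len, x.size(-1))
        x = self.encoder(x)  # attention cost is O(chunk_len^2) per chunk
        # unfold and strip the padding frames
        return x.reshape(b, -1, x.size(-1))[:, :t]


feats = torch.randn(2, 1000, 80)  # two recordings, 1000 frames each
emb = ChunkedDiarizationEncoder()(feats)
print(emb.shape)  # torch.Size([2, 1000, 256])
```

A real EEND-style model would add per-frame speaker activity heads (and, with EDA, attractor computation) on top of these embeddings; the sketch only shows where the memory saving comes from.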

Clippers 1/24: Sandro Maskharashvili on Discourse Relations in NLG

Discourse Relations: Their Role and Use in Natural Language Generation

Abstract
Speakers make extensive use of discourse connectives (e.g., but, and, so, although) when communicating messages with rich information: discourse connectives express abstract relations, called discourse relations, between the pieces of information they connect. This facilitates understanding of the message the speaker wants to communicate. Traditional computational linguistics (CL) approaches to natural language processing rely heavily on modeling discourse relations, in both natural language generation (NLG) and parsing tasks. The recent emergence of neural network-based approaches to natural language modeling has led to remarkable advances in many CL tasks, including NLG. Nevertheless, when it comes to discourse-level phenomena, particularly the coherent use of discourse connectives, improvements are less obvious. First, I will present results of my doctoral research on the design of symbolic, grammatical approaches to discourse, which are in line with traditional CL approaches to discourse but overcome important obstacles faced by previous approaches. Then, I will review studies we have been systematically carrying out to establish whether neural network-based approaches can be extended or revised to overcome the issues they face. Based on our results, I will argue that reinstating the central, ubiquitous status of discourse relations, by explicitly encoding them in natural language meaning representations, significantly enhances correct and coherent generation of discourse connectives with neural network-based approaches. Finally, I will discuss ample possibilities for exploiting synergies between traditional grammatical approaches and state-of-the-art neural network-based ones to address critical issues such as data scarcity for low-resource languages and the interpretability of neural network-based models of language.
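As a concrete, purely hypothetical illustration of what "explicitly encoding discourse relations in meaning representations" could look like in practice, the sketch below linearizes a two-clause meaning representation with the discourse relation as an explicit input token for a seq2seq generator. The format, relation label, and function names are my own assumptions, not the speaker's actual representation.

```python
# Hypothetical linearization: marking the discourse relation explicitly
# in the input, rather than leaving it implicit, gives the generator a
# direct signal for choosing a coherent connective.

def linearize(mr: dict) -> str:
    """Flatten a two-clause meaning representation into a seq2seq input
    string, with the discourse relation as an explicit token."""
    return (f"<rel={mr['relation']}> "
            f"<arg1> {mr['arg1']} <arg2> {mr['arg2']}")


mr = {
    "relation": "CONCESSION",  # abstract discourse relation label
    "arg1": "the restaurant is expensive",
    "arg2": "the food is mediocre",
}
print(linearize(mr))
# <rel=CONCESSION> <arg1> the restaurant is expensive <arg2> the food is mediocre
```

A generator conditioned on this input should prefer a connective such as "although" over "so" or "and"; without the explicit relation token, the model must infer the intended relation from the arguments alone.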