Clippers 2/22: Symon Stevens-Guille and Sandro Maskharashvili on Regenerating Discourse Connectives in the PDTB

Recent work in natural language generation has seen an increase in the use of end-to-end neural network models. We report on ongoing work exploring how well these models can generate discourse that is coherent while still preserving the content of the input. We illustrate this work with results on generating discourses with the widely used model BART, which we fine-tune on texts reconstructed from the Penn Discourse Treebank (PDTB). These texts are structured by explicit and implicit discourse connectives, e.g. ‘but’, ‘while’, ‘however’. We show that encoding the discourse relation to be expressed by the connective (e.g. ‘Contingency.Cause.Result’) in the input improves how well the model expresses the intended discourse relation, including whether the connective is explicit or implicit. We also discuss evaluation metrics inspired by psycholinguistic results.
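
To make the input-encoding idea concrete, here is a minimal sketch of one way to prepend a PDTB sense label to BART's input for fine-tuning, using the Hugging Face transformers library. The checkpoint, the ‘<sep>’ separator string, and the example argument texts are illustrative assumptions, not the authors' actual setup.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical training pair: the PDTB sense label plus the two relation
# arguments on the source side, the connected discourse on the target side.
# ‘<sep>’ is an assumed plain-text separator, not a special BART token.
source = ("Contingency.Cause.Result <sep> "
          "the company cut costs <sep> profits rose sharply")
target = "The company cut costs, so profits rose sharply."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# One fine-tuning step with the standard sequence-to-sequence loss;
# in practice this would run inside a full training loop with an optimizer.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
```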