Clippers Tuesday: Alex Erdmann on processing Arabic dialects

Tailoring “language agnostic” blackboxes to Arabic Dialects

Many state-of-the-art NLP technologies aspire to be language agnostic but perform disproportionately poorly on Arabic and its dialects. Identifying and understanding the linguistic phenomena that cause these performance drops, and developing language-specific solutions, can shed light on how such technologies might be adapted to broaden their typological coverage. This talk will discuss several recent projects involving Arabic dialects that I worked on, including pan-dialectal dictionary induction, morphological modeling, and spelling normalization. For each of these projects, I will discuss the linguistic traits of Arabic that challenge language-agnostic approaches and the language-specific adaptations we employed to resolve such challenges. Finally, I will speculate on the generalizability of our solutions to other languages.