Title: A typology of ambiguity in medical concept normalization datasets
Medical concept normalization (MCN; also called biomedical word sense disambiguation) is the task of assigning concept unique identifiers (CUIs) to mentions of biomedical concepts. Several MCN datasets focusing on Electronic Health Record (EHR) data have been developed over the past decade. Methodological research has identified several challenges arising from conceptual ambiguity, but the types of lexical ambiguity exhibited by clinical MCN datasets have not been systematically studied. I will present preliminary results of an ongoing analysis of benchmark clinical MCN datasets and describe an initial, domain-specific typology of lexical ambiguity in MCN annotations. I will also discuss desiderata for future MCN research aimed at addressing these challenges in both methods and evaluation.