Explicitly Incorporating Tense/Aspect to Facilitate Creation of New Virtual Patients
The Virtual Patient project has collected a fair amount of data from student interactions with a patient presenting with back pain, but the goal is to include a more diverse array of patients. Given adequate training examples, treating question identification as a single-label classification problem has been fairly successful. However, the current approach is not expected to work well for identifying the novel questions that matter for patients with different circumstances, because these new questions have little training support. Exploring the label sets reveals some generalities across patients, including the importance of the temporal properties of symptoms. Including temporal information in the canonical question representations may allow us to leverage external data to mitigate data sparsity for questions unique to new patients. I will solicit feedback on an approach to creating a frame-like question representation that incorporates this temporal information, as revealed by the tense and linguistic aspect of clauses in the queries.
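
As a rough illustration of what a frame-like representation with explicit tense and aspect might look like, here is a minimal Python sketch. The slot names (topic, attribute), the enum values, and the example questions are all assumptions made for illustration; they are not the project's actual label set or schema.

```python
from dataclasses import dataclass
from enum import Enum


class Tense(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


class Aspect(Enum):
    SIMPLE = "simple"
    PROGRESSIVE = "progressive"
    PERFECT = "perfect"
    PERFECT_PROGRESSIVE = "perfect progressive"


@dataclass(frozen=True)
class QuestionFrame:
    """Hypothetical canonical question frame; all slot names are assumed."""
    topic: str       # what the question is about, e.g. a symptom
    attribute: str   # the property being asked about
    tense: Tense     # tense of the main clause of the query
    aspect: Aspect   # linguistic aspect of the main clause

    def content(self):
        # Patient-specific part of the frame.
        return (self.topic, self.attribute)

    def temporal(self):
        # Part of the frame that may generalize across patients.
        return (self.tense, self.aspect)


# "Does your back hurt?"
current_pain = QuestionFrame(
    "back pain", "presence", Tense.PRESENT, Aspect.SIMPLE)
# "Has your back been hurting?"
ongoing_pain = QuestionFrame(
    "back pain", "presence", Tense.PRESENT, Aspect.PERFECT_PROGRESSIVE)
# "Have you been having headaches?" (a question for a hypothetical new patient)
ongoing_headache = QuestionFrame(
    "headache", "presence", Tense.PRESENT, Aspect.PERFECT_PROGRESSIVE)
```

In this sketch the two back pain questions share their content slots and differ only in their temporal slots, while the headache question shares its temporal slots with a back pain question. Separating the two parts in this way is one possible route to reusing temporal information across patients.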