Virtual patients are an effective, cost-efficient tool for training medical professionals to interview patients in a standardized environment. Technological constraints have thus far restricted these tools to typewritten interactions; however, as speech recognition systems have improved, full-scale deployment of a spoken dialogue system for this purpose has come within reach. To build the strongest possible system, we propose to leverage prior work on improving virtual patients in the typewritten domain. Specifically, our approach is to map spoken utterances to noisy text using off-the-shelf speech recognition, so that the resulting transcripts can be used to train existing question classification architectures. We expect that phoneme-based CNNs may mitigate speech recognition errors in the same way that character-based CNNs mitigate spelling errors and similar noise in the typewritten domain. In this talk, I will present the architecture of the system being developed to collect speech data, the experimental design, and some baseline results.