(1) CAREER: Towards Interactive and Transparent Question Answering with Applications in the Clinical Domain:
(Funded by US National Science Foundation, ~$500K, 06/2020-05/2025 (estimated); Sole PI: Huan Sun)
Abstract: Finding relevant information quickly is integral to effective and efficient decision making. This becomes increasingly difficult as the scale and heterogeneity of data continue to grow rapidly. Question answering (QA) systems, which aim to find precise answers to natural language questions from users, have shown great potential to address this problem. However, state-of-the-art QA systems still largely fall short in the following scenarios: (1) when questions are ambiguous and/or complex (e.g., involving multiple relations and operators), (2) when answering questions requires background knowledge that is not readily available in the data, and (3) when users need to understand the system’s answering process in order to better judge its trustworthiness. Such scenarios are prevalent in real application domains of QA (such as healthcare, finance, and the sciences), and must be addressed in building practical systems. This project aims to develop a new QA model that can interact with users to resolve ambiguity and uncertainty during the answering process, and can tackle challenging problems such as identifying when it is necessary to request feedback from the user while achieving the optimal trade-off between answer quality and interaction cost. The project further aims to improve the QA model’s transparency by decomposing a complex question into several intermediate sub-questions and allowing users to validate them. The expected results can thus contribute to future human-technology partnership by enabling QA models to be more interactive, more transparent, and hence more trustworthy. The proposed QA model will be tested in the clinical domain, where doctors often ask questions about a patient and look for answers in the patient’s clinical notes in Electronic Medical Records (EMRs). Such a QA model can enable doctors to effectively and efficiently query EMRs and gather relevant evidence for critical decision making.
The project plans to engage high school students and undergraduates, especially from underrepresented groups, and prepare them for future education and employment opportunities.
(2) Towards Resolving Ad-hoc Concept Queries with Table Answers via Multi-source Data Mining:
(Funded by US National Science Foundation, $499K; Sole PI: Huan Sun)
Users often issue queries about certain concepts to gather information and make decisions. The concepts in such queries are usually ad-hoc and unlikely to be directly covered by a predefined schema. An ideal response to such a query would be a table with the entities belonging to the queried concept as the rows and relevant attributes as the columns. However, in most cases, no such tables are readily available, and users have to collect relevant information by themselves, which is a painful process, especially when users are exploring unfamiliar concepts and do not know what information is critical for their decision making. This project builds a first-of-its-kind framework that resolves ad-hoc concept queries with table answers, saving users tremendous effort in information gathering.
The framework mines multiple complementary data sources, including knowledge bases, texts, and tables, and the project proposes systematic methods to combine relevant knowledge to answer a specific query. This project has the potential to build a practical and transformative question answering system by focusing on realistic ad-hoc queries rather than simple encyclopedic questions. It can further guide the construction of specialized question answering systems in various domains including medical, social, education, and management. It will also open up a line of work on combining multiple sources in an ad-hoc manner, driven by the needs of the query, for many other tasks. This is critical in the big data age, when many domains like healthcare have emerging complementary data sources such as texts, databases, tables, and human networks. The project will actively participate in educational outreach programs, e.g., those that host underrepresented high school students as summer interns.
(3) Advancing Human and Machine Question Answering via Human-Machine Collaboration:
(Funded by Army Research Office, $499K; Sole PI: Huan Sun)
There are two main channels for question answering (QA): machines and humans. Both QA channels have made remarkable advances recently and are now pervasive in our daily lives. However, each still has its own limitations. In this proposal, we observe that the limitations of cutting-edge human and machine QA systems can each be largely remedied by the other, and we develop novel human-machine collaboration mechanisms that combine human and machine intelligence for question answering.
(4) Unlocking Clinical Text in EMR by Query Refinement Using Both Knowledge Bases and Word Embedding:
(Funded by Patient-Centered Outcomes Research Institute, $1060K; PI: Simon Lin, Huan Sun)
Up to 80 percent of the information in electronic medical records (EMRs) is largely inaccessible because it is contained within clinical narratives. These texts document patient concerns, rationales for clinical decisions, patient-clinician interactions, and other patient-centered information for real-life healthcare cases. They can be extremely useful for improving the processes of clinical decision making, which benefits both clinicians and patients. They also provide highly relevant data and evidence for research planning for patient-centered outcomes and comparative effectiveness research (PCOR/CER) studies. Much as Google-like search engines do for the web, allowing researchers and other PCOR/CER stakeholders to interact directly with EMR texts based on their own interests would be invaluable. In particular, the project team will develop a novel framework to generate clinical query refinements and categorize them based on their predicted relationships to the original user query, by employing two techniques, word embedding and knowledge bases, to fill the methodology gap. In addition, the project team will implement the new framework with an interactive interface to create a new and powerful EMR text search engine called QREK (i.e., Query Refinement with word Embedding and Knowledge Bases). Existing EMR text search engines usually extend user queries automatically using knowledge bases in order to enhance search performance. In comparison, QREK can interactively address the vocabulary issue by suggesting related phrases mined from EMR texts, organizing and presenting them under meaningful and easy-to-understand categories, and engaging users to make their own selections. QREK will be formally tested using PCOR/CER use cases co-developed with advisory panel members, such as queries about obesity, neurorehabilitation, and appendicitis.
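As a rough illustration of the query refinement idea described above, the following Python sketch ranks candidate terms by embedding similarity to the query and groups them under categories looked up in a small knowledge base. It is only a toy under stated assumptions, not the QREK implementation: the vectors, the knowledge-base entries, the category names, and the function `suggest_refinements` with its parameters `top_k` and `min_similarity` are all hypothetical.

```python
import math

# Toy word vectors (made-up values). A real system would presumably
# learn embeddings from EMR text; that is an assumption here.
EMBEDDINGS = {
    "obesity": [0.9, 0.1, 0.0],
    "overweight": [0.8, 0.2, 0.1],
    "bmi": [0.7, 0.3, 0.0],
    "appendicitis": [0.0, 0.1, 0.9],
    "appendectomy": [0.1, 0.2, 0.8],
}

# Toy knowledge base mapping (query, term) pairs to a relation label.
# Entries and labels are hypothetical.
KNOWLEDGE_BASE = {
    ("obesity", "overweight"): "related condition",
    ("appendicitis", "appendectomy"): "treatment",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_refinements(query, top_k=3, min_similarity=0.5):
    """Group the terms most similar to `query` by knowledge-base category."""
    if query not in EMBEDDINGS:
        return {}
    query_vec = EMBEDDINGS[query]
    # Score every other term and sort from most to least similar.
    scored = sorted(
        ((cosine(query_vec, vec), term)
         for term, vec in EMBEDDINGS.items() if term != query),
        reverse=True,
    )
    grouped = {}
    for score, term in scored[:top_k]:
        if score < min_similarity:
            continue  # too dissimilar to be a useful suggestion
        # Fall back to a generic bucket when the knowledge base has no entry.
        category = KNOWLEDGE_BASE.get((query, term), "other related term")
        grouped.setdefault(category, []).append(term)
    return grouped


if __name__ == "__main__":
    print(suggest_refinements("obesity"))
```

In an interactive setting, the grouped suggestions would be shown to the user, who picks which terms to add to the query; the sketch stops at producing the groups.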