Towards Resolving Ad-hoc Concept Queries with Table Answers via Multi-source Data Mining:

(Funded by US National Science Foundation, $499K; PI: Huan Sun)

Users often issue queries about certain concepts to gather information and make decisions. The concepts in such queries are usually ad-hoc and unlikely to be directly covered by a predefined schema. An ideal response to such a query would be a table whose rows are the entities belonging to the queried concept and whose columns are their relevant attributes. However, such tables are rarely available, so users must collect the relevant information themselves. This process is laborious, especially when users are exploring unfamiliar concepts and do not know which information is critical to their decisions. This project builds a first-of-its-kind framework that resolves ad-hoc concept queries with table answers, saving users substantial effort in information gathering.
The framework will mine multiple complementary data sources, including knowledge bases, texts, and tables, and will use systematic methods to combine the knowledge relevant to a specific query. By focusing on realistic ad-hoc queries rather than simple encyclopedic questions, this project has the potential to produce a practical and transformative question answering system. It can also guide the construction of specialized question answering systems in the medical, social, educational, and management domains. More broadly, it will open up a line of work on combining multiple data sources in an ad-hoc manner, according to query needs, for many other tasks. This capability is critical in the big data era, as many domains, such as healthcare, now have emerging complementary data sources including texts, databases, tables, and human networks. The project will also actively participate in educational outreach programs, such as hosting underrepresented high school students as summer interns.

Advancing Human and Machine Question Answering via Human-Machine Collaboration:

(Funded by Army Research Office, $499K; PI: Huan Sun)

There are two main channels for question answering (QA): machines and humans. Both channels have advanced remarkably in recent years and are now pervasive in daily life. However, each still has its own limitations. In this proposal, we observe that the limitations of cutting-edge human and machine QA systems can largely be remedied by each other. We therefore develop novel human-machine collaboration mechanisms that combine human and machine intelligence for question answering.

Unlocking Clinical Text in EMR by Query Refinement Using Both Knowledge Bases and Word Embedding:

(Funded by Patient-Centered Outcomes Research Institute, $1060K; PIs: Simon Lin, Huan Sun)

Up to 80 percent of the information in electronic medical records (EMRs) is contained in clinical narratives and therefore remains largely inaccessible. These texts document patient concerns, the rationales for clinical decisions, patient-clinician interactions, and other patient-centered information from real-life healthcare cases. They can be extremely useful for improving clinical decision making, to the benefit of both clinicians and patients. They also provide highly relevant data and evidence for planning patient-centered outcomes research and comparative effectiveness research (PCOR/CER) studies. Much as Google-like search engines have done for the Web, allowing researchers and other PCOR/CER stakeholders to interact directly with EMR texts according to their own interests would be invaluable.
To fill this methodological gap, the project team will develop a novel framework that generates clinical query refinements and categorizes them according to their predicted relationships to the original user query. The framework combines two techniques: word embedding and knowledge bases. The team will also implement the framework with an interactive interface, creating a new and powerful EMR text search engine called QREK (Query Refinement with word Embedding and Knowledge Bases). Existing EMR text search engines usually expand user queries automatically using knowledge bases to improve search performance. In contrast, QREK addresses the vocabulary mismatch between user queries and clinical text interactively. It suggests related phrases mined from EMR texts, organizes them under meaningful, easy-to-understand categories, and lets users select the refinements they want. QREK will be formally tested on PCOR/CER use cases co-developed with advisory panel members, such as queries about obesity, neurorehabilitation, and appendicitis.