Clippers 8/27: Amad Hussain on Synthetic Data for Social Needs Chatbot / Building KGQA for Social Determinants of Health and Sleep Behaviors

Title 1: Synthetic Data for Social Needs Chatbot

Abstract: In many cases social needs resources (e.g. food pantries, financial assistance) are underutilized due to lack of accessibility. While certain websites, such as Findhelp.org, exist to improve accessibility through the aggregation and filtering of resources, a barrier still exists due to disparities in technical literacy and mismatches between how patients describe their experiences and the formal terminology. We seek to create a conversational agent which can bridge this accessibility barrier.

Due to patient data privacy concerns and server-side resource limitations, the patient-facing conversational system must be lightweight and not rely on API calls. As such, we make use of knowledge transfer: LLMs generate synthetic conversations, which are then used to train a downstream model. To reflect different user experiences, we make use of patient profile schemas and categorical expansion.
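As a rough illustration of what a patient profile schema with categorical expansion could look like, the sketch below enumerates every combination of a few categorical fields and turns each combination into a generation prompt. The field names, category values, and prompt wording are all hypothetical and are not taken from the talk; the actual schema may differ substantially.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PatientProfile:
    """One synthetic patient persona (all fields are illustrative assumptions)."""
    need: str
    tech_literacy: str
    phrasing: str


# Hypothetical schema: each field maps to the categories it can take.
SCHEMA = {
    "need": ["food", "housing", "financial assistance"],
    "tech_literacy": ["low", "high"],
    "phrasing": ["colloquial", "formal"],
}


def expand_profiles(schema):
    """Categorical expansion: one profile per combination of field values."""
    keys = list(schema)
    combos = product(*(schema[key] for key in keys))
    return [PatientProfile(**dict(zip(keys, values))) for values in combos]


def build_prompt(profile):
    """Render a profile as an instruction for an LLM that writes the dialogue."""
    return (
        "Write a conversation between a patient and a social needs assistant. "
        f"The patient needs help with {profile.need}, has {profile.tech_literacy} "
        f"technical literacy, and describes their situation in {profile.phrasing} language."
    )


profiles = expand_profiles(SCHEMA)
prompts = [build_prompt(profile) for profile in profiles]
```

Each prompt would then be sent to an LLM offline, and the resulting conversations collected as training data for the lightweight downstream model, so that no API call is needed at serving time.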

Title 2: Building KGQA for Social Determinants of Health and Sleep Behaviors

Abstract: Social determinants of health (SDOH) are primarily encoded within free-text clinical notes rather than structured data fields, causing cohort identification to be relatively intractable. Likewise, sleep complaints, while occasionally leading to formal diagnoses, can be missed and appear only within free-text descriptions. We intend to extract sleep characteristics and SDOH mentions from clinical notes to assist in cohort identification and correlation studies. The goal is to see how certain SDOH factors relate to sleep concerns, especially in cases where underlying biases can lead to a missing diagnosis despite the presence of appropriate mentions.

While models exist for SDOH extraction, they largely work on public datasets and cannot necessarily be transferred to an individual hospital system. Likewise, sleep mentions are understudied and do not come with a large-scale dataset. To minimize the need for annotations, we leverage LLMs to extract these mentions using prompt-based or lightly fine-tuned methods. To then understand deeper relationships between these two factors, we seek to create a knowledge graph relating SDOH and sleep characteristics for a given cohort, allowing a physician to ask questions about these relations in a downstream KGQA system.
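To make the knowledge graph idea concrete, here is a minimal sketch that assumes the extraction step has already produced, for each patient, a list of SDOH mentions and a list of sleep mentions. It links every SDOH factor to every sleep characteristic found for the same patient and answers one simple cohort question. The record format, the example labels, and the helper names are hypothetical; the actual graph design and KGQA system are not described in the abstract.

```python
from collections import defaultdict
from itertools import product

# Hypothetical extraction output: one record per patient.
extractions = [
    {"patient": "p1", "sdoh": ["housing instability"], "sleep": ["insomnia"]},
    {"patient": "p2", "sdoh": ["housing instability", "food insecurity"],
     "sleep": ["insomnia", "snoring"]},
    {"patient": "p3", "sdoh": ["food insecurity"], "sleep": []},
]


def build_graph(records):
    """Map each (SDOH factor, sleep characteristic) edge to the patients showing both."""
    edges = defaultdict(set)
    for record in records:
        for sdoh, sleep in product(record["sdoh"], record["sleep"]):
            edges[(sdoh, sleep)].add(record["patient"])
    return edges


def patients_with(graph, sdoh, sleep):
    """Answer: which patients in the cohort have both this SDOH factor and this sleep mention?"""
    return sorted(graph.get((sdoh, sleep), set()))


graph = build_graph(extractions)
cohort = patients_with(graph, "housing instability", "insomnia")
```

A real KGQA layer would translate a physician's natural-language question into a lookup like `patients_with`; this co-occurrence graph is only the simplest structure that supports such queries.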