|POSTER PRESENTATION SCHEDULE (Listed Alphabetically By Presenter)|
|Bahr, Nathan||Oregon Health & Science University||#115 – Teamwork Behaviors of Emergency Medical Service Teams in Pediatric Simulations|
|Bennett, Paul||University of Wisconsin-Madison||#106 – Improving and Applying Medical High-Throughput Machine Learning|
|Bernstein, Matthew||University of Wisconsin-Madison||#307 – Standardizing Sample-Specific Metadata in the Sequence Read Archive|
|Bian, Jiantao||University of Utah||#110 – Automatic Identification of High Impact Articles in PubMed to Support Clinical Decision-Making|
|Brar, Rajdeep||Yale University||#105 – A Multi-Axial Based Knowledge Management System for Alerts|
|Chaparro, Juan||University of California, San Diego||#112 – Prospective Study of a Kawasaki Disease Natural Language Processing Tool|
|Cheng, Alex||Vanderbilt University||#108 – Quantifying Burden of Treatment in Patients with Breast Cancer|
|Day, Jeff||National Library of Medicine||#101 – Movement Disorders Journal: Testing an App to Track Parkinson’s Symptoms|
|Goldstein, Andrew||Columbia University||#301 – Informatics Approaches for Evidence Appraisal and Synthesis|
|Hebbring, Scott||University of Wisconsin-Madison||#116 – Large-Scale Family Cohorts Linked to Electronic Health Records|
|Hoffman, Pamela||Veterans Administration||#104 – Designing a Telehealth Training Curriculum using a Telemental Health Model|
|Homer, Mark||Harvard Medical School||#201 – Predicting Accidental Falls in People Aged 65 Years and Older|
|Lee, Donghoon||Yale University||#203 – The Epigenomic Landscape of Aberrant Splicing in Cancer|
|Lin, En-Ju||The Ohio State University||#306 – Understanding Clinical Trial Patient Screening from the Coordinator’s Perspective|
|Lind, Abigail||Vanderbilt University||#205 – Conserved Transcriptional Regulators Control Divergent Toxin Production in Fungi|
|Liu, Yuzhe||University of Pittsburgh||#305 – Impact of Missing Data on Automatic Learning of Clinical Guidelines|
|Lordon, Ross||University of Washington||#107 – Assessing the Delay in Communication Regarding Digital Inpatient Documentation|
|Lu, Songjian||University of Pittsburgh||#208 – Signal-Oriented Pathway Analyses Reveal a Signaling Complex as a Synthetic Lethal Target for p53 Mutations|
|Magnotti, John||Baylor College of Medicine||#308 – Causal Inference During Multisensory Speech Perception|
|McShan, Daniel||University of Colorado-Denver||#209 – Towards a Knowledge-Base for Biochemical Reasoning|
|Nabavi, Sheida||University of Connecticut||#309 – Data Mining for Identifying Candidate Drivers of Drug Response in Heterogeneous Cancer|
|Nguyen, Khoa||Veterans Administration||#103 – Medication Use Among Veterans Across Health Care Systems|
|Puelz, Charles||Rice University||#113 – Modeling of Hypoplastic Left Heart Syndrome for Improved Decision Support|
|Regan, Kelly||The Ohio State University||#207 – Analysis of Orphan Disease Gene Networks to Enable Drug Repurposing|
|Rule, Adam||University of California, San Diego||#111 – Design Thinking in Radiation Oncology|
|Schau, Geoffrey||Oregon Health & Science University||#206 – Determining Gene Expression Trends using Single-Cell RNA-seq with CREoLE|
|Schneider, Jodi||University of Pittsburgh||#304 – Acquiring and Representing Drug-Drug Interaction Knowledge and Evidence|
|Schuler, Alejandro||Stanford University||#303 – Predicting Heterogeneous Causal Treatment Effects for First-Line Antihypertensives|
|Seco de Herrera, Alba||National Library of Medicine||#202 – Content-Based fMRI Activation Maps Retrieval|
|Slovis, Benjamin||Columbia University||#102 – Design of a Subscription-Based Laboratory Result Notification System|
|Torres, Jessica||Stanford University||#302 – Using Wearable Technology to Aid in the Classification of Different Cardiac Arrhythmias|
|Tran, Le-Thuy||University of Utah||#109 – Evaluating the Use of an Automated Section Identifier for Focused Information Extraction Tasks on a VA Big Data Corpus|
|Varghese, Paul||Harvard Medical School||#114 – Taxonomic Classification of HIT Hazards Associated with EHR Implementation: Initial and Stabilization Phases|
|Wang, Lucy||University of Washington||#204 – Identifying and Resolving Inconsistencies in Biological Pathway Resources|
Poster Abstracts and Author Information
Day 1 – Poster Topic 1 – Healthcare Informatics
Poster #101 Movement Disorders Journal: Testing an App to Track Parkinson’s Symptoms
Authors: Jeff Day, Jeff Baldwin, Omar Ahmad, Mark Hallett, John Harrington, Anne Altemus, and Codrin Lungu, National Library of Medicine
Abstract: Neurologists use patient histories to assess the symptom patterns and severity of Parkinson’s Disease in order to adjust medications. However, patient recall can be imprecise with only two or three yearly visits. We have designed an iPad app to help patients track their symptoms and medications, and we will compare compliance in recording data between the app and standardized paper forms. Twelve Parkinson’s patients scheduled for the placement of Deep Brain Stimulation (DBS) were recruited for this study and randomized into two groups: a group of six patients who will receive an iPad, and another group of six patients who will receive paper forms to record their data. Each patient will begin the study after DBS placement and be followed for three months. We will analyze the frequency of patient-recorded data as a test for compliance, and use surveys to evaluate patient satisfaction for both groups. Surveys and patient interviews will provide insight into user experience with the app, which can inform design strategies for mobile technology built for movement disorder patients.
Poster #102 Design of a Subscription-Based Laboratory Result Notification System
Authors: Benjamin H Slovis, Hojjat Salmasian, Gilad Kuperman, David K Vawdrey, Columbia University
Abstract: Background: The delayed review of laboratory results is potentially harmful. Established processes (e.g., phone calls) provide notification of critical laboratory values; however, evidence suggests that physician awareness of non-critical and normal results also affects clinical decision-making. Many HIT tools have demonstrated improved physician response time to laboratory results, yet continued utilization and enhancements are rare, with an overall lack of provider control. Specifically, few studies have documented subscription-based notifications. Objective: We propose a tool to provide physicians with near-real-time notification of laboratory results through text-page and email, via subscription at the time of order-entry. Needs-assessments will include evaluation of the extent to which current processes delay hospital care, resulting in clinician dissatisfaction. Preferred methods of notification and notification utility for specific laboratory tests will also be assessed. Methods: A physician-observer will document current processes, and promote dialog with clinical house-staff at an urban academic hospital regarding barriers to appropriate results-review. A survey will be distributed to determine the perceived usefulness of subscribed laboratory notifications. Significance: Our long-term objective is to develop a subscription-based notification system to reduce time between available results and physician awareness. We expect physician interest and encouraging survey results. Such a tool has the potential to improve the quality of clinical care.
Poster #103 Medication Use Among Veterans Across Health Care Systems
Authors: Khoa A Nguyen, Alan J Zillich, Susan Perkins, David Haggstrom, Dept. of Veterans Affairs Richard L Roudebush VA Medical Center, Indianapolis, IN; Purdue University College of Pharmacy
Abstract: Dual health care system use is becoming a common type of care for most Veterans. The VA is implementing a nationwide health information exchange (HIE) program called the Virtual Lifetime Electronic Record (VLER), which allows providers to access and share patient information with one another. Because there is a lack of information about the use of medications across dual systems of care, the objective of this study is to describe the prevalence of medication dispensing across VA and non-VA health care systems prior to enrollment in VLER.
In this retrospective cohort study, we examined outpatient dispensing during a two-year window prior to VLER enrollment. Data were extracted from the VA Pharmacy Benefits Management system and a regional HIE. Medication source was assessed at the subject level, and categorized as VA source, non-VA source, or both. We then compared the mean number of prescriptions as well as overall and pairwise differences in medication dispensing.
Out of 52,444 Veterans included in our study, 17.4% of subjects (n=9,123) obtained medications outside the VA including prescriptions for antibiotics, antineoplastics, and anticoagulants. Subjects receiving medication from both sources appeared to have more complex medical needs, as reflected by their higher overall mean number of medications.
Poster #104 Designing a Telehealth Training Curriculum using a Telemental Health Model
Authors: Pamela Hoffman, Rhonda Johnston, Cindy Brandt, and Linda Godleski, Department of Veterans Affairs, VA Connecticut Healthcare System; VA Telehealth Service
Abstract: Problem: Telehealth is a well-established modality for treating patients at a distance and improving access to care. Few studies have been published on training in telehealth specialties. Approach: We propose a standard curriculum on telehealth, based on a current telemental health training model. Our innovative curriculum follows a strategic outline: background, evidence base, legal and regulatory concerns, emergency procedures, applications for an encounter, and case simulations. Outcomes: This model curriculum has been implemented, live and remotely, for over 4800 participants in 2 VA facilities and 3 training programs, with very positive effect. Participant satisfaction is consistently over 80% and learners’ impressions of competence invariably increase. Future steps: This innovative model is a way to standardize training efforts in telehealth. Virtual and remote training in telehealth will extend access to knowledge and subsequent services to patients nationwide.
Poster #105 A Multi-Axial Based Knowledge Management System for Alerts
Authors: Rajdeep Brar1, Richard Shiffman1
1 Yale Center for Medical Informatics, New Haven, CT
Abstract: Background: The phenomenon of alert fatigue can have serious negative implications in regard to workflow, user satisfaction, clinical effectiveness, as well as patient safety. Knowledge organization models that can categorize clinical alerts in a comprehensive and useful way for curation and update are needed. Hypothesis: A multi-axial based knowledge organization model for alerts can help target areas for quality improvement and patient safety. Methods: The 546 alerts in Yale’s instance of Epic™ will be manually categorized according to function, IOM quality heading, medical specialty, care setting, and additional groupings with perceived utility. Alert firing and override statistics will be monitored. Results: A comprehensive set of alert categories has been identified. Additional categories will be added to the initial set for model enrichment. We plan to improve alert use and override statistics by targeting poorly performing alerts based on category. Conclusions: We believe this approach will be useful when maintaining existing clinical alerts and when building new ones. Statistics will be computed on each category, e.g., frequency of firing and action by user, and then used to garner insights into whether certain categories of alerts are performing as expected. We will then use those insights to target alerts for sensitivity/specificity adjustment or retirement.
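The per-category statistics described in the abstract above (frequency of firing and user action) could be tallied along the following lines. This is only a minimal sketch: the function name, the event format, and the sample categories are hypothetical and do not come from the authors' system.

```python
from collections import defaultdict


def override_rate_by_category(alert_events):
    """Aggregate firing and override counts per alert category.

    alert_events: iterable of (category, was_overridden) pairs, one per firing.
    Returns {category: (firings, overrides, override_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [firings, overrides]
    for category, was_overridden in alert_events:
        counts[category][0] += 1
        if was_overridden:
            counts[category][1] += 1
    return {c: (f, o, o / f) for c, (f, o) in counts.items()}


# Hypothetical firing log, invented purely for illustration.
events = [
    ("drug-interaction", True),
    ("drug-interaction", True),
    ("drug-interaction", False),
    ("immunization-reminder", False),
]
stats = override_rate_by_category(events)
```

Categories with a high override rate would be natural candidates for the sensitivity/specificity adjustment or retirement that the abstract mentions.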
Poster #106 Improving and Applying Medical High-Throughput Machine Learning
Authors: Paul Bennett1,2,†, Ross Kleiman1,2,†, Peggy Peissig3, Zhaobin Kuang1, James Linneman3, Scott Hebbring3, Michael Caldwell3, David Page1,2
1Department of Computer Sciences, University of Wisconsin, Madison, 2Computation and Informatics in Biology and Medicine, 3Marshfield Clinic, Marshfield, WI † Co-First Author
Abstract: In recent years, many healthcare professionals and researchers have become keenly interested in predicting disease risk using electronic medical record data. Using highly parallelized computing, we built predictive models for nearly every diagnosis (ICD-9 code) a patient could receive. These models achieved a mean AUC of 0.8026±0.0619 predicting diagnoses 1 month in advance and a mean AUC of 0.7585±0.0631 predicting diagnoses 6 months in advance. Given the tremendous breadth of this work, we are presented with many new challenges. Our research helps address the difficult task of appropriately matching cases to controls across thousands of diagnoses with particular emphasis on case-control matching for pregnancy complication prediction. Furthermore, we examine novel applications unique to high-throughput prediction. We perform a simulated prospective study across all diagnoses predicted and then bi-cluster patients and diseases based on the model scores. We also investigate using model scores as a feature set for predicting hospital readmission. Our research represents a new direction in medical machine learning and completes several necessary steps in improving and applying this high-throughput method of diagnosis prediction.
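For readers unfamiliar with the AUC figures quoted in the abstract above, the metric can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes it by direct pairwise comparison; it is an illustration of the metric only, not the authors' evaluation code, and the function name is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model scores; labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic loop is fine for a small example; an evaluation across thousands of diagnoses would more likely use a rank-based implementation.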
Poster #107 Assessing the Delay in Communication Regarding Digital Inpatient Documentation
Authors: Ross Lordon, Thomas Payne, University of Washington
Abstract: Within the past decade, healthcare records generally have transitioned from paper to digital formats. Unfortunately, this new method is time consuming [1]. A study in 2012 reported physicians were spending 49% of their workday using a computer and 70% of this time was spent performing documentation [2]. An unintended consequence concerns the delay between when patients are seen during rounds and when their encounter note is written and signed by their physician. The encounter note is the central location of critical care information. Within certain popular EHRs, an encounter note is not viewable by others until it is signed. This delay may cause communication errors, delay in care, or other unintended consequences.
We conducted a prospective observational study of physician teams within a county safety net hospital. Physicians recorded the time each patient was seen during rounds. Timestamps documenting when notes were signed in the EHR were obtained from a clinical data repository. The gap in documentation was calculated by determining the difference between these times. 212 patient encounters were analyzed and the average documentation gap was 5.4 hours with a maximum of 17.3 hours. An opportunity exists to improve the digital documentation process, potentially allowing physicians to be more efficient.
1. Cusack CM, Hripcsak G, Bloomrosen M, Rosenbloom ST, Weaver CA, Wright A, Vawdrey DK, Walker J, Mamykina L. The future state of clinical data capture and documentation: a report from AMIA’s 2011 Policy Meeting. J Am Med Inform Assoc. 2013 Jan 1;20(1):134-40. doi: 10.1136/amiajnl-2012-001093. Epub 2012 Sep 8. PubMed PMID: 22962195; PubMed Central PMCID:
2. Oxentenko AS, Manohar CU, McCoy CP, Bighorse WK, McDonald FS, Kolars JC, Levine JA. Internal medicine residents’ computer use in the inpatient setting. J Grad Med Educ. 2012 Dec;4(4):529-32. doi: 10.4300/JGME-D-12-00026.1.
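The documentation gap in Poster #107 is the difference between the time a patient was seen on rounds and the time the note was signed. A minimal sketch of that calculation follows; the timestamps are invented for illustration and the function name is an assumption, not part of the study's tooling.

```python
from datetime import datetime


def documentation_gap_hours(seen_at, signed_at):
    """Hours elapsed between seeing the patient and signing the note."""
    return (signed_at - seen_at).total_seconds() / 3600.0


# Hypothetical (seen, signed) timestamp pairs, not study data.
encounters = [
    (datetime(2015, 1, 5, 8, 30), datetime(2015, 1, 5, 13, 0)),
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 16, 30)),
]
gaps = [documentation_gap_hours(seen, signed) for seen, signed in encounters]
mean_gap = sum(gaps) / len(gaps)
max_gap = max(gaps)
```

The same two summary values, mean and maximum, are the ones reported in the abstract.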
Poster #108 Quantifying Burden of Treatment in Patients with Breast Cancer
Authors: Alex C Cheng, Mia A Levy, Vanderbilt University
Abstract: Chronic disease decreases a patient’s quality of life through the direct effect of illness, as well as the burden of treatment imposed to counteract illness. While burden of illness is well studied, the burden of treatment is not as well understood or monitored. We developed a method to quantify one dimension of the burden of treatment based on patient encounters with the healthcare system. Specifically, we tracked the total time spent in appointments and admissions, waiting time, and travel time to the medical center. We applied this method to a population of stage I-III breast cancer patients at Vanderbilt University Medical Center.
We were able to differentiate burden of treatment for patients with stage I-III cancer in the first 18 months after diagnosis. As hypothesized, stage III patients had the greatest treatment burden, followed by stage II patients and stage I patients. Future work will evaluate the reproducibility and generalizability of this method for quantifying burden of treatment across other clinical settings and chronic diseases. This approach may enable identification of high-risk groups that could benefit from interventions to decrease patient work and improve outcomes.
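One way to picture the time-based burden measure in Poster #108 is as a sum of visit, waiting, and travel time over a patient's encounters. The sketch below assumes a simple unweighted sum in hours; the abstract does not state how the components were combined, so the record layout and the aggregation are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    visit_minutes: float    # time in the appointment or admission
    waiting_minutes: float  # time spent waiting
    travel_minutes: float   # travel to and from the medical center


def treatment_burden_hours(encounters):
    """Total encounter-related time in hours (unweighted sum, an assumption)."""
    total_minutes = sum(
        e.visit_minutes + e.waiting_minutes + e.travel_minutes
        for e in encounters
    )
    return total_minutes / 60.0


# Hypothetical patient history, for illustration only.
history = [Encounter(60, 30, 90), Encounter(120, 15, 45)]
burden = treatment_burden_hours(history)
```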
Poster #109 Evaluating the Use of an Automated Section Identifier for Focused Information Extraction Tasks on a VA Big Data Corpus
Authors: Le-Thuy T Tran, Guy Divita, Marjorie H Carter, Matthew H Samore, Adi V Gundlapalli, University of Utah School of Medicine and VA Salt Lake City Health Care System
Abstract: The Veterans Health Information Systems and Technology Architecture (VistA)/CPRS (Computerized Patient Record System) is an electronic medical record of the VA enterprise-wide health information system. The large numbers of clinical notes stored in VistA/CPRS are a valuable information extraction resource for detecting patient care and treatment patterns, risks and outcomes of diseases, or adverse events. To mine these data efficiently, we have developed an automated section identifier based on an ontology of clinical document sections to preprocess the clinical notes for further focused information extraction. The identifier was first trained on a set of 1000 documents and then used to identify a fine level of clinical note sections in a corpus of about one million records derived from VistA. The information from this preprocessing step is stored for efficient future access to specific content in the notes.
We evaluate the use of our developed automated section identifier for focused information extraction tasks including extracting vital signs data, retrieving patient-reported symptoms, and identifying risk and evidence of homelessness among Veterans.
Poster #110 Automatic Identification of High Impact Articles in PubMed to Support Clinical Decision-Making
Authors: Jiantao Bian1, Siddhartha Jonnalagadda2, Gang Luo1, Guilherme Del Fiol1
1University of Utah, 2Northwestern University
Abstract: Objectives: Researchers have been trying to make PubMed more useful for supporting clinicians’ decision making. We aim to help clinicians find studies with high clinical impact. Materials and Methods: Our overall method is based on machine learning algorithms with a variety of features including Altmetric score (tracks online popularity of scientific work), journal impact factors, study registration in ClinicalTrials.gov, publication in PubMed Central, article age, study sample size, comparative study, citation count, number of comments on PubMed and study quality (according to a state-of-the-art machine learning classifier developed by Kilicoglu et al.). The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical guidelines from various diseases. Results: Among Naïve Bayes, support vector machine (SMO), and decision tree (J48) with default parameters in Weka, Naïve Bayes performed best. It outperformed the baseline in terms of top 20 precision (mean =34% vs. 12%), mean average precision (mean = 24% vs. 5%) and mean reciprocal rank (mean = 0.78 vs. 0.18). Conclusions: Preliminary results show that the high impact Naïve Bayes classifier using a variety of features is a promising approach to identifying high impact studies for clinical decision support.
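The three ranking metrics reported in Poster #110 (top-20 precision, average precision, and reciprocal rank) can be computed per query from a ranked list of relevance flags and then averaged across queries. The sketch below shows one common formulation; in particular, average precision here is normalized by the number of relevant items in the ranked list, which is an assumption, since the abstract does not specify the variant used.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of precision values taken at each relevant item's rank."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking: 1 marks a high-impact article, 0 marks any other.
ranking = [0, 1, 0, 1]
```

Averaging `average_precision` and `reciprocal_rank` over all queries gives the mean average precision and mean reciprocal rank figures of the kind quoted in the abstract.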
Poster #111 Design Thinking in Radiation Oncology
Authors: Adam Rule, Erin Gillespie, Nadir Weibel, Todd Pawlicki, University of California, San Diego
Abstract: Radiation oncologists routinely use weekly chart rounds to check quality of care with other clinicians. However, there is sparse evidence that chart rounds improve patient outcomes. Moreover, recent studies found just 4-12% of treatment plans were modified at typical chart rounds. This low rate has been attributed to limited time for discussing patient cases (just 3 minutes at many practices) and many cases being reviewed after treatment begins.
To redesign chart rounds, we assembled a team of radiation oncologists, physicists, and designers at UC San Diego for two half-day workshops. The participants used design thinking, which encourages thoroughly defining the problem before brainstorming solutions, to guide the workshops.
In the first workshop, participants identified four goals of chart rounds (quality assurance, decision support, education, and team building) and identified three areas for redesign. (How might we document and disseminate informal peer review? How might we ensure participants feel time spent on peer review is well spent? How might we facilitate a culture of collaboration, safety, and team building?) During the second workshop, participants brainstormed solutions to these prompts including an email review system that supports more focused and flexible forms of review. This design is currently being prototyped for testing.
Poster #112 Prospective Study of a Kawasaki Disease Natural Language Processing Tool
Authors: Juan D Chaparro, Chu-Nan Hsu, Zach Meyers, Adriana Tremoulet, University of California, San Diego
Abstract: Kawasaki Disease (KD) is a rare pediatric febrile syndrome consisting of prolonged fever and five clinical symptoms. Nearly 20% of children with KD develop coronary artery aneurysms if left untreated. However, diagnosis is often delayed due to lack of a diagnostic test and overlap with other febrile syndromes; thus, there is a need for improved diagnostic tools.
KD-NLP is a natural language processing tool to identify patients with high-suspicion for KD using provider notes from the Emergency Department (ED). We recently published the development and testing of this tool using retrospective ED notes from patients with KD and febrile patients. The tool identifies the presence/absence of the five signs of KD in the narrative text and classifies patients on these findings.
We will implement this tool into a live electronic health record system to 1) prospectively determine the sensitivity/specificity of KD-NLP in a low prevalence population and 2) evaluate the feasibility of KD-NLP in providing clinical decision support in a time frame that can affect medical decision making.
We are integrating the KD-NLP tool into the Epic ASAP module at Rady Children’s Hospital San Diego and will begin data collection, but are also considering integration in non-pediatric emergency departments.
Poster #113 Modeling of Hypoplastic Left Heart Syndrome for Improved Decision Support
Authors: Charles Puelz1, Beatrice Rivière1, Craig G Rusin2
1 Rice University, Houston, TX, 2 Baylor College of Medicine and Texas Children’s Hospital, Houston, TX
Abstract: Babies born with congenital heart defects often require immediate surgery and many hours of critical care in the hospital. Their hemodynamic state pre- and post-surgery is complex, abnormal, and extremely challenging to manage. Indeed, all vital signs may indicate stability and yet the patient falls into unexpected cardiac arrest. Currently, our research focuses on a class of defects generally identified by a severely underdeveloped left ventricle called Hypoplastic Left Heart Syndrome (HLHS).
The purpose of our work is to develop a clinical decision support tool, based on a computational fluid dynamics model of the entire circulatory system, to aid clinicians in providing critical care to HLHS patients. This tool predicts blood pressure and flow waveforms in peripheral arteries and veins, and allows for the incorporation of measured patient data for simulations and model validation. Our goal is for clinicians to use this tool for insight into the complex hemodynamics of HLHS, and in turn to improve the care provided to these patients at the bedside.
This research was funded by a training fellowship from the Gulf Coast Consortia, on the Training Program in Biomedical Informatics, National Library of Medicine (NLM) T15LM007093, PD – Lydia E. Kavraki.
Poster #114 Taxonomic Classification of HIT Hazards Associated with EHR Implementation: Initial and Stabilization Phases
Authors: Paul Varghese, Adam Wright, David Bates, Harvard Medical School
Abstract: Data that describe the nature, magnitude and frequency of EHR safety concerns remain scarce, with a limited number of studies focused upon mining patient safety incident reporting databases. By using both traditional in-hospital patient safety monitoring system reports and previously unexamined hospital information services customer complaint reports during a large-scale implementation of EHR at an academic medical center, we are in the process of 1) categorizing the types of hazards using AHRQ hazard criteria and 2) assessing the type and severity of patient harm (actual and potential) in both the initial phase (3 months) and subsequent stabilization phase.
Poster #115 Teamwork Behaviors of Emergency Medical Service Teams in Pediatric Simulations
Authors: Nathan Bahr, Jeanne-Marie Guise, Paul N Gorman, Oregon Health and Science University
Abstract: Teamwork can determine patient outcomes during prehospital care. In this work, we describe behaviors that appear to distinguish high-performing teams from low-performing teams and may contribute to improved outcomes.
Forty Emergency Medical Service teams were recruited to participate in 4 pediatric simulations. Simulation performance and outcomes were assessed independently by a domain expert by counting and classifying observed errors and using the Clinical Teamwork Scale (CTS). Teams were classified as high-performing or low-performing based on this assessment, and two were selected for analysis. To identify behaviors, the simulations were recorded, transcribed, and coded according to team communication patterns (speaker-listener interactions), task focus (task relevance of dialog content), and verbal behaviors (apparent purpose of speech act, e.g. query, inform, direction, acknowledge, etc.).
In the high-performing team, the leader, called the Person in Charge (PIC), provided other members with situational assessments, clear goals, and directions to reach those goals. In the low-performing team, the PIC exhibited a preference for summarizing the situation and stating their own actions over directing others. We hypothesize that this behavior may be a silent cry for help, in which the PIC becomes lost and needs support from their teammates.
Poster #116 Large-Scale Family Cohorts Linked to Electronic Health Records
Authors: Scott J Hebbring1, 2, Xiayuan Huang2, John Mayer1, Zhan Ye1, David Page2, (1) Marshfield Clinic and (2) University of Wisconsin Madison
Abstract: Challenges in population-based genetic research have resulted in a re-awakening of family-based studies. However, significant difficulties arise when identifying the most interesting diseases and families for genetic research. Use of large patient populations linked to an electronic health record (EHR) may alleviate such challenges. Using readily available basic demographic data in an EHR, we identified 173,368 families including 8,242 families of twins from Marshfield Clinic. With these large cohorts of families all linked to extensive health records, thousands of diseases may be studied simultaneously by phenome-wide approaches. Studies in twins suggest that few diseases are random events and that family relationships are extremely important in predicting disease risk. With our novel phenome-wide methodologies highly translatable to other EHR systems, this study may pave the way for biotechnologically smart EHR systems that integrate family data to generate personalized family histories in real-time for the prediction, prevention, and treatment of many diseases and advancement of “precision medicine.” Lastly, this study provides an intriguing perspective on the future of genetic epidemiologic research: a future in which large patient populations with sequenced genomes are unified by familial relationships in an integrated EHR system.
Day 1 – Poster Topic 2 – Bioinformatics/Computational Biology
Poster #201: Predicting Accidental Falls in People Aged 65 Years and Older
Authors: Mark L Homer1,2, Nathan P Palmer1,2, Kenneth D Mandl1,2
1Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, 2Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Abstract: More than half a million people over 65 years of age accidentally fall every year in the United States alone. To help tackle the problem, we develop a predictive analytics model based upon machine learning (logistic regression with LASSO) to estimate each individual’s unique risk of falling by looking at their past insurance claims. During testing, our predictive model successfully risk stratified people: those in the highest stratum had more than 15 times the risk of those in the lowest stratum (34.7% vs. 1.7%). Next steps include better modeling techniques and running a prospective study.
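The risk stratification result in Poster #201 compares observed fall rates between the highest and lowest predicted-risk strata. The sketch below shows one plausible way to form equal-sized strata from predicted risks and compute per-stratum event rates; the abstract does not say how strata were defined, so the equal-size split and all names here are assumptions. The final line simply reproduces the ratio implied by the reported rates.

```python
def stratum_rates(predicted_risk, fell, n_strata):
    """Observed event rate per stratum, ordered from lowest to highest risk.

    predicted_risk: model-estimated risk per person.
    fell: 1 if the person fell, 0 otherwise (same order as predicted_risk).
    Strata are equal-sized by rank; the last stratum absorbs any remainder.
    """
    order = sorted(range(len(predicted_risk)), key=lambda i: predicted_risk[i])
    size = len(order) // n_strata
    rates = []
    for s in range(n_strata):
        start = s * size
        end = (s + 1) * size if s < n_strata - 1 else len(order)
        members = order[start:end]
        rates.append(sum(fell[i] for i in members) / len(members))
    return rates


# Hypothetical predictions and outcomes, for illustration only.
risk = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 0.7, 0.6]
outcome = [0, 0, 0, 1, 1, 1, 1, 0]
rates = stratum_rates(risk, outcome, n_strata=2)

# Ratio implied by the rates reported in the abstract (34.7% vs. 1.7%).
reported_ratio = 34.7 / 1.7
```

The reported rates correspond to a ratio of roughly 20, consistent with the abstract's "more than 15 times" statement.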
Poster #202: Content-Based fMRI Activation Maps Retrieval
Authors: Alba G Seco de Herrera, L Rodney Long, Sameer Antani, National Library of Medicine
Abstract: Functional Magnetic Resonance Imaging (fMRI) is a powerful tool used in the study of brain function. It can non-invasively detect signal changes of cerebral blood flow in areas of the brain where neuronal activity is varying. Statistical analysis of fMRI data is used to locate brain activity and generate brain activation maps. These maps are used to determine how a task is correlated with a particular perceptual or cognitive state that is encoded by active brain regions.
Neuroimaging data sharing is becoming increasingly common. Currently, some efforts have been made to develop fMRI repositories. However, there is a need for content-based (CB-) fMRI retrieval methods that can retrieve studies relevant to a “query” brain activation. One approach is to take into account the full spatial pattern of brain activity to retrieve similar activity maps. This approach could also be extended to support cognitive state-based retrieval.
This work presents an approach for CB-fMRI activation map retrieval that returns activation maps with activation patterns similar to the given one. The proposed method develops a similarity score that matches activation maps.
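The abstract for Poster #202 does not specify the similarity score, so the following is only a generic illustration of content-based retrieval: each activation map is flattened to a vector of voxel values, cosine similarity stands in for the score, and the most similar stored maps are returned. The function names and the toy repository are hypothetical.

```python
import math


def cosine_similarity(map_a, map_b):
    """Cosine similarity between two flattened activation maps."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must have the same number of voxels")
    dot = sum(a * b for a, b in zip(map_a, map_b))
    norm_a = math.sqrt(sum(a * a for a in map_a))
    norm_b = math.sqrt(sum(b * b for b in map_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero map is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query_map, repository, top_n=3):
    """Names of the top_n stored maps most similar to the query map."""
    scored = [
        (cosine_similarity(query_map, stored), name)
        for name, stored in repository.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Toy three-voxel repository, for illustration only.
repository = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 1, 0]}
results = retrieve([1, 0.1, 0], repository, top_n=2)
```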
Poster #203: The Epigenomic Landscape of Aberrant Splicing in Cancer
Authors: Donghoon Lee, Jing Zhang, Mark B Gerstein, Yale University
Abstract: Nearly all protein-coding genes undergo alternative RNA splicing, which provides an important means to expand transcriptome diversity beyond the scope of genomic information. While splicing is an elaborate process, it can be prone to errors that could become pathogenic. Unsurprisingly, aberrant splicing, which collectively refers to splicing events that could confer risk of a disease, is often implicated in cancer.
Recent studies have revealed splicing regulation is characterized by increased levels of nucleosome density and positioning, DNA methylation, and distinct histone modification patterns. However, most studies on aberrant splicing have largely focused on identifying genomic- and transcriptomic-level variations within splice sites, cis-acting splicing regulatory elements, and trans-acting splicing factors. The extent, nature, and effects of epigenomic dysregulation in aberrant splicing remain unresolved.
By systematically profiling the epigenomic landscape of aberrant splicing using transcriptomic and epigenomic data from the ENCODE and the Epigenome Roadmap projects, we aim to (1) identify chromatin status and distinct epigenetic signatures that characterize aberrant splicing in cancer, (2) classify aberrant splicing by different classes of epigenomic dysregulation, and (3) elucidate the role of epigenomic control in aberrant splicing. The proposed study will significantly advance our understanding of epigenomic contribution to aberrant splicing in cancer.
Poster #204: Identifying and Resolving Inconsistencies in Biological Pathway Resources
Authors: Lucy L Wang, John Gennari, Neil Abernethy, University of Washington
Abstract: Biological pathways provide a high-level view of biological and disease processes, and have become a popular tool for studying genetic and molecular interactions. Many pathway knowledge bases exist providing complementary information; there have been attempts to integrate these resources to improve our analysis and understanding of biology. However, the same biological processes are represented differently in different resources, as each resource makes its own choices in knowledge representation. There is currently no accepted standardized way to integrate such data. A method is needed to access the collective knowledge of all these different data sources.
In order to merge information across pathway knowledge bases, inconsistencies must be identified and understood. Inconsistencies are found in 1) entity annotation, 2) entity existence, 3) reaction semantics, 4) reaction and entity granularity, 5) asserted level of information, and 6) external references. We identified these types of inconsistencies in several human pathway resources: HumanCyc, KEGG, PANTHER, and Reactome. We also provide recommendations for aligning pathways between resources, thereby providing biologists new ways to use and interpret the existing knowledge. This in turn is essential for furthering our understanding of biology and pathology, paving the way to advances in pathway analysis and drug target identification.
Poster #205: Conserved Transcriptional Regulators Control Divergent Toxin Production in Fungi
Authors: Abigail L Lind, Timothy D Smith, Ana M Calvo, and Antonis Rokas, Vanderbilt University and Northern Illinois University
Abstract: Filamentous fungi produce diverse secondary metabolites (SMs) essential to their ecology and adaptation. Fungal SMs have a double-edged impact on humans; some are carcinogenic toxins found in contaminated food supplies, while others, such as lovastatin and penicillin, have been repurposed as successful therapeutics. SMs play crucial roles in fungal ecology; lovastatin and penicillin, for example, are both antimicrobial compounds that provide their producers with a competitive advantage. In fungi, SMs are extremely diverse; each SM is typically produced by only a handful of species. The production of SMs is triggered by both biotic and abiotic factors and is controlled by widely conserved transcriptional regulators. To understand how the transcriptional regulators of SM regulate such divergent pathways under different conditions, we examined the genome-wide regulatory role of several master SM regulators in different fungal species and in different environmental conditions. Our findings indicate that master SM regulators undergo rapid transcriptional rewiring and interact with multiple abiotic signals to control SM production.
Poster #206: Determining Gene Expression Trends using Single-Cell RNA-seq with CREoLE
Authors: Geoffrey F Schau, Andrew Adey, Oregon Health and Science University
Abstract: Single-cell RNA-sequencing (scRNA-seq) is widely used to recapitulate gene expression trends through developmental time of heterogeneous biological tissue. Although several methods have sought to estimate pseudo-temporal expression trends, a number of technical limitations presented by scRNA-seq remain, including high expression variability and drop-out measurements, which complicate trend estimation. We hypothesize that consensus estimation made by iteratively subsampling expression profiles of individual cells will yield a smoother, more biologically accurate expression trend that is less susceptible to technical noise. To test this hypothesis, we have developed CREoLE, Consensus Representative Estimation of Lineage Expression, a general-purpose algorithm designed to appropriately scale the dimensionality of scRNA-seq data, establish a branching lineage pathway substructure, and produce smooth, high-resolution gene expression trends through each developmental lineage.
Our analysis includes a comparison of current methods to CREoLE on both simulated and publicly available scRNA-seq data. In the simulation studies, we examined the impact of varying levels of artificial noise and drop-out measurements. In these cases, CREoLE returns similar estimations at all evaluated noise levels and recapitulates published expression trends from the literature, supporting our hypothesis that trend smoothing is feasible by calculating consensus estimation. CREoLE is implemented in R and is publicly available on GitHub.
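The general idea of consensus estimation by repeated subsampling can be sketched as follows. This is not the CREoLE algorithm, which is implemented in R and also handles dimensionality scaling and branching lineages. It is a minimal Python illustration that assumes each cell already has a pseudotime in [0, 1) and a single gene's expression value; the function names, bin count, and simulated data are all hypothetical.

```python
import random

def binned_trend(cells, n_bins):
    """Mean expression per pseudotime bin; None where a bin has no cells."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pseudotime, expression in cells:
        idx = min(int(pseudotime * n_bins), n_bins - 1)  # pseudotime assumed in [0, 1)
        sums[idx] += expression
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def consensus_trend(cells, n_bins=5, n_iter=200, fraction=0.5, seed=0):
    """Average the binned trend over many random subsamples of the cells."""
    rng = random.Random(seed)
    k = max(1, int(len(cells) * fraction))
    totals = [0.0] * n_bins
    hits = [0] * n_bins
    for _ in range(n_iter):
        sample = rng.sample(cells, k)  # subsample without replacement
        for i, value in enumerate(binned_trend(sample, n_bins)):
            if value is not None:
                totals[i] += value
                hits[i] += 1
    return [t / h if h else None for t, h in zip(totals, hits)]

# Simulated cells: a rising trend with Gaussian noise and 20% drop-out (zeros).
sim = random.Random(1)
cells = []
for _ in range(300):
    t = sim.random()
    expr = 0.0 if sim.random() < 0.2 else 10 * t + sim.gauss(0, 1)
    cells.append((t, expr))

trend = consensus_trend(cells)
print([round(v, 2) for v in trend])
```

Averaging over subsamples reduces the influence of any single noisy or dropped-out cell on the estimated trend, which is the intuition behind the hypothesis stated above.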
Poster #207: Analysis of Orphan Disease Gene Networks to Enable Drug Repurposing
Authors: Kelly Regan, Zachary Abrams, Philip R O Payne, Department of Biomedical Informatics, The Ohio State University
Abstract: Over 7,000 orphan diseases have been described, while treatments exist for fewer than 400 due to their limited prevalence, lack of research resources and reduced commercial potential. Thus, drug repurposing represents an ideal alternative in order to circumvent the high costs and inefficiencies of the current drug discovery pipeline. Previous research has shown that disparate orphan diseases are highly connected through genetic mechanisms. Connectivity mapping is a computational drug repurposing system that exploits the observation that changes in gene expression patterns can reflect different conditions in human cells, such as exposure to drugs, gene-modifying agents and disease processes. We obtained orphan disease-gene relationship data from the Orphan Disease Network and Orphanet databases. Functional implications (e.g. GOF/LOF status) of orphan disease gene mutations were confirmed using the OMIM database. We focused on disease-causing germline mutation genes corresponding to reduced gene protein product and/or function in order to align with LINCS gene knock-down perturbation experiments. This study represents the first systematic application of gene expression-based connectivity mapping of orphan diseases for drug repurposing and to recapitulate known disease-disease relationships. Using network community detection algorithms, we have identified novel drug candidates for a subset of highly connected orphan disease network modules.
Poster #208: Signal-Oriented Pathway Analyses Reveal a Signaling Complex as a Synthetic Lethal Target for p53 Mutations
Authors: Songjian Lu, Chunhui Cai, Gonghong Yan, Zhuan Zhou, Yong Wan, Lujia Chen, Vicky Chen, Gregory F Cooper, Lina M. Obeid, Yusuf A Hannun, Adrian V Lee and Xinghua Lu, University of Pittsburgh
Abstract: The multi-omics data from The Cancer Genome Atlas (TCGA) provide an unprecedented opportunity to investigate cancer pathways and therapeutic targets through computational analyses. In this study, we developed a signal-oriented computational framework for cancer pathway discovery. First, we identify transcriptomic modules that are abnormally expressed in multiple tumors, such that genes in a module are most likely regulated by a common aberrant signal. Then, for each transcriptomic module, we search for a set of somatic genome alterations (SGAs) that perturbs the signal regulating the transcriptomic module. Computational evaluations indicate that our methods can identify pathways perturbed by SGAs. In particular, our analyses revealed that SGAs affecting TP53, PTK2, YWHAZ, and MED1 perturb a set of signals that promote cell proliferation, anchor-free colony formation, and epithelial-mesenchymal transition (EMT). We further demonstrate that these proteins form a signaling complex that mediates these oncogenic processes in a coordinated fashion. These findings lead to the hypothesis that disrupting the complex could be a novel therapeutic strategy for treating tumors with these genomic alterations. Finally, we show that disrupting the signaling complex by knocking down PTK2, YWHAZ, or MED1 attenuates and reverses oncogenic phenotypes caused by mutant p53 in a “synthetic lethal” fashion. This signal-oriented framework for searching pathways and therapeutic targets is applicable to all cancer types, and thus could potentially have a broad impact on precision medicine in cancer.
Poster #209: Towards a Knowledge-Base for Biochemical Reasoning
Authors: McShan, Daniel and Hunter, L, University of Colorado-Denver
Abstract: KaBOB is a knowledge-integration framework focused on genes and proteins, intended to support mechanistic explanations of experimental results in genomics, transcriptomics and proteomics. Extending it to include metabolic information would facilitate analysis of metabolomic datasets as well. Potential metabolomic knowledge sources for integration include HumanCyc with 1826 metabolites, ChEBI with 3947 “human metabolites”, and the Human Metabolome Database (HMDB) with 29289 “endogenous” human metabolites.
HMDB has an order of magnitude more metabolites than HumanCyc or ChEBI, largely because it curates not only small molecules but also lipids, which are important in metabolism and signalling. HMDB provides cross references to HumanCyc (1174) and ChEBI (2791). Of these, only 1064 are cross-referenced to both; 1767 are in ChEBI, not HumanCyc, and 235 are in HumanCyc, not ChEBI. However, HMDB is not a superset of these other two data sources. Compared to what they self-report, 36% (652/1826) of metabolites are in HumanCyc but not in HMDB, and 29% (1156/3947) are in ChEBI but not HMDB.
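The two "not in HMDB" percentages follow directly from the self-reported totals and the HMDB cross-reference counts quoted above. The short check below reproduces that arithmetic. It assumes that every HumanCyc or ChEBI metabolite present in HMDB carries a cross reference, and the helper name `missing_from_hmdb` is hypothetical.

```python
# Counts quoted in the abstract.
humancyc_total = 1826      # metabolites self-reported by HumanCyc
chebi_total = 3947         # "human metabolites" self-reported by ChEBI
hmdb_xref_humancyc = 1174  # HMDB cross references to HumanCyc
hmdb_xref_chebi = 2791     # HMDB cross references to ChEBI

def missing_from_hmdb(source_total, cross_referenced):
    """Return (count, rounded percent) of source entries without an HMDB cross reference."""
    missing = source_total - cross_referenced
    return missing, round(100 * missing / source_total)

humancyc_missing = missing_from_hmdb(humancyc_total, hmdb_xref_humancyc)
chebi_missing = missing_from_hmdb(chebi_total, hmdb_xref_chebi)
print(humancyc_missing)  # (652, 36)
print(chebi_missing)     # (1156, 29)
```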
In order to create a comprehensive knowledge-base of metabolites, each of these sources must be integrated. To do so, the KaBOB framework requires that each knowledge source be converted into a formal semantic representation grounded in Open Biomedical Ontologies and expressed in the Semantic Web standard OWL language. Future work involves semantic mappings for each of the sources, and a set of queries demonstrating the ability to access knowledge seamlessly from all of them simultaneously.
Day 1 – Poster Topic 3 – Clinical Research Translational Informatics
Poster #301: Informatics Approaches for Evidence Appraisal and Synthesis
Authors: Andrew D Goldstein, Eric Venker, Chunhua Weng, Columbia University
Abstract: Clinical evidence should be valid, applicable, and synthesized. Unfortunately, bias, error, misconduct, and underreporting harm validity. Applicability is often inadequately defined and validated. Synthesis can be sporadic, redundant, or lacking rigor, completeness, or timeliness. Underlying these issues is the volume, disorganization, and under-appraisal of evidence. We surveyed the informatics literature addressing these issues, and defined knowledge gaps and intervention opportunities.
We first conducted a scoping review of articles focused on evidence appraisal and synthesis in 8 biomedical informatics journals. The search yielded 838 citations; 53 were included, representing 0.2% of all 24813 citations. Interventions included classifiers (60%), ontologies (17%), and social computing (9%). For classifiers, articles were predominantly validation studies, not broad implementations. For ontologies and social computing, articles were predominantly perspective pieces. Generally, appraisal tools had descriptive, not critical functions, and synthesis tools were aimed at search and inclusion, not subsequent synthesis processes.
Next, we are conducting a scoping review of articles focused on evidence appraisal in the broader biomedical literature to develop a conceptual framework, identify barriers, and propose informatics solutions. Initial analysis demonstrates that appraisal is not systematic, formal, or integrated into the scientific corpus and that existing attempts at solving this are problematic.
Poster #302: Using Wearable Technology to Aid in the Classification of Different Cardiac Arrhythmias
Authors: Jessica N Torres, Euan Ashley, Stanford University
Abstract: Cardiovascular diseases such as atrial fibrillation (AF) and hypertrophic cardiomyopathy (HCM) increase the risk of stroke, heart failure, and even sudden death. The largest obstacle to early AF and HCM detection is their tendency to be intermittent and asymptomatic. Current clinical practice fails to capture latent risk situations such as changes in magnitude or variability over time or under specific conditions. Wearable technology affords the opportunity to continuously monitor patients through wireless medical sensors or mobile biosensors. This massive amount of real-time biometric data may hold invaluable clues for improving human health. In our study, we use a Samsung Simband device, a health-focused wearable technology, to monitor patients’ physiological characteristics. Here, we present methods to process signals from photoplethysmography (PPG), an optical technology based on high-intensity LEDs, for 1) estimating heart rate during high-intensity motion and 2) AF and HCM arrhythmia detection and classification. We find that knowledge gained from this application can lead to a better understanding of how new wearable technologies can be used to classify cardiac arrhythmias.
Poster #303: Predicting Heterogenous Causal Treatment Effects for First-Line Antihypertensives
Authors: Alejandro Schuler, Nigam Shah, Stanford University
Abstract: Hypertension (high blood pressure) is an overwhelmingly prevalent risk factor for negative cardiovascular outcomes, including heart disease and stroke. Despite its being treatable, many patients struggle to control their hypertension. This is partly because there is considerable heterogeneity in patient responses to different classes of antihypertensive drugs. Although the different classes of antihypertensive drugs are equally effective at a population level, it is not currently known which specific patients will respond better to which antihypertensives. We use statistical learning to predict patients’ individual blood pressure responses to antihypertensive treatments using only their medical histories up to the point of their first prescription. To avoid confounding, we employ a sophisticated method of causal inference called a causal forest, which is conceptually a form of data-driven stratified matching. Our analysis is performed on the OHDSI common data model, which will enable us to validate our findings across multiple sites.
Poster #304: Acquiring and Representing Drug-Drug Interaction Knowledge and Evidence
Authors: Jodi Schneider and Richard D Boyce, University of Pittsburgh
Abstract: Potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm. Poor quality evidence on PDDIs, combined with prescribers’ general lack of PDDI knowledge, results in thousands of preventable medication errors each year. One contributing factor is that PDDI knowledge lacks a standard computable format. To address this, we are researching efficient strategies for acquiring and representing PDDI knowledge, focusing on assertions and their supporting evidence.
We are acquiring knowledge from several sources. First, we have transformed 410 assertions and 519 evidence items from prior work. Second, we are examining FDA-approved drug labels, and so far annotators have identified 609 evidence items relating to pharmacokinetic PDDIs from 27 FDA-approved drug labels. Third, annotators have found 230 assertions of drug-drug interactions in 158 non-regulatory documents, including full text research articles.
We are building a two-layer evidence representation, with both generic and domain-specific layers. The generic layer reuses the Micropublications Ontology to annotate assertions and their supporting data, methods, and materials. For the domain-specific layer we are building DIDEO, the Drug-drug Interaction and Drug-drug Interaction Evidence Ontology. DIDEO adds specific knowledge, such as the study types required to establish a given type of PDDI. The current version of DIDEO has 385 subclass axioms, and reuses formalized knowledge items from sources including the Drug Ontology, Chemical Entities of Biological Interest, the Ontology of Biomedical Investigations, and the Gene Ontology.
Poster #305: Impact of Missing Data on Automatic Learning of Clinical Guidelines
Authors: Yuzhe Liu, Vanathi Gopalakrishnan, University of Pittsburgh
Abstract: Many machine learning algorithms ignore data with missing values. When learning on retrospective clinical data where missing values are common, discarding incomplete entries may significantly reduce the sample size or bias the resulting complete dataset. In our dataset used to learn clinical guidelines for imaging use in pediatric cardiomyopathy, eliminating patients with missing data reduces the dataset size by half. Recent work has shown success using machine learning techniques like decision trees, k-nearest neighbors, and self-organizing maps to impute missing data in several real-world datasets. We are investigating the impact of various imputation methods on the performance of our Bayesian rule learning technique for discovery of clinical guidelines. We compared the performance of mean value, k-nearest neighbor, and decision tree imputation, as well as the use of indicator variables for missingness, against performance on a complete dataset obtained by deleting samples with missing values.
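Three of the strategies compared above (complete-case deletion, mean value imputation, and missingness indicators) are simple enough to sketch directly. The code below is a minimal illustration on a toy numeric table, with `None` marking a missing value. It is not the authors' pipeline, the function names and data are hypothetical, and the k-nearest neighbor and decision tree imputers are omitted.

```python
def complete_cases(rows):
    """Keep only rows with no missing values (listwise deletion)."""
    return [r for r in rows if None not in r]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def add_missing_indicators(rows, fill=0.0):
    """Fill missing values with a constant and append one 0/1 missingness flag per column."""
    out = []
    for r in rows:
        values = [fill if v is None else v for v in r]
        flags = [1 if v is None else 0 for v in r]
        out.append(values + flags)
    return out

# Toy table: two features, half of the rows have a missing value.
rows = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
    [5.0, 6.0],
]
deleted = complete_cases(rows)
imputed = mean_impute(rows)
flagged = add_missing_indicators(rows)
print(len(deleted), imputed[1], flagged[1])
```

In the toy table, listwise deletion halves the sample, while the other two strategies keep all four rows. That trade-off between sample size and introduced bias is what the comparison above is meant to quantify.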
Poster #306: Understanding Clinical Trial Patient Screening from the Coordinator’s Perspective
Authors: En-Ju (Deborah) Lin1, Stephen Johnson2, Albert M Lai1,
1Department of Biomedical Informatics, The Ohio State University; 2Weill Cornell Medical Center
Abstract: Clinical research is crucial for generating evidence and providing effective treatments for patients. However, clinical trials are lengthy and expensive processes that often fail. Slow recruitment has been cited as a primary reason for the failure of clinical trials. Currently, clinical research coordinators typically perform the time-consuming process of manually comparing a patient’s frequently complex clinical history against a series of eligibility criteria. To address the challenges in recruitment, we plan to develop an automated approach to support pre-screening patients into clinical trials using data from electronic health records (EHR). We first want to understand how clinical research coordinators identify and pre-screen patients for clinical trials, their needs, and their experience with using the EHR in the screening process. We conducted semi-structured interviews with 16 clinical trial coordinators at two large academic research medical centers. The interviews covered four aspects: screening productivity, the use of the EHR, eligibility criteria and language, and attitude towards automation. Using a conventional content analysis approach, two authors (EL and SJ) coded all transcripts and analyzed the concepts that arose from the interviews. We have identified current needs and important considerations for moving towards automation.
Poster #307: Standardizing Sample-Specific Metadata in the Sequence Read Archive
Authors: Matthew N Bernstein1 and Colin N Dewey1,2,3
1Department of Computer Sciences; 2Department of Biostatistics and Medical Informatics; 3Center for Predictive Computational Phenotyping, University of Wisconsin, Madison
Abstract: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remains largely underutilized, in part, due to the unstructured nature of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants, and references to outside sources of information. For these reasons, it remains difficult to query the database for biological samples that have certain targeted attributes such as specific diseases, tissues, or cell-types. In this poster, I describe our current effort in mapping each biological sample to terms in standardized ontologies. More specifically, we are developing a computational pipeline that automatically associates with each sample in the SRA database a set of terms in the Open Biomedical Ontologies.
Poster #308: Causal Inference During Multisensory Speech Perception
Authors: John Magnotti1, Genevera Allen2, and Michael Beauchamp1
1 Baylor College of Medicine, Houston, TX, 2 Rice University, Houston, TX
Abstract: Speech is the primary form of human communication and is fundamentally multisensory: we seamlessly integrate visual information from a talker’s facial movements and auditory information from the talker’s voice. Integrating information across senses is especially important to counteract ubiquitous hearing loss during normal aging and is clinically relevant for the impaired language abilities observed in autism, schizophrenia, dyslexia, and stroke.
A first step toward eliminating multisensory integration deficits is a computational understanding of multisensory speech perception. Current computational models are based around the assumption that humans automatically integrate all available information from a talker’s voice and face. Daily experiences and laboratory data, however, show that humans are selective in which information they choose to combine, and that this selection varies greatly from person to person. To solve this selection problem, we developed a novel graphical model based on the general idea of causal inference.
We applied our causal inference model to speech perception data from healthy individuals (N=265). Our model outperformed state-of-the-art Bayesian perceptual models, providing a more accurate computational framework for the study of multisensory speech perception. Measuring parameter differences across individuals and clinical groups can give us insight into the underlying reasons for measured differences in face-to-face communication.
This research was funded by a training fellowship from the Gulf Coast Consortia, on the Training Program in Biomedical Informatics, National Library of Medicine (NLM) T15LM007093, PD – Lydia E. Kavraki.
Poster #309: Data Mining for Identifying Candidate Drivers of Drug Response in Heterogeneous Cancer
Author: Sheida Nabavi, University of Connecticut
Abstract: With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets by utilizing advanced computational methods can help to address these issues.
To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative approach based on module network analysis is applied to identify potential driver genes. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The final result contains biologically relevant genes, such as COL11A1, which has recently been reported as a cis-platinum resistance biomarker for ovarian carcinoma.