Machine Learning in Mass Spectrometry and Beyond: The Magic Behind the Methods
University of Kansas, Department of Chemistry, Lawrence, KS 66049
This presentation will be a series of parables highlighting key data science concepts that mass spectrometrists should attend to in planning omics studies. The first story will be a fun example of detecting the use of ChatGPT in scientific writing. I will highlight how machine learning is done, and how data science researchers use samples and features to develop predictions about data that can often be astoundingly accurate. I will share the results of our best, unpublished models for detecting ChatGPT in 13 different chemistry journals. Next, these same machine learning principles will be applied to important problems that mass spectrometrists think about more often, like identifying effective biomarkers for Alzheimer’s disease. In two different stories, one proteomics-centric and one metabolomics-centric, I will show how machine learning can be used in conjunction with high-quality omics data sets to infer biological consequences and predict disease. Using data science methods coupled with mass spectrometry to identify useful disease biomarkers is a formidable challenge, but real successes are possible. Both the samples and the features are important, and carefully attending to the nuances gives the magic to the methods.
Deciphering hierarchical organization of proteins and lipids in the cell membrane
Macromolecular organization between proteins and lipids at the cellular membrane is fundamental to any membrane-associated cellular signaling events. Capturing these associations demands molecular resolution that can unambiguously determine the identity of both large protein complexes and small bound lipids and ligands. Simultaneously, we need nanoscale spatial resolution to capture these assemblies directly from their endogenous membrane of action. Addressing these analytical challenges, we will present our ongoing work in the lab that combines native mass spectrometry with chemical biology, molecular imaging, and other orthogonal tools to render a quantitative molecular view of the protein-lipid organization in the membrane and how that drives downstream cellular signaling.
Assessing membrane fluidity of antibiotic-resistant Staphylococcus aureus using an RPLC-IM-MS method for isomeric phospholipid separations
Kelly Hines, PhD
University of Georgia
Staphylococcus aureus varies its membrane fluidity in response to environmental stresses by changing the ratio of branched-chain fatty acids (BCFAs) to straight-chain fatty acids (SCFAs) in its membrane lipids. Altered membrane fluidity has been associated with an increased tolerance of membrane-targeting antibiotics, including daptomycin. The routine assessment of microbial membrane fluidity relies on determining the total BCFA-to-SCFA ratio by GC-MS. Although GC-MS is capable of resolving BCFA and SCFA isomers, the requirement for free fatty acids eliminates the possibility of evaluating the preferences of lipid subclasses for BCFAs vs. SCFAs. We recently demonstrated an RPLC method that can separate lipid isomers having branched-branched, branched-straight, or straight-straight fatty acyl tail combinations. Using this approach and stable isotope labeling, we examined the distribution of FA isomers in the lipids of an S. aureus strain with daptomycin resistance. A strain of S. aureus N315 with high-level daptomycin resistance was found to have substantially more BCFAs in its membrane lipids compared to the isogenic parent N315, which correlated with increased membrane fluidity. Despite the preference for BCFAs in the resistant strain, we found that both organisms could utilize SCFAs when provided in the culture broths. This supplementation reversed the resistant strain’s membrane fluidity towards that of the parent strain. These results indicate that daptomycin resistance can be facilitated in part by increased membrane fluidity, and support the concept of targeted remodeling of the S. aureus membrane to mediate antibiotic resistance.
Molecular phenomics in systems, synthetic, and chemical biology
John A. McLean, PhD
Vanderbilt University, Department of Chemistry, 7330 Stevenson Center, Nashville, TN 37235 email@example.com
The human genome project is recognized as being one of the most successful big science projects in modern history. One of the primary motivational underpinnings to undertake the HGP was to better understand what made us human and healthy – and how to use this code to improve the human condition by better understanding disease and potential treatment. While the frontiers of our knowledge expanded dramatically, we also uncovered profound biological complexity that we could not understand. This led to the current frontier in the measurement science of molecular phenomics, to catalog the broad-scale changes in the molecular inventory in cells, tissues, and biological fluids at a specific biological state, or in response to exposures and lifestyle choices. In phenomics, we seek to characterize the comprehensive molecular basis of biology (including DNA, RNA, proteins, lipids, carbohydrates, metabolites, and all of their nuances), in both space (e.g. at a cell, tissue, and organismal level) and time (e.g. healthy versus disease state). This places enormous demands on measurement technologies (including minimal sample preparation, fast measurements, high concentration dynamic range, low limits of detection, and high selectivity) and computational approaches to organize the millions of potential species present in vanishingly small spatial coordinates. The interplay between phenomic datasets and bioinformatics forms the nexus of translating phenomics data into actionable information and understanding.
Advances in computational biology rely heavily on the experimental capacity to make omics measurements, i.e. integrated proteomics, metabolomics, lipidomics, glycomics, among many others. Ion mobility-mass spectrometry (IM-MS) provides rapid (ms) gas-phase electrophoretic separations on the basis of molecular structure and is well suited for integration with rapid (µs) mass spectrometry detection techniques. This report will describe recent advances in IM-MS integrated omics measurement strategies in the analyses of complex biological samples of interest in systems, synthetic, and chemical biology. New advances in artificial intelligence and machine learning based on developments in internet commerce and astronomy will also be described to approach biological queries from an unbiased and untargeted perspective and to quickly mine these massive datasets. These techniques will be highlighted through selected examples ranging from the creation of microfluidic human-organs-on-chip to replace animal testing in drug development workflows to probing the outcomes of fast genetic editing experiments (using CRISPR) in the optimization of synthetic biology for fine and commodity chemical production. While enormous challenges remain, the promise is immense – comprehensive diagnostics and predictive capabilities for health and medicine of importance to society and beyond.
Capillary electrophoresis-mass spectrometry for proteoforms and protein complexes
Liangliang Sun, PhD
Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824; firstname.lastname@example.org
Capillary electrophoresis-mass spectrometry (CE-MS) has been recognized as a promising analytical tool for top-down characterization of proteoforms and protein complexes since the 1980s. During the last decade, CE-MS has attracted increasing attention for global denaturing and native top-down proteomics (TDP), aiming to achieve complete pictures of proteoforms and protein complexes in complex biological systems. We recently showed several cases of applying advanced CE-MS techniques to the delineation of proteoforms and protein complexes. First, we performed the first TDP study of a pair of isogenic human nonmetastatic and metastatic colorectal cancer (CRC) cell lines (SW480 and SW620) using CE-MS/MS. [Sci Adv, 2022] We identified 23,622 proteoforms of over 2000 genes from the two cell lines, representing a nearly fivefold improvement in the number of proteoform identifications compared to previous TDP datasets of human cancer cells. We revealed substantial transformation of CRC cells in proteoforms after metastasis. Second, we developed a CE-ion mobility spectrometry (IMS)-MS/MS technique for online multi-dimensional separation of proteoforms for the first time and showed that the technique could substantially improve the identification of large proteoforms (>30 kDa) in complex samples. [Anal Chem, 2023] Third, we developed a native capillary isoelectric focusing (ncIEF)-MS technique for high-resolution separation and accurate delineation of protein complexes (e.g., an interchain cysteine-linked antibody-drug conjugate). [Anal Chem, 2022] The ncIEF-MS technique enabled precise measurements of isoelectric points (pIs) of protein complexes, allowing us to study how protein sequence variations/PTMs regulate the pIs of protein complexes.
Idiopathic Pulmonary Fibrosis: Imaging patient derived tissues for lipidomic and proteomic changes
Amanda Hummon, PhD
The Ohio State University
Authors: Emily R. Sekera, Joseph H. Holbrook, Timur Khaliullin, Lorena Rosas, Ana Mora, Mauricio Rojas, Amanda B. Hummon
Idiopathic pulmonary fibrosis (IPF) is an age-associated lung disease with few treatment options. Development of the disease is characterized by alterations in alveolar epithelial cells responsible for surfactant production, and potential metabolic dysregulation acts as a key contributor to the pathogenesis of the disease. However, spatially resolved lipidomics and proteomics data are lacking. Using human lung samples from IPF patients and age-matched donors, we imaged the lipid distribution by Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging, detecting significant changes. We confirmed that surfactant distribution is altered in IPF tissues. The surfactant component POPG shows an overall reduction in IPF samples, while we found a major shift towards the presence of long chain fatty acid CoA species in diseased lungs. To complement the lipidomics data, we have utilized a MALDI-IHC system and have mapped the distribution of several proteins (PanCK, Vimentin, CD68) in the diseased and healthy lungs.
Sponsored Abstracts (Lunch and Learn)
Development of an LC-HRIM-MS workflow and database for untargeted plasma lipidomic analyses (sponsored by MOBILion)
Rachel A. Harris, Emanuel Zlibut, Lauren Royer, Michelle English, Frederick Strathmann
MOBILion Systems, Chadds Ford, PA, USA
Analyzing lipid extracts is complicated by the presence of numerous isomers, which are challenging to fully characterize using traditional Liquid Chromatography-Mass Spectrometry (LC-MS) workflows. The introduction of high-resolution ion mobility (HRIM) techniques, such as Structures for Lossless Ion Manipulations (SLIM), enables rapid, gas-phase separation of lipids with resolving powers over 250, facilitating the identification of biologically relevant lipid isomers uncharacterized in prior analyses. The development of full LC-HRIM-MS workflows for lipidomic analysis should enable deeper characterization of samples via these multidimensional separations. Additionally, calibration of HRIM data allows users to determine Collision Cross Section (CCS) values for greater identification confidence. In this work, we demonstrate the application of an LC-HRIM-MS method for the creation of a lipid database that enables identification of lipids to a higher degree of structural specificity than lipid headgroup and fatty acyl composition. The database contains 35 lipid species across multiple lipid classes expected to be present in serum, including PC, PE, TG, SM, and Cer species. Multiple forms of isomerism were included for the species in the database, such as sn regioisomers, cis/trans isomers, and double bond position isomers. This database was then applied to the untargeted analysis of a plasma extract, NIST SRM 1950. Data analysis was performed using the Lipostar 2 software from Molecular Discovery, a vendor-neutral, high-throughput software package that enables tunable feature finding for both LC-MS and LC-HRIM-MS workflows. Significant attention was paid to the optimization of the feature finding step for the HRIM data. Using this workflow, we were able to detect multiple lipid features corresponding to species included in the database and in some cases were able to detect specific isomers based on their measured CCS.
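As a rough illustration of how a measured CCS can separate isomers that share an m/z, the sketch below matches a feature against a small library using an m/z tolerance (ppm) and a CCS tolerance (percent). It is not the Lipostar 2 algorithm or the MOBILion workflow; the entry names, m/z and CCS values, and tolerance defaults are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    mz: float   # reference m/z
    ccs: float  # reference collision cross section, in square angstroms


def ppm_error(measured_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def match_feature(mz, ccs, library, ppm_tol=10.0, ccs_tol_pct=0.5):
    """Return names of entries within both tolerances, best CCS match first."""
    hits = []
    for entry in library:
        if abs(ppm_error(mz, entry.mz)) > ppm_tol:
            continue  # m/z does not match this entry
        ccs_err_pct = abs(ccs - entry.ccs) / entry.ccs * 100.0
        if ccs_err_pct <= ccs_tol_pct:
            hits.append((ccs_err_pct, entry.name))
    return [name for _, name in sorted(hits)]


# Hypothetical library: two isomers with identical m/z but different CCS.
LIBRARY = [
    LibraryEntry("isomer A (placeholder)", 760.5851, 285.0),
    LibraryEntry("isomer B (placeholder)", 760.5851, 288.0),
]

# Hypothetical measured feature.
print(match_feature(760.5860, 287.8, LIBRARY))
```

With m/z alone, both placeholder isomers would match the feature; the CCS filter is what narrows the assignment to one of them. The tolerance has to be tighter than the CCS difference between the isomers, which is where high mobility resolving power matters.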
Untargeted Metabolomics in the Fight for Patient Diagnosis (sponsored by Metabolon)
Heino M. Heyman, PhD
There are over 7,000 known rare diseases, and they impact more people than cancer and AIDS combined. However, given that each rare disease impacts a relatively small number of individuals, these diseases are not widely known to the medical community. It takes an average of 4.8 years for patients suffering from a rare disease to receive a diagnosis. Clinical indications, such as seizures or stunted growth, are often shared indicators of other illnesses, leaving the medical community unable to diagnose rare diseases easily. As a result, there are many “undiagnosed” disease patient populations who are working to find better paths toward diagnosis. While genetic testing is a powerful diagnostic tool, the genes impacted are often of unknown function, or there are too many mutations of unknown penetrance that doctors are again unable to find a diagnosis, much less a path of treatment. In this presentation, the use of untargeted metabolomics as a highly effective diagnostic tool for rare diseases will be presented, along with an inspirational case study of what is possible when rare diseases can be identified.
Rapid and deep proteome analysis enabled by a novel high resolution-accurate mass platform (sponsored by Thermo Fisher Scientific)
Trenton Peters-Clarke, PhD
University of California-San Francisco
The proteome, or collection of proteoforms expressed in a given biological system, is dynamic and heterogeneous. As our appreciation for the complexity of the proteome has evolved, so have the technologies we use to interrogate its composition. The heterogeneity of protein regulation, e.g., splice isoforms and dynamic post-translational modifications (PTMs), expands the observable human proteome far beyond the ~20,000 protein-coding genes in our genome to several million proteoforms. Capturing the complex milieu of protein machinery and coordinated functions within the cell necessitates instrumentation with speed, sensitivity, and versatility. In this presentation, we describe ultra-fast proteome analysis using a novel HRAM analyzer coupled with a quadrupole-Orbitrap system that is capable of scan speeds approaching 200 Hz with 80,000 resolving power and single-ion detection sensitivity. Using this novel system, we analyzed tryptic digests of various whole proteomes – organisms ranging from bacterial to human – using rapid liquid chromatography or direct infusion DIA methods. We capture ultra-deep human and mouse phosphoproteomes guided by deep libraries, hundreds of Affinity Purification Mass Spectrometry (APMS) protein-protein interaction studies in only a few hours, and one-minute proteomes via direct infusion with the identification and quantification of thousands of proteins with the novel high-resolution, ultra-fast Orbitrap Astral platform. Additionally, we report analysis of the majority of the human proteome in less than one hour of instrument time.
Targeted DESI-MS Imaging Utilizing Nominal Mass Tandem Quadrupole and High-Resolution MS Instruments (sponsored by Waters)
Roy Martin, PhD
MS Imaging provides scientists with the ability to view the chemical diversity in a sample for both discovery and targeted analysis. While a relatively simple technique, it requires a level of skill and frequent practice, which makes it a good fit for a core lab that can offer it to users who would benefit from this type of analysis but do not use it routinely. Waters recently introduced the Select Series MRT system for in-depth molecular MS imaging, providing Ultra-High Time of Flight mass resolution combined with multiple mechanisms for component visualization: MALDI and DESI. As a result of these experiments, it came to light that a rapid, sensitive route for routine targeted imaging was desired, and the Xevo TQ-Absolute targeted imaging system with DESI was introduced. Both the Select Series MRT and Xevo TQ-Absolute imaging platforms will be described, and imaging applications on each platform will be provided.
Optimization of a Digital Mass Filter for the Isolation of High m/z Analytes (sponsored by Agilent)
Robert L. Schrader and David H. Russell
Department of Chemistry, Texas A&M University, College Station, TX 77843
Traditionally, the quadrupole mass filter is operated with high-voltage RF and DC waveforms. Sinusoidal RF waveforms at a single frequency are generated using a tuned resonant circuit in the MHz range. Mass range is limited by the maximum RF and DC voltages that can be applied to the quadrupole rod pairs. This is particularly detrimental in the field of native MS where large protein complexes of low charge state (high m/z) are generated. To reduce the amplitude necessary for the selection of high m/z, the drive frequency can be reduced to the 200 – 300 kHz range. Alternatively, digital waveforms vary the duty cycle (percentage of the waveform period at +VRF vs. -VRF) from 50/50 where the quadrupole acts as an ion guide to 61.2/38.8 where the quadrupole acts as a mass filter such that only a small range of m/z values are stable. To change which m/z values are stable, the drive frequency is adjusted at a constant RF voltage. This is particularly advantageous for the selection of high m/z analytes.
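The frequency scan described above can be sketched numerically. At a fixed waveform amplitude and duty cycle, the quadrupole stability parameters scale as V / ((m/z) f^2), so the m/z sitting at a given point in the stability diagram scales as 1/f^2. The snippet below uses only that proportionality; the calibration pair (m/z 4,000 at 500 kHz) is a hypothetical value for illustration, not a setting from this work.

```python
import math


def drive_frequency_hz(target_mz, ref_mz, ref_freq_hz):
    """Frequency that puts target_mz at the same stability point as ref_mz.

    Assumes constant waveform amplitude and duty cycle, so that
    (m/z) * f**2 is constant for a fixed point in the stability diagram.
    """
    if target_mz <= 0 or ref_mz <= 0 or ref_freq_hz <= 0:
        raise ValueError("m/z and frequency must be positive")
    return ref_freq_hz * math.sqrt(ref_mz / target_mz)


# Hypothetical calibration point (illustrative only).
REF_MZ = 4000.0
REF_FREQ_HZ = 500_000.0

for mz in (4000.0, 8000.0, 12136.0, 16000.0):
    freq = drive_frequency_hz(mz, REF_MZ, REF_FREQ_HZ)
    print(f"m/z {mz:8.0f} -> {freq / 1e3:6.1f} kHz")
```

Under these assumptions, quadrupling the target m/z only halves the required frequency, with no increase in voltage. A sinusoidal filter at fixed frequency would instead need its RF amplitude to grow in proportion to m/z.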
A digital mass filter optimized for high m/z has been developed on both a home-built dual quadrupole Orbitrap mass spectrometer and the Agilent 6545 XT Q-TOF. On the Orbitrap system, resolving powers of up to 330 have been demonstrated for the 66+ charge state of GroEL (m/z 12,136). Improvements in ion focusing into and out of the quadrupole mass filter greatly improve isolation efficiency compared to previous designs. On the Agilent 6545 XT, isolation of the 49+ charge state of the single ring mutant of GroEL (m/z 8,000) demonstrates the ability of digital operation to extend the isolation range beyond the standard m/z 4,000.
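For a sense of scale, the figures quoted above for GroEL can be turned into a neutral mass and an isolation width. This sketch assumes that the ions are protonated ([M + zH]z+) and that resolving power is defined as (m/z) / Δ(m/z); the abstract states neither, and the function names are illustrative.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass_da(mz, charge):
    """Neutral mass of an [M + zH]z+ ion from its m/z and charge state."""
    return charge * (mz - PROTON_MASS_DA)


def mz_from_mass(mass_da, charge):
    """m/z of the [M + zH]z+ ion of a species with the given neutral mass."""
    return (mass_da + charge * PROTON_MASS_DA) / charge


def peak_width_mz(mz, resolving_power):
    """Width implied by R = (m/z) / delta(m/z)."""
    return mz / resolving_power


# Values taken from the abstract: GroEL 66+ at m/z 12,136, R = 330.
mass = neutral_mass_da(12136.0, 66)
width = peak_width_mz(12136.0, 330.0)
# Distance to the neighbouring 65+ charge state of the same species.
spacing = mz_from_mass(mass, 65) - mz_from_mass(mass, 66)

print(f"neutral mass    : {mass / 1e3:.1f} kDa")
print(f"isolation width : {width:.1f} m/z units")
print(f"65+/66+ spacing : {spacing:.1f} m/z units")
```

Under these assumptions the 66+ ion corresponds to a complex of roughly 801 kDa, and the isolation width (about 37 m/z units) is several times narrower than the gap to the adjacent charge state (about 187 m/z units). A resolving power of a few hundred is therefore enough to isolate a single charge state at this m/z.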
Structure elucidation of trace level agrochemical metabolites enabled by advances in mass spectrometry (sponsored by SCIEX)
Identification of major xenobiotic metabolites is a requirement for the registration of agrochemicals around the world. This effort often requires trace level identification of metabolites from complex environmental matrices. Decreasing application rates, increasing number of studies and a lack of extraction/purification methods challenge traditional mass spectrometry-based observation and elucidation strategies. To meet these challenges we push the limits of conventional mass spectrometry. This presentation will share some of the approaches we utilize to meet those challenges.
Native Mass Spectrometry on a modified timsTOF Pro (sponsored by Bruker)
Leon (Yu-Fu) Lin
The Ohio State University
Leon (Yu-Fu) Lin, Erin Panczyk, Benjamin Jones, Mark Ridgeway, Arpad Somogyi, Desmond Kaplan, Melvin Park, Vicki Wysocki
Trapped ion mobility spectrometry (TIMS), a high-resolution ion mobility spectrometry technique, separates gas-phase ions based on their size, shape, and mass-to-charge ratio (m/z) using opposing electric and pressure gradients. Here, we demonstrate how modifications to a commercial timsTOF Pro can enable the analysis of high molecular weight (MW) protein complexes for native mass spectrometry (nMS) studies. The novel capabilities afforded by this prototype instrument derive from the combination of improved gas-phase mobility separation of high m/z ions from complexes ranging from 50 kDa to 801 kDa, quadrupole isolation up to m/z 16,500, and the implementation of surface-induced dissociation (SID) to aid in defining protein complex connectivity.
Detecting disease-relevant post-translational modifications in Alzheimer’s disease brain-derived tau filaments using a Bruker timsTOF Pro (sponsored by Bruker)
The Ohio State University
An estimated 55 million individuals are presently living with Alzheimer’s disease (AD), including 6.7 million Americans. AD is a devastating disease partly defined by the accumulation of intracellular neurofibrillary tangles and extracellular amyloid beta plaques. These aggregates and associated neuroinflammatory processes ultimately lead to neuronal death, with associated memory loss and cognitive deficits. Available disease-modifying therapies (DMTs) target the major symptoms of AD yet fail to halt disease processes or rescue full cognitive function. To improve existing treatments, it is essential to understand the pathophysiology of AD and related dementias. Towards this aim, we are studying tau post-translational modifications (PTMs), which influence protein function and thus play an important role in disease. We use a Bruker timsTOF Pro mass spectrometer to perform bottom-up proteomics on the brain-derived tau 2N4R isoform from 3 post-mortem donors diagnosed with AD. Using data-dependent acquisition in conjunction with parallel accumulation serial fragmentation (PASEF), we demonstrate competitive protein sequence coverage and the identification of novel modification sites.