
Russell Schwartz
Ray and Stephanie Lane Computational Biology Department · Carnegie Mellon University · Computer Science
Active 1958–2026
Research topics
- Computer Science
- Artificial Intelligence
- Biology
- Algorithm
- Data Mining
- Engineering ethics
- Medicine
- Bioinformatics
- Computational biology
- Data science
- Engineering
- Medical education
- Genetics
Selected publications
Saturating hepatic clearance drives elevated cfDNA and fragment shortening in cancer
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-06
article · Open access
Abstract: Liquid biopsy studies consistently report both elevated circulating cell-free DNA (cfDNA) concentrations and shortened fragment lengths in cancer. These features are often attributed to tumor-specific processes, despite tumor-derived cfDNA frequently constituting less than 1% of the total. Here, we consider an alternative explanation: saturation of cfDNA clearance, which prolongs cfDNA circulation time, increases exposure to plasma nucleases and is expected to produce similar fragmentomic signatures independent of tumor burden. By combining a mechanistic model of cfDNA fragmentation with analyses of two independent cancer patient cohorts and publicly available clearance-perturbation experiments, we demonstrate that elevated cfDNA levels are accompanied by a characteristic leftward shift in fragment length distributions consistent with impaired hepatic clearance. This fragmentation signature becomes more pronounced at higher cfDNA concentrations, is independent of circulating tumor DNA (ctDNA) fraction, is reproducible under experimentally reduced clearance, and is independently prognostic of patient survival. Together, these results identify saturating clearance as a central determinant of cfDNA abundance and fragment length, re-framing cancer-associated fragmentomic patterns as systemic consequences of clearance dynamics rather than tumor burden alone. More broadly, they highlight the value of mechanistic modeling of clearance processes in extracting clinically meaningful signals from cfDNA fragmentation data.
Scaling and Generalization of Discrete Diffusion Models for Tumor Phylogenies
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-26
article · Open access · Senior author · Corresponding
Abstract: Tumor phylogenies (rooted trees encoding clonal ancestry and mutation acquisition) are central to understanding cancer evolution, yet generating realistic phylogenies remains challenging. We investigate whether discrete graph diffusion can learn the structural constraints of tumor phylogenies directly from data. Working with approximately 12,500 synthetic phylogenies across twelve evolutionary regimes, we train graph transformer models that denoise typed graphs through a learned reverse diffusion process. Scaling experiments reveal a non-monotonic capacity–performance relationship: a mid-scale model achieves high structural validity and close distributional match to held-out data, while a deeper model fails under fixed optimization hyperparameters. Low-data cross-regime experiments show that diverse training produces more transferable representations than single-regime specialization. These results establish that phylogenetic structural constraints can be learned implicitly through unconditional discrete diffusion, suggesting a viable path toward generative models of tumor evolution.
Annals of Surgical Oncology · 2026-04-01
article · Open access
Annals of Surgical Oncology · 2026-03-13
article
Journal of Computational Biology · 2025-03-06 · 2 citations
article · Open access · Senior author
Clonal lineage inference ("tumor phylogenetics") has become a crucial tool for making sense of somatic evolution processes that underlie cancer development and are increasingly recognized as part of normal tissue growth and aging. The inference of clonal lineage trees from single-cell sequence data offers particular promise for revealing processes of somatic evolution in unprecedented detail. However, most such tools are based on fairly restrictive models of the types of mutation events observed in somatic evolution and of the processes by which they develop. The present work seeks to enhance the power and versatility of tools for single-cell lineage reconstruction by making more comprehensive use of the range of molecular variant types by which tumors evolve. We introduce Sc-TUSV-ext, an integer linear programming-based tumor phylogeny reconstruction method that, for the first time, integrates single nucleotide variants, copy number alterations, and structural variations into clonal lineage reconstruction from single-cell DNA sequencing data. We show on synthetic data that accounting for these variant types collectively leads to improved accuracy in clonal lineage reconstruction relative to prior methods that consider only subsets of the variant types. We further demonstrate the method's effectiveness on real data in resolving clonal evolution in the presence of multiple variant types, providing a path toward more comprehensive insight into how various forms of somatic mutability collectively shape tissue development.
Computationally Reconstructing the Evolution of Cancer Progression Risk
Lecture notes in computer science · 2025-10-31
book-chapter · Senior author
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-27
preprint · Open access · Senior author
Motivation: Reconstructing clonal lineage trees ("tumor phylogenetics") has become a core tool of cancer genomics. Earlier approaches based on bulk DNA sequencing (DNA-seq) have largely given way to single-cell DNA-seq (scDNA-seq), which offers far greater resolution for clonal substructure. Available data has lagged behind computational theory, though. While single-cell RNA-seq (scRNA-seq) has become widely available, scDNA-seq is still sufficiently costly and technically challenging to preclude routine use on large cohorts. This forces difficult tradeoffs between the limited genome coverage of scRNA-seq, limited availability of scDNA-seq, and limited clonal resolution of bulk DNA-seq. These limitations are especially problematic for studying structural variations and focal copy number variations that are crucial to cancer progression but difficult to observe in RNA-seq. Results: We develop a method, TUSV-int, combining advantages of these various genomic technologies by integrating bulk DNA-seq and scRNA-seq data into a single deconvolution and phylogenetic inference computation while allowing for single nucleotide variant (SNV), copy number alteration (CNA) and structural variant (SV) data. We accomplish this by using integer linear programming (ILP) to deconvolve heterogeneous variant types and resolve them into a clonal lineage tree. We demonstrate improved deconvolution performance over comparative methods lacking scRNA-seq data or using more limited variant types. We further demonstrate the power of the method to better resolve clonal structure and mutational histories through application to a previously published DNA-seq/scRNA-seq breast cancer data set. Availability: The source code for TUSV-int is available at https://github.com/CMUSchwartzLab/TUSV-INT.git.
Journal of Clinical Oncology · 2025-05-28
article
4063 Background: CDK4/6 and Cyclin D1 are highly expressed in GEA cancers, suggesting that CDK4/6 inhibition may be a promising strategy. In vitro and in vivo studies have shown that abemaciclib (A) demonstrates potent antitumor efficacy in GEA by directly inhibiting this pathway. Currently, ramucirumab (RAM) ± paclitaxel is an approved 2nd line treatment for metastatic GEA cancers. Methods: This multicenter, open-label, phase I/II study investigated the safety and efficacy of A combined with RAM in pretreated advanced GEA (2nd or 3rd line). The primary objective was to describe the safety profile of A (150mg po bid) and RAM (8mg/kg iv every 2 weeks) using CTCAE version 4.03. Secondary objectives included assessing the objective response rate (ORR), disease control rate (DCR), median progression-free survival (mPFS), and median overall survival (mOS). Correlative studies to evaluate alterations in CDK4/6 and Cyclin D1 as determined by next generation sequencing as predictive biomarkers of efficacy were performed. Results: From July 2021 to December 2024, 20/30 patients were enrolled. The study was terminated prematurely due to slow accrual. The median age was 61.5 years (Range: 30.0, 80.0) and most patients were male (18/20). Seven patients (35%) were HER-2 positive, 11/18 patients (61.1%) were PDL1 CPS > 1 and 15 patients (75%) had cancer localized in the E. Baseline ECOG performance status was 0 in 8 patients (40%) and 55% of patients had received prior immunotherapy with 1st line chemotherapy. A combined with RAM was generally well-tolerated without unexpected toxicities. The most common treatment-related adverse events (AEs) were anemia (10%), hypertension (10%), and dysphagia (10%). Treatment-related AEs ≥ grade 3 occurred in 50% of the patients. Median PFS and mOS were 2.7 months (95% CI: 1.5 - 14.5) and not reached (NR) (95% CI: 3.4 - NR), respectively. ORR was 10% (2/20) and DCR was 40% (8/20). In evaluable patients, 64.7% (11/17) patients with baseline tissue CDK4/6 pathway alterations trended towards longer mPFS (3.4 vs. 1.3 months; HR:1.1) and mOS (NR vs. 5.2 months; HR: 1.4) compared to patients without alterations (p > 0.05). Notably, one study patient with a CDK6 amplification had a partial response of 64% and has been on treatment for > 24 months. Conclusions: A plus RAM demonstrated promising antitumor activity in previously treated E/GEJ adenocarcinomas in the 2nd and 3rd line metastatic setting with manageable toxicities. Alterations in the CDK4/6 and Cyclin D1 pathways appear to enrich for efficacy and may be predictive but need future validation. In-depth molecular studies investigating changes in the expression of selected serum/tissue genomic markers of response for the cytostatic regimen will be presented at the meeting. Clinical trial information: NCT04921904.
Journal of Molecular Diagnostics · 2025-06-25 · 6 citations
article · Open access
Blood collection, plasma processing, and cell-free DNA (cfDNA) purification were optimized to capture circulating tumor DNA without blood cell background DNA among 874 patients with cancer. cfDNA comprised predominantly mononucleosomal fragments [n = 874; mean (x̄) ± SD = 166 ± 5 bp] that generated comparably sized sequencing reads (x̄ ± SD = 162 ± 25 bp). Despite a vast range of cfDNA concentrations (0.50 to 1132.9 ng/mL) across 21 tumor types, matched tumor and blood specimens (n = 430 patients) revealed high concordance for coding (median = 97%) and clinical oncogenic mutations (median = 88% concordance). Therapeutically actionable mutations were identified in 233 patients by both assays, whereas 126 patients had oncogenic mutations without an established pharmacotherapeutic agent. An additional 48 patients (11%) had actionable mutations detected only in cfDNA assays, whereas 23 patients (5%) had mutations in tumor only. Concordance was high in both prevalent (lung, breast, and colon) and rare tumors (appendiceal, sarcoma). Cell-free DNA levels from diagnostic blood specimens were a strong indicator of patient survival duration independent of age, sex, tumor type, and stage, demonstrative of a potentially important role as a prognostic biomarker. Mutations in established oncogenes and tumor suppressors were readily detectable across all tumor types in circulating tumor DNA, indicating a diagnostic role for cfDNA from blood extending beyond the identification of companion therapeutics to patient screening and monitoring.
Abstract 3695: A mechanistic model of ctDNA shedding and its relevance for clinical interpretation
Cancer Research · 2025-04-21
article
Abstract: Circulating tumor DNA (ctDNA) in plasma is a key biomarker of tumor dynamics, yet the quantitative relationship between ctDNA levels and clinical variables (such as tumor volume, vascularization, and cell death) remains unclear. The release of ctDNA is thought to depend on cell death mechanisms and the transport of extracellular tumor DNA (excDNA) to blood vessels. We hypothesized that ctDNA concentrations are modulated by the interplay between proximity to blood vessels and excDNA transport kinetics. To test this hypothesis, we develop a spatial mechanistic model to first predict hypoxia-mediated cell death based on blood vessel distribution and then assess the amount of excDNA reaching blood vessels after release, incorporating diffusion and degradation parameters within empirically plausible ranges. In our model, we find that ctDNA concentration in plasma depends on both vascularization and transport dynamics. For example, when diffusion dominates, ctDNA yield is high, and increasing hypoxic cell death from 0 to 80% leads to a tenfold increase in ctDNA. When degradation dominates, ctDNA yield is much lower and the same change in hypoxia results in a tenfold decrease. When both terms are relevant, ctDNA depends nonlinearly on vascularization. These findings suggest that the relationship between blood vessel proximity and ctDNA is highly sensitive to excDNA transport dynamics, underscoring the importance of experimental validation of key parameters, which may vary significantly according to site and release mechanism.
Citation Format: Thomas Rachman, Patrick Wagner, William LaFramboise, David Bartlett, Russell Schwartz, Oana Carja. A mechanistic model of ctDNA shedding and its relevance for clinical interpretation [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3695.
Recent grants
Integrated Interdisciplinary, inter-university PhD Program Computational Biology (T32)
NIH · $4.4M · 2009–2024
Reconstructing mechanisms of somatic variation in diverse cellular lineages
NIH · $1.4M · 2020–2024
SGER: Discrete Event Simulation of Self-Assembly Kinetics
NSF · $100k · 2003–2004
NIH · $1.4M · 2015
Generalizing Haplotype Models for Phylogenetics
NSF · $647k · 2006–2011
Frequent coauthors
- 64 shared
Alejandro A. Schäffer
National Institutes of Health
- 59 shared
E. Michael Gertz
- 57 shared
Kerstin Heselmeyer‐Haddad
- 57 shared
Thomas Ried
National Cancer Institute
- 47 shared
Philip R. LeDuc
Université Grenoble Alpes
- 43 shared
Salim A. Chowdhury
Carnegie Mellon University
- 39 shared
Darawalee Wangsa
- 31 shared
Nancy Lan Guo
West Virginia University
Education
- 2000
Ph.D. in Computer Science, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 1996
M.Eng. in Electrical Engineering and Computer Science, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 1996
B.S. in Computer Science and Engineering, Electrical Engineering and Computer Science
Massachusetts Institute of Technology