Recent Publications

Gibbs DL, Pozhidayeva D, Katariya Y, Aguilar B, Anton K, Lau C, de Bruijn I, Lash A, Altreuter J, Schultz N, Cerami E, Thorsson V, et al. · Research Square preprint, 2025
Summary This paper describes HTAN's cloud-based infrastructure for integrating and analyzing large-scale, multimodal cancer datasets. Clinical and assay metadata are transformed into aggregate Google BigQuery tables hosted through ISB-CGC, with two key innovations: a provenance-based ID system that simplifies cohort construction and cross-assay integration, and a novel adaptation of BigQuery's geospatial functions for spatial biology that enables neighborhood and correlation analysis of tumor microenvironments. The approach is demonstrated through R and Python notebooks for single-cell, spatial, and clinical use cases.
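To make the idea of a neighborhood analysis concrete, here is a minimal plain-Python sketch of a radius-based neighbor count, the kind of query that a geospatial distance predicate can express. It is not the paper's code and does not use BigQuery; the cell IDs, coordinates, cell types, and the `neighborhood_counts` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import hypot

# Hypothetical cells: (cell_id, x, y, cell_type). Values are illustrative only.
cells = [
    ("c1", 0.0, 0.0, "tumor"),
    ("c2", 3.0, 4.0, "CD8_T"),
    ("c3", 10.0, 0.0, "tumor"),
    ("c4", 6.0, 8.0, "macrophage"),
    ("c5", 30.0, 30.0, "CD8_T"),
]


def neighborhood_counts(cells, radius):
    """For each cell, count the types of the other cells within `radius`.

    Uses Euclidean distance and a brute-force O(n^2) scan, which is fine
    for a toy example but not for millions of cells.
    """
    result = {}
    for cell_id, x, y, _ in cells:
        counts = Counter()
        for other_id, ox, oy, other_type in cells:
            if other_id != cell_id and hypot(ox - x, oy - y) <= radius:
                counts[other_type] += 1
        result[cell_id] = dict(counts)
    return result


counts = neighborhood_counts(cells, radius=10.0)
for cell_id, neighbors in counts.items():
    print(cell_id, neighbors)
```

In a data warehouse the same question would typically be posed as a self-join with a distance filter, which lets the engine's spatial indexing replace the brute-force loop used here.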
Lindsay JR, Altreuter J, Alessi JV, Weirather JL, Giobbie-Hurder A, Dryg I, Hoebel K, Sharma B, Felt K, Hodi FS, Lindeman N, Sholl LM, Cerami E, Nowak JA, Awad MM, Rodig SJ, Lotter W. · Cell Reports Medicine, 2025
Summary A pan-cancer spatial analysis of key immune biomarkers — CD8, FOXP3, PD-1, and PD-L1 — using multiplex immunofluorescence performed prospectively in a clinical setting on 2,019 tumors across 14 cancer types. By integrating compositional and spatial metrics, the study identifies conserved patterns of tumor immune microenvironment (TIME) variation across cancer types and stages, and uncovers new links between spatial immune organization and tumor, genomic, and clinical features. An accompanying database of 39.4 million spatially resolved cells is provided as a resource for the cancer immunology community.
Alessi JV, Lindsay JR, Giobbie-Hurder A, Sharma B, Felt K, Kumari P, Mazor T, Cerami E, Lotter W, Altreuter J, Weirather J, Dryg I, et al. · JCO Precision Oncology, 2025
Summary This study prospectively applied ImmunoProfile — a clinical workflow integrating automated multiplex immunofluorescence, digital slide imaging, and machine learning-assisted scoring — to 2,023 unselected cancer patients over three years. High numbers of intratumoral CD8+ or PD-1+ cells were significantly associated with lower risk of death across major cancer types, independent of clinical stage and treatment regimen, establishing the clinical value of routine immune biomarker quantification in a pan-cancer setting.
de Bruijn I, Nikolov M, Lau C, Clayton A, Gibbs DL, Lash A, Altreuter J, Schultz N, Cerami E, Eddy JA, et al. · Nature Methods, 2025
Summary This review describes how the HTAN Data Coordinating Center has made data from the first phase of HTAN openly available — comprising 8,425 biospecimens from 2,042 research participants profiled with more than 20 molecular assays. The paper covers data standards, cloud infrastructure, governance, and community engagement strategies. HTAN data can be accessed through the HTAN Portal, CellxGene, Minerva, cBioPortal, and the NCI Cancer Research Data Commons, with infrastructure built on the Synapse platform.
Altreuter J, Trukhanov P, Paul MA, Hassett MJ, Riaz IB, Mallaber E, Klein HR, Gungor G, D'Eletto M, Van Nostrand SC, Provencher J, Mazor T, Cerami E, Kehl KL, et al. · arXiv preprint, 2024
Summary MatchMiner-AI is an open-source clinical trial matching platform trained entirely on synthetic EHR data — enabling privacy-preserving AI development and open sharing of model weights. The system extracts clinical criteria from longitudinal EHR notes, embeds patient summaries and trial target populations in a shared vector space for rapid retrieval, then applies custom text classifiers to assess patient-trial fit. In retrospective evaluations on real EHR data, 90% of the top 20 recommended trials were relevant for trial-enrolled patients, compared to 17% for baseline approaches.
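The retrieval step described above, ranking trials by how close their embeddings sit to a patient's embedding, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not MatchMiner-AI's implementation: the three-dimensional vectors, the trial names, and the `cosine` and `top_k` helpers are hypothetical, and cosine similarity is assumed as the distance measure. The subsequent classifier step is not shown.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(patient, trials, k):
    """Return the names of the k trials most similar to the patient vector."""
    scored = sorted(
        trials.items(),
        key=lambda item: cosine(patient, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Hypothetical embeddings; a real system would use a text encoder's output.
trial_vectors = {
    "trial_A": [1.0, 0.0, 0.0],
    "trial_B": [0.0, 1.0, 0.0],
    "trial_C": [1.0, 1.0, 0.0],
}
patient_vector = [2.0, 1.0, 0.0]

ranked = top_k(patient_vector, trial_vectors, k=2)
print(ranked)
```

A retrieve-then-classify design keeps the expensive per-pair classifier off most of the trial list: the cheap vector comparison narrows the candidates, and only the shortlist is scored in detail.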
Lotter W, Hassett MJ, Schultz N, Kehl KL, Van Allen EM, Cerami E. · Cancer Discovery, 2024
Summary A comprehensive review of the current state of AI in oncology, with a specific focus on clinical integration. AI applications are organized by cancer type and clinical domain — covering detection, diagnosis, and treatment across imaging, genomics, and medical records — for the four most common cancers. The review concludes with an assessment of key challenges, evolving solutions, and future directions as AI matures from research into direct clinical practice.
Kehl KL, Mazor T, Trukhanov P, Lindsay J, Galvin MR, Farhat KS, McClure E, Giordano A, Gandhi L, Schrag D, Hassett MJ, Cerami E. · JCO Precision Oncology, 2024
Summary This pilot study combined AI with MatchMiner to identify cancer patients at the moment they were most likely to need new treatment. Neural networks analyzed radiology reports to flag patients likely to start new systemic therapy, then linked those patients to genomically matched trials via MatchMiner. The AI reduced the volume of patient-trial matches requiring manual review by 95%, enabling an oncology nurse navigator to efficiently surface candidates for nine early-phase trials in real time.
Klein H, Mazor T, Siegel E, Trukhanov P, Ovalle A, Kumari P, Hansel J, Lindsay J, Cerami E, et al. · npj Precision Oncology, 2022
Summary MatchMiner is an open-source computational platform that matches genomically profiled cancer patients to precision medicine clinical trials. Deployed at Dana-Farber Cancer Institute, the platform curated 354 trials over five years and facilitated 166 trial consents — helping patients enroll an average of 55 days (22%) faster than through conventional means.