Summary
This paper describes HTAN's cloud-based infrastructure for integrating and analyzing large-scale, multimodal cancer datasets. Clinical and assay metadata are transformed into aggregate Google BigQuery tables hosted through ISB-CGC, with two key innovations: a provenance-based ID system that simplifies cohort construction and cross-assay integration, and a novel adaptation of BigQuery's geospatial functions for spatial biology, which enables neighborhood and correlation analysis of tumor microenvironments. The approach is demonstrated through R and Python notebooks for single-cell, spatial, and clinical use cases.
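To make the idea of a neighborhood analysis concrete, here is a minimal Python sketch that counts, for each tumor cell, how many CD8 cells lie within a fixed radius. It is only an illustration of the kind of distance-within question a geospatial query can express; it is not the paper's BigQuery implementation. The cell coordinates, phenotype labels, and the helper name `neighbors_within` are all hypothetical.

```python
import math

# Hypothetical toy cells: (cell_id, x, y, phenotype). All values are invented
# for illustration and do not come from HTAN data.
cells = [
    ("c1", 0.0, 0.0, "tumor"),
    ("c2", 10.0, 0.0, "CD8"),
    ("c3", 0.0, 15.0, "CD8"),
    ("c4", 100.0, 100.0, "CD8"),
    ("c5", 5.0, 5.0, "tumor"),
]


def neighbors_within(cells, center_type, neighbor_type, radius):
    """Count neighbor_type cells within `radius` of each center_type cell."""
    counts = {}
    for cid, cx, cy, ctype in cells:
        if ctype != center_type:
            continue
        # Brute-force pairwise Euclidean distance; fine for a toy example,
        # whereas a spatially indexed query engine avoids the full scan.
        counts[cid] = sum(
            1
            for nid, nx, ny, ntype in cells
            if ntype == neighbor_type
            and nid != cid
            and math.hypot(nx - cx, ny - cy) <= radius
        )
    return counts


result = neighbors_within(cells, "tumor", "CD8", 20.0)
print(result)
```

The brute-force loop compares every pair of cells, so it only suits small inputs. At the scale of whole-slide data, the same question would normally be pushed down to an engine with spatial indexing.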
Summary
This study presents a pan-cancer spatial analysis of four key immune biomarkers (CD8, FOXP3, PD-1, and PD-L1) using multiplex immunofluorescence performed prospectively in a clinical setting on 2,019 tumors across 14 cancer types. By integrating compositional and spatial metrics, the study identifies conserved patterns of tumor immune microenvironment (TIME) variation across cancer types and stages, and uncovers new links between spatial immune organization and tumor, genomic, and clinical features. An accompanying database of 39.4 million spatially resolved cells is provided as a resource for the cancer immunology community.
Summary
This study prospectively applied ImmunoProfile — a clinical workflow integrating automated multiplex immunofluorescence, digital slide imaging, and machine learning-assisted scoring — to 2,023 unselected cancer patients over three years. High numbers of intratumoral CD8+ or PD-1+ cells were significantly associated with lower risk of death across major cancer types, independent of clinical stage and treatment regimen, establishing the clinical value of routine immune biomarker quantification in a pan-cancer setting.
Summary
This review describes how the HTAN Data Coordinating Center has made data from the first phase of HTAN openly available — comprising 8,425 biospecimens from 2,042 research participants profiled with more than 20 molecular assays. The paper covers data standards, cloud infrastructure, governance, and community engagement strategies. HTAN data can be accessed through the HTAN Portal, CellxGene, Minerva, cBioPortal, and the NCI Cancer Research Data Commons, with infrastructure built on the Synapse platform.
Summary
MatchMiner-AI is an open-source clinical trial matching platform trained entirely on synthetic EHR data — enabling privacy-preserving AI development and open sharing of model weights. The system extracts clinical criteria from longitudinal EHR notes, embeds patient summaries and trial target populations in a shared vector space for rapid retrieval, then applies custom text classifiers to assess patient-trial fit. In retrospective evaluations on real EHR data, 90% of the top 20 recommended trials were relevant for trial-enrolled patients, compared to 17% for baseline approaches.
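The retrieval step described above, in which patients and trials share one vector space, can be sketched as a cosine-similarity ranking. This is a minimal illustration under stated assumptions, not MatchMiner-AI's actual code: the three-dimensional vectors, the trial names, and the helpers `cosine` and `top_k` are hypothetical, and a real system would obtain its embeddings from a trained text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of trial target populations (invented values).
trial_vecs = {
    "trial_A": [1.0, 0.0, 0.0],
    "trial_B": [0.0, 1.0, 0.0],
    "trial_C": [0.7, 0.7, 0.0],
}

# Hypothetical embedding of one patient summary (invented values).
patient_vec = [0.9, 0.1, 0.0]


def top_k(patient, trials, k):
    """Return the k trial ids most similar to the patient vector."""
    ranked = sorted(
        trials, key=lambda t: cosine(patient, trials[t]), reverse=True
    )
    return ranked[:k]


shortlist = top_k(patient_vec, trial_vecs, 2)
print(shortlist)
```

In a pipeline of this shape, the shortlist from the fast retrieval step would then go to a slower, more precise classifier that assesses patient-trial fit. This mirrors the retrieve-then-classify order in the summary.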
Summary
This comprehensive review covers the current state of AI in oncology, with a specific focus on clinical integration. AI applications are organized by cancer type and clinical domain, covering detection, diagnosis, and treatment across imaging, genomics, and medical records for the four most common cancers. The review concludes with an assessment of key challenges, evolving solutions, and future directions as AI matures from research into direct clinical practice.
Summary
This pilot study combined AI with MatchMiner to identify cancer patients at the moment they were most likely to need new treatment. Neural networks analyzed radiology reports to flag patients likely to start new systemic therapy, then linked those patients to genomically matched trials via MatchMiner. The AI reduced the volume of patient-trial matches requiring manual review by 95%, enabling an oncology nurse navigator to efficiently surface candidates for nine early-phase trials in real time.
Summary
MatchMiner is an open-source computational platform that matches genomically profiled cancer patients to precision medicine clinical trials. Deployed at Dana-Farber Cancer Institute, the platform curated 354 trials over five years and facilitated 166 trial consents — helping patients enroll an average of 55 days (22%) faster than through conventional means.