
HSPH Biostatistics and DFCI Data Science Colloquium
Thursday, April 3, 2025
4:00pm
Harvard T.H. Chan School of Public Health, FXB G13
Hongzhe Li, PhD
Perelman Professor of Biostatistics, Epidemiology and Informatics
Director, Center for Statistics in Big Data
Vice Chair for Research Integration, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
Population-level single-cell RNA-seq data captures gene expression profiles across thousands of cells from each individual in a sizable cohort. This data facilitates the construction of cell-type- and individual-specific gene co-expression networks by estimating covariance matrices. Investigating how these co-expression networks relate to individual-level covariates provides critical insights into the interplay between molecular processes and biological or clinical traits. This talk introduces Fréchet regression, modeling covariance matrices as outcomes and vector covariates as predictors, using the Wasserstein distance between covariance matrices as a metric instead of the Euclidean distance. A test statistic is proposed based on the Fréchet mean and covariate-weighted Fréchet mean, with its asymptotic null distribution derived. Analysis of large-scale single-cell RNA-seq data reveals an association between the co-expression network of genes in the nutrient-sensing pathway and age, highlighting perturbations in gene co-expression networks with aging. Additionally, a robust local Fréchet regression approach, leveraging neural unbalanced optimal transport, is briefly discussed to explore how cells are temporally organized during the differentiation of human embryonic stem cells into embryoid bodies.
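For readers unfamiliar with the metric named in the abstract, here is a minimal sketch of a Wasserstein distance between two covariance matrices. It assumes the intended metric is the 2-Wasserstein distance between zero-mean Gaussian distributions, which has a closed form often called the Bures-Wasserstein distance; the abstract does not spell out the exact formulation used in the talk. The function names `psd_sqrt` and `bures_wasserstein` are illustrative and are not taken from the speaker's software.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0            # remove floating-point asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def bures_wasserstein(cov_a, cov_b):
    """2-Wasserstein distance between N(0, cov_a) and N(0, cov_b).

    Assumed closed form:
    d^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})
    """
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    squared = np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(squared, 0.0)))  # clamp rounding error below zero


if __name__ == "__main__":
    # Toy 2x2 covariance matrices, chosen only for illustration.
    a = np.diag([1.0, 4.0])
    b = np.diag([4.0, 9.0])
    print(bures_wasserstein(a, b))
```

When the two matrices commute, as the diagonal toy matrices here do, this distance reduces to the Frobenius distance between their matrix square roots. Unlike the Euclidean distance between matrix entries, it respects the geometry of positive semi-definite matrices.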