![](https://ds.dfci.harvard.edu/wp-content/uploads/2025/02/khansen.jpg)
HSPH Biostatistics and DFCI Data Science Colloquium
Thursday, March 6, 2025
4:00pm
Harvard TH Chan School of Public Health, FXB G13
Kasper Hansen, PhD
Associate Professor, McKusick-Nathans Institute of Genetic Medicine, Department of Biostatistics, Johns Hopkins University
A significant barrier to progress in biomedical data science is the difficulty of developing prediction models that work across contexts, such as different instruments, facilities, or hospitals. This is particularly difficult for predictions based on genomics data. Here, we present an example of a generalizable prediction model.
The cell cycle is a highly conserved, continuous process that controls the faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle, both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer a cell's position in the cell cycle at high resolution from single-cell RNA-seq data.
We present tricycle, an R/Bioconductor package that addresses this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls that are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays.
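To make the "fixed reference embedding plus projection" idea concrete, here is a minimal toy sketch in Python. It is not tricycle's code or API (tricycle is an R/Bioconductor package), and every function name, the simulation, and the choice to center each gene within the new dataset are illustrative assumptions. It relies on one property of PCA on periodic signals: if each gene's expression is roughly a sinusoid in the cycle angle θ with its own peak phase, the top two principal components trace an approximately circular pattern, so the polar angle of a cell's projection recovers θ up to a rotation and reflection.

```python
import numpy as np


def fit_reference_embedding(ref_expr, n_components=2):
    """Learn gene loadings by PCA on a reference matrix (genes x cells)."""
    centered = ref_expr - ref_expr.mean(axis=1, keepdims=True)
    # Left singular vectors of the gene-by-cell matrix are the gene loadings.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components]


def project(expr, loadings):
    """Project cells (genes x cells) into the fixed reference embedding.

    Assumption (hypothetical choice): each gene is centered within the new
    dataset, which removes dataset-specific baselines before projection.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)
    return loadings.T @ centered  # shape: (n_components, n_cells)


def cycle_position(scores):
    """Polar angle of each cell in the first two components, in [0, 2*pi)."""
    return np.mod(np.arctan2(scores[1], scores[0]), 2 * np.pi)


def simulate(phase, amplitude, n_cells, rng, scale=1.0, baseline=0.0, noise=0.3):
    """Hypothetical toy data: each gene peaks at its own phase of the cycle."""
    theta = rng.uniform(0, 2 * np.pi, n_cells)
    signal = amplitude[:, None] * np.cos(theta[None, :] - phase[:, None])
    base = np.asarray(baseline, dtype=float).reshape(-1, 1)
    expr = base + scale * signal + rng.normal(0, noise, signal.shape)
    return expr, theta


def align(pred, truth):
    """Best sign/offset mapping truth onto pred: (sign, offset, median abs error)."""
    best = None
    for sign in (1, -1):
        diff = pred - sign * truth
        offset = np.angle(np.mean(np.exp(1j * diff)))
        err = np.median(np.abs(np.angle(np.exp(1j * (diff - offset)))))
        if best is None or err < best[2]:
            best = (sign, offset, err)
    return best


rng = np.random.default_rng(0)
n_genes = 200
phase = rng.uniform(0, 2 * np.pi, n_genes)  # gene peak phases shared by all datasets
amplitude = rng.uniform(0.5, 2.0, n_genes)

# Fit the embedding once, on the reference dataset only.
ref_expr, ref_theta = simulate(phase, amplitude, 500, rng)
loadings = fit_reference_embedding(ref_expr)

# A "new" dataset with different per-gene baselines, overall scale and noise.
new_baseline = rng.normal(5, 1, n_genes)
new_expr, new_theta = simulate(phase, amplitude, 300, rng, scale=0.5,
                               baseline=new_baseline, noise=0.5)

ref_fit = align(cycle_position(project(ref_expr, loadings)), ref_theta)
new_fit = align(cycle_position(project(new_expr, loadings)), new_theta)
print("reference (sign, offset, median error):", ref_fit)
print("new data  (sign, offset, median error):", new_fit)
```

The `align` helper is only there to compare predictions with the simulated truth. Because both datasets are projected with the same reference loadings, they should share the same sign and offset, so their angles sit on a common scale. Refitting PCA separately on each dataset could instead flip or rotate the components arbitrarily from one dataset to the next, which is one way to read the limitation of dataset-dependent embeddings mentioned above.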