
CURE Program Connects Local High School Students with Data Science

Written by Anne O’Neill

The Continuing Umbrella of Research Experiences (CURE) Program is a full-time, paid, rigorous summer research internship, lasting 7 to 11 weeks, for students interested in pursuing a career in scientific research. Four CURE interns joined the Department of Data Science (DS) this summer:

Yasmine Karibe, matched with PI Cheng-Zhong Zhang, mentor Jiahui Zhang:
“This summer I’ve been working alongside my mentor on designing a dual gRNA CRISPR/Cas9 lentiviral vector to induce targeted X-autosome chromosome fusions in a human cell line. I’ve enjoyed learning hands-on lab skills and learning how to discuss and explain my newly obtained knowledge. I’m a rising senior at Burlington High School and aspire to go into a health science related major for undergrad next year!”

Juns Ye, matched with PI Sahand Hormoz, mentor Guoye Guan:
“This summer I have been working alongside my mentor, Guoye Guan, on searching for distinct developmental patterns for in vitro human embryoids through biophysical modeling and parameter analysis in MATLAB. From this project, I learned how to integrate physical foundations and mathematical models into biology, such as using force to model cell-to-cell interaction. I enjoyed working in my lab as I got to acquire new knowledge, watch and shadow experimentalists, and contribute to projects! I am a rising senior at Boston Latin School, and I aspire to pursue a double concentration of physics and biology for my undergraduate studies!”

Derek Yin, matched with PI Giovanni Parmigiani, mentor Adolphus Wagala:
“This summer, I worked with my mentor, Dr. Wagala, to study Bayesian Boolean Matrix Factorization for Acute Myeloid Leukemia. I greatly enjoyed learning from the authentic research experience and improving my ability to communicate and interpret scientific findings. I’m a rising high school junior at Noble and Greenough. In the future, I hope to pursue a career in the health sciences by majoring in Molecular Biology.”

Elizabeth Govani, matched with PIs Giovanni Parmigiani and Danielle Braun, mentor Maria Sol Rosito:
“This summer, I worked in the BayesMendel Lab on a project focused on cancer risk modeling and gene-cancer associations in hereditary cancer syndromes. Specifically, I analyzed and reassessed penetrance estimates for CDH1 and MUTYH mutation carriers to see how updated data might improve my lab’s risk prediction tool, Fam3Pro. I’m a rising senior at Newton South High School and a participant in the YES for CURE program, a longer subprogram of the CURE program. In the future, I hope to attend one of my top realistic dream schools, either Boston University or North Carolina A&T, and continue pursuing a career that blends math and healthcare.”

They will present their research at the DFCI CURE poster session on August 8, which concludes this summer’s program.

More information on the CURE and YES for CURE programs:
https://www.dfhcc.harvard.edu/research/cancer-disparities/students/student-overview

Picture of CURE interns from summer 2025

Left to right: Juns Ye, Elizabeth Govani, Yasmine Karibe, Derek Yin

Jonathan Larson, PhD joins Data Science as new Director of Training and Education

The Department of Data Science is excited to announce the appointment of Dr. Jonathan Larson as the new Director of Training and Education, effective June 2, 2025. He will hold a joint appointment with the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. Dr. Larson brings a wealth of expertise, a passion for teaching, and a commitment to fostering learner success to his new role.

Dr. Larson holds a Master’s and PhD in Biostatistics from Harvard University. Prior to joining our department, he spent three years teaching in the Department of Mathematics and Statistics at UMass Amherst, where he honed his skills in delivering rigorous, student-centered education. His teaching philosophy emphasizes evidence-based methods, active learning, and creating an environment where learners feel empowered to tackle complex challenges.

Beyond his academic credentials, Dr. Larson’s career is distinguished by diverse and impactful experiences. He has worked at a residential treatment center in Memphis and served as a Peace Corps Volunteer in Senegal. These experiences have shaped his approach to education, underscoring the importance of building strong, supportive relationships with students and colleagues alike.

Dr. Larson plans to develop training programs on longitudinal data analysis and survival analysis and to explore innovative teaching methods at HSPH this fall. Looking ahead, his long-term vision includes strengthening connections within the department and ensuring that training initiatives remain relevant and impactful. Dr. Paul Catalano, Associate Chair of the Department of Data Science, noted, “Jonathan’s appointment brings us to the next level in data science education at multiple levels; his expertise will be instantly apparent and his contributions to the department and the Institute will be far-reaching. We are so thrilled to have Jonathan join us!”

Dr. Larson is eager to engage with students, faculty, and the broader community. He welcomes collaboration and encourages colleagues to reach out via email or visit his office to discuss ideas and opportunities.

We are thrilled to welcome Dr. Larson to the Department of Data Science and are confident that he will make a lasting, positive impact on our educational programs and community.

Call for Abstracts: DF/HCC Early Career Investigators Symposium

2025 Dana-Farber/Harvard Cancer Center Celebration of Early Career Investigators
November 4, 2025, from 1:00-5:30 PM
In person at Dana-Farber Cancer Institute
Yawkey Conference Center

Do you work in population science, including epidemiology, biostatistics, outcomes, diversity, cancer care delivery research, and early detection? We invite students, postdocs, residents, and clinical fellows to submit abstracts for consideration as a short talk or for the PATHFINDER-sponsored poster presentation.

https://bit.ly/ecis2025abstract
Submit your abstract by September 19

2025 KEYNOTE SPEAKER
Jane J Kim
Dean for Academic Affairs at the Harvard T.H. Chan School of Public Health


Registration for the public is now open at https://bit.ly/ECIS2025

Deeper Differential Expression Analysis with Shrinkage Correction


HBC Current Topics in Bioinformatics
July 16, 2025, 1:00 PM
February’s R Basics, or a working knowledge of R, is a prerequisite for this workshop.
Jared Brown, PhD
Postdoctoral Research Fellow, Irizarry Lab
DFCI Data Science
Differential expression (or differential abundance) analysis is a core tool for identifying and quantifying differences between and across groups in -omics data. In this workshop session, with follow-along analysis scripts, we will take a deeper look at the models underlying differential expression analysis, using the DESeq2 framework as the main example. We will examine design specification, the proper use of pre-computed offsets such as normalization corrections, parameter estimation and testing, and robust false discovery rate correction through post-hoc shrinkage. Examples highlighting how these approaches differ across datasets will be drawn from bulk RNA-seq, single-cell RNA-seq, and ChIP-seq.
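As general background, and not taken from the workshop materials, the negative binomial generalized linear model that DESeq2 fits, as described in the original DESeq2 publication, can be written as follows for the count $K_{ij}$ of gene $i$ in sample $j$. The design matrix entries $x_{jr}$ come from the design formula, and the normalization factors $s_{ij}$ (in the simplest case, per-sample size factors $s_j$) enter as fixed offsets rather than estimated parameters:

```latex
\begin{aligned}
K_{ij} &\sim \operatorname{NB}(\mu_{ij}, \alpha_i), \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \mu_{ij}^2, \\
\mu_{ij} &= s_{ij}\, q_{ij}, \\
\log_2 q_{ij} &= \sum_r x_{jr}\, \beta_{ir}.
\end{aligned}
```

Here $\alpha_i$ is the gene-wise dispersion and the $\beta_{ir}$ are log2 fold changes; on the log scale, $\log s_{ij}$ acts as an offset. In DESeq2, post-hoc log fold change shrinkage (the `lfcShrink` function) places a prior on the $\beta_{ir}$ so that noisy estimates, typically from low-count or high-dispersion genes, are pulled toward zero.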

Complex Disease Modeling And Efficient Drug Discovery With Large Language Models

HSPH Biostatistics and DFCI Data Science Seminar
Tuesday, April 29, from 11:00 AM to 12:00 PM
Zoom only (link to be posted shortly)

Yu Li, PhD
Assistant Professor, Department of Computer Science and Engineering
The Chinese University of Hong Kong

Large language models, which can integrate and process large amounts of biomedical data, have great potential for modeling complex diseases and discovering functional biomolecules for potential therapeutics. To model complex diseases and identify potential drug targets for them, we built a language model trained on the insurance claims of approximately 123 million people in the US. The model provides a unified representation of all common complex diseases, which enables us to predict the diseases’ genetic parameters and efficiently discover unique genetic loci associated with them. Next, we developed models based on protein language models to efficiently discover remote homologs and functional biomolecules from nature, such as signal peptides and antimicrobial peptides. With these models, we can identify remote homologs 22 times faster than PSI-BLAST and discover diverse functional peptides with less than 20% sequence similarity to known ones. Finally, we developed an RNA language model of the relationship between RNA sequence and structure, which enables effective RNA structure prediction and reverse design. Within two months, we designed and experimentally validated 19 RNA aptamers that are structurally similar to, yet sequence-dissimilar from, known light-up aptamers. More importantly, 10 of the designed aptamers show higher fluorescence than the native Mango-I. Together, these projects demonstrate the great potential of large language models to advance fundamental computational biology research and to enable potentially transformative development.

Modeling Multiscale Genome and Cellular Organization

HSPH Biostatistics and DFCI Data Science Seminar
Tuesday, April 15, at 4:00 PM
Dana-Farber Cancer Institute
Center for Life Sciences Building, 11th Floor, Room 11081

Jian Ma, PhD
Ray and Stephanie Lane Professor of Computational Biology
Carnegie Mellon University


The intersection of AI/ML and biomedicine is entering a transformative era, with growing potential to impact both basic research and translational medicine. Yet, despite remarkable advances in high-throughput technologies across genomics and cell biology, our understanding of the diverse cell types in the human body, and of the underlying principles of intracellular molecular organization and intercellular spatial interactions, remains incomplete. A central challenge lies in developing computational frameworks that can integrate molecular, cellular, and tissue-level data to advance cell biology at an unprecedented scale. In this talk, I will present our recent work on machine learning approaches for regulatory genomics, with a focus on single-cell 3D epigenomics. I will introduce methods that connect different layers of 3D genome architecture and cellular function at single-cell resolution, including graph- and hypergraph-based models that capture spatial genome organization. I will also highlight our latest efforts in developing self-supervised learning frameworks to delineate multiscale cellular interactions within complex tissues, enabling the discovery of previously unrecognized spatially organized patterns. Together, these AI-driven models provide a foundation for integrative, multiscale representations of cellular systems, offering new insights into genome structure, gene regulation, and cell-cell communication. This line of work opens new opportunities for building cohesive multiscale cellular models applicable across a broad range of contexts in health and disease.

Fréchet Regression of Random Objects on Vector Covariates and Its Applications for Single Cell RNA-seq Data Analysis

HSPH Biostatistics and DFCI Data Science Colloquium
Thursday, April 3, 2025
4:00 PM
Harvard T.H. Chan School of Public Health, FXB G13

Hongzhe Li, PhD
Perelman Professor of Biostatistics, Epidemiology and Informatics
Director, Center for Statistics in Big Data; Vice Chair for Research Integration, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania

Population-level single-cell RNA-seq data captures gene expression profiles across thousands of cells from each individual in a sizable cohort. This data enables the construction of cell-type- and individual-specific gene co-expression networks through estimation of covariance matrices. Investigating how these co-expression networks relate to individual-level covariates provides critical insights into the interplay between molecular processes and biological or clinical traits. This talk introduces Fréchet regression with covariance matrices as outcomes and vector covariates as predictors, using the Wasserstein distance between covariance matrices, rather than the Euclidean distance, as the metric. A test statistic based on the Fréchet mean and the covariate-weighted Fréchet mean is proposed, and its asymptotic null distribution is derived. Analysis of large-scale single-cell RNA-seq data reveals an association between age and the co-expression network of genes in the nutrient-sensing pathway, highlighting perturbations in gene co-expression networks with aging. Additionally, a robust local Fréchet regression approach that leverages neural unbalanced optimal transport is briefly discussed, to explore how cells are temporally organized during the differentiation of human embryonic stem cells into embryoid bodies.
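To make the metric concrete: if each covariance matrix is identified with a zero-mean Gaussian (a common convention, assumed here rather than taken from the talk), the 2-Wasserstein distance has the closed form $W_2^2(\Sigma_1,\Sigma_2) = \operatorname{tr}\Sigma_1 + \operatorname{tr}\Sigma_2 - 2\operatorname{tr}\big((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$, often called the Bures-Wasserstein distance. The minimal Python sketch below computes this distance and a weighted Fréchet mean (Wasserstein barycenter) of several covariance matrices using a standard fixed-point iteration. The function names are hypothetical, the weights are assumed to be nonnegative, and the code does not reproduce the speaker's regression weights, test statistic, or software.

```python
import numpy as np

# Hypothetical helpers: covariance matrices are treated as zero-mean Gaussians.


def psd_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh((a + a.T) / 2.0)  # symmetrize to absorb round-off
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def bures_wasserstein(s1: np.ndarray, s2: np.ndarray) -> float:
    """2-Wasserstein distance between N(0, s1) and N(0, s2)."""
    r1 = psd_sqrt(s1)
    cross = psd_sqrt(r1 @ s2 @ r1)
    d2 = np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))  # clamp tiny negative values from round-off


def weighted_frechet_mean(covs, weights, n_iter=100, tol=1e-10):
    """Weighted Wasserstein barycenter of covariance matrices (nonnegative weights).

    Fixed-point update: S <- S^{-1/2} (sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2})^2 S^{-1/2}.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize weights to sum to one
    s = sum(w * c for w, c in zip(weights, covs))  # start at the Euclidean weighted mean
    for _ in range(n_iter):
        r = psd_sqrt(s)
        r_inv = np.linalg.inv(r)  # assumes the iterate stays positive definite
        t = sum(w * psd_sqrt(r @ c @ r) for w, c in zip(weights, covs))
        s_new = r_inv @ t @ t @ r_inv
        if np.linalg.norm(s_new - s) < tol:
            return s_new
        s = s_new
    return s


if __name__ == "__main__":
    a = np.diag([1.0, 4.0])
    b = np.diag([9.0, 16.0])
    print(bures_wasserstein(a, b))  # sqrt(8) for these commuting matrices
    print(weighted_frechet_mean([a, b], [0.5, 0.5]))  # diag(4, 9)
```

Computing matrix square roots with an eigendecomposition (`numpy.linalg.eigh`) keeps the results real and symmetric for positive semidefinite inputs. The barycenter iteration additionally assumes the iterates stay invertible, which is typically the case when all inputs are positive definite.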

2025 Marvin Zelen Memorial Symposium

We invite you to attend the Marvin Zelen Memorial Symposium, an event that celebrates the life and contributions of a remarkable figure in the field of statistics. The symposium will be held on Friday, April 4, 2025, from 1:00 PM to 5:00 PM in the Kresge G1 Auditorium at the Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA. A reception will follow.

The Marvin Zelen Memorial Symposium is an opportunity for statisticians, researchers, and professionals in the field to come together and engage in thought-provoking discussions and presentations. We have curated an exceptional lineup of speakers who will share their expertise and insights on this year’s theme, Data-Driven Controversies in Science. The three topics discussed will be:

  • Beyond the Hype: Unveiling the Limits of Machine Learning in Science and Medicine
  • Forecasting the Future: How Accurate Are Our Climate Change Models?
  • Unraveling COVID-19: Origins, Interventions, and Lessons Learned

Full program available here: https://zelen25.my.canva.site/

Please RSVP using this link: https://bit.ly/zelen25.

2025 Speakers:

Alina Chan, PhD, Broad Institute
Auroop R. Ganguly, PhD, Northeastern University
Sayash Kapoor, Princeton University
Marc Lipsitch, DPhil, Harvard T.H. Chan School of Public Health
Arjun Manrai, PhD, Harvard Medical School
Gavin A. Schmidt, PhD, Climate Scientist

William Lotter Wins Prestigious Awards in Oncology and Radiology

William (Bill) Lotter, PhD, a leading researcher in artificial intelligence (AI) for medical imaging, has been honored with three prestigious accolades for his contributions to oncology and radiology.

Dr. Lotter was named one of the recipients of the FY25 Wong Family Awards in Translational Oncology for his innovative project, “Bridging spatial biology with routine histopathology using AI to improve cancer prognostication.” The project will study the tumor microenvironment by using AI to combine spatial biology data with conventional histopathology, paving the way for more accurate and actionable cancer prognoses. The award supports early-career investigators as they pursue innovative projects in clinical and/or translational oncology, biotechnology development, precision medicine, or immunotherapy approaches.

In addition, Dr. Lotter has been named to the Radiology Business Forty Under 40 Class of 2024, a distinguished list highlighting emerging leaders under 40 who are driving innovation and progress in the field of radiology. The recognition underscores the impact of his efforts in using AI to enhance medical imaging and diagnostic practices.

Furthermore, Dr. Lotter has received a Trailblazer Award from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) for his proposal, “Improving prognosis prediction and therapy selection for cutaneous squamous cell carcinomas using artificial intelligence.” His project will leverage AI to more accurately identify aggressive cutaneous squamous cell carcinomas, a type of skin cancer, to guide treatment selection. Trailblazer Awards are designed to fund early-career investigators pursuing high-risk, high-reward projects at the interface of the quantitative and biomedical sciences.

Dr. Lotter is an Investigator in the Department of Data Science at the Dana-Farber Cancer Institute and an Assistant Professor at Harvard Medical School, where he focuses on integrating computational methodologies with medical imaging to improve patient outcomes. These honors reflect his dedication to bridging technology and clinical impact and highlight the transformative potential of AI in medicine.

Heng Li, PhD, Named International Society for Computational Biology Fellow

The International Society for Computational Biology (ISCB) welcomes Heng Li, PhD, to the prestigious 2023 ISCB Fellows cohort. Dr. Li is an Assistant Professor of Biomedical Informatics at Dana-Farber Cancer Institute and Harvard Medical School.

Per ISCB’s release: The Fellows program was created to honor members who have distinguished themselves through outstanding contributions to the fields of computational biology and bioinformatics. The program began in 2009, and 2023 marks its 14th anniversary. Each year, ISCB solicits nominations from its members for Fellows who meet the eligibility criteria for significant scientific and leadership contributions to the fields of computational biology and bioinformatics.

Dr. Li is recognized for his influential tools for processing sequence data and his dedication to open-source software, including detailed documentation that has enabled numerous researchers to learn from and build on his work.