Design of Phase III Studies

Data Sciences Training Session

Design of Phase III Studies, Part II

Robert Gray, PhD
Professor, Department of Data Science
Dana-Farber Cancer Institute

NEJM Statistical Guidelines for Authors: Under the Hood

Data Sciences Training Session

David Harrington, PhD
Professor, Department of Data Sciences
Dana-Farber Cancer Institute

Collaborative Grant Writing and Statistical Methods for Grants

Data Sciences Training Session

Rebecca Gelman, PhD
Associate Professor, Department of Data Sciences
Dana-Farber Cancer Institute

The Clinical Impact of Genomics in Pediatric Oncology

DFCI Genomics Meetup

Katherine Janeway, MD
Pediatric Oncology
Dana-Farber Cancer Institute

Pizza is provided.

 

Alternatives to Hazard Ratio for Quantifying Treatment Effect on Time-to-Event Outcomes

Data Science Training Session

Hajime Uno
Assistant Professor, Department of Data Science
Dana-Farber Cancer Institute


Introduction to single cell RNA-seq data analysis for statisticians

Data Science Training Session

Kelly Street, Research Fellow
Etai Jacob, Research Fellow

In this short course, we will introduce some of the most widely used tools for single-cell analysis. We will describe common experimental methods used to generate single-cell RNA-seq data and demonstrate modern pre-processing and analysis pipelines. We will also discuss problems that frequently arise in analysis and ways to address them. After the first meeting, participants are encouraged to practice the workflows we share so that any issues they encounter can be discussed at the second meeting.
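As a taste of the pre-processing mentioned above, here is a minimal NumPy sketch of library-size normalization followed by a log transform, one of the most common first steps in scRNA-seq pipelines. The toy matrix, function name, and `target_sum` value are illustrative choices, not taken from any specific toolkit:

```python
import numpy as np

# Toy counts: 4 cells x 3 genes (rows = cells), standing in for a real
# cell-by-gene count matrix.
counts = np.array([
    [10., 0., 5.],
    [20., 2., 8.],
    [0., 1., 1.],
    [30., 5., 15.],
])

def normalize_log1p(X, target_sum=1e4):
    """Library-size normalization followed by log1p.

    Each cell is scaled to the same total count before the log transform,
    so that differences in sequencing depth do not masquerade as
    differences in expression. target_sum=1e4 is a common convention,
    not a requirement.
    """
    lib_size = X.sum(axis=1, keepdims=True)   # total counts per cell
    return np.log1p(X / lib_size * target_sum)

norm = normalize_log1p(counts)
print(norm.shape)   # (4, 3)
```

Real pipelines add steps such as cell and gene filtering, highly variable gene selection, and dimensionality reduction, but they typically begin with a normalization of this shape.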

COVID-19 Data Science Zoomposium

Caroline Buckee, Department of Biostatistics
Harvard T.H. Chan School of Public Health
How do we predict the pandemic?

Michael Mina, Department of Epidemiology
Harvard T.H. Chan School of Public Health
The importance and challenges of testing for COVID-19

Natalie Dean, Department of Biostatistics, University of Florida
How we evaluate the efficacy of potential therapies and vaccines

Alexis Madrigal, The Atlantic
Journalism in the time of COVID-19

Moderated by Rafael Irizarry

Sponsored by the Department of Data Sciences, Dana-Farber Cancer Institute
and the Brown Institute for Media Innovation, Columbia University


Reframing proportional-hazards modeling for large time-to-event datasets with applications to deep learning

Frontiers in Biostatistics Seminar

Noah Simon, PhD
Associate Professor
Department of Biostatistics
University of Washington

To build inferential or predictive survival models, it is common to assume proportional hazards and fit the model by maximizing the partial likelihood. This approach has been combined with non-parametric and high-dimensional techniques, e.g., spline expansions and penalties, to build flexible survival models.

New challenges require extending and modifying that approach. In a number of modern applications there is interest in using complex features, such as images, to predict survival. In these cases, it is necessary to connect more modern backends, such as deep learning infrastructures based on convolutional or recurrent neural networks, to the partial likelihood. In such scenarios, large numbers of observations are needed to train the model. However, even when those observations are available, the structure of the partial likelihood makes optimization difficult, if not completely intractable.

In this talk we show how the partial likelihood can be modified to deal easily with large amounts of data. In particular, with this modification, stochastic gradient-based methods of the kind commonly applied in deep learning become simple to employ. This simplicity holds even in the presence of left truncation, right censoring, and time-varying covariates, and the approach extends relatively simply to data stored in a distributed manner.
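To see why the standard objective is awkward to mini-batch, here is a toy sketch of the Cox negative partial log-likelihood on simulated data, along with a naive within-batch variant in the spirit of common deep-survival implementations. This is illustrative only and is not the modification proposed in the talk; the simulated data, sample sizes, and function names are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated survival data (illustrative only): hazard increases with x,
# so the true log hazard ratio is roughly 0.5.
n = 200
x = rng.normal(size=n)
time = rng.exponential(scale=np.exp(-0.5 * x))
event = rng.random(n) < 0.8   # ~80% of times treated as observed events

def neg_partial_loglik(beta, x, time, event):
    """Cox negative partial log-likelihood (Breslow form, continuous times).

    The log risk-set sum couples every observation with all later ones,
    which is what makes naive mini-batching of this objective awkward.
    """
    eta = beta * x
    order = np.argsort(time)                 # increasing event time
    eta_s, event_s = eta[order], event[order]
    # log of sum over the risk set of exp(eta_j), via reverse cumulative logaddexp
    log_risk = np.logaddexp.accumulate(eta_s[::-1])[::-1]
    return -np.sum((eta_s - log_risk)[event_s])

# One common workaround in deep survival models: evaluate the same
# objective within each mini-batch, treating the batch as the risk set.
def batched_neg_pll(beta, x, time, event, batch_size=50):
    idx = rng.permutation(len(x))
    return sum(
        neg_partial_loglik(beta, x[b], time[b], event[b])
        for b in (idx[i:i + batch_size] for i in range(0, len(x), batch_size))
    )

print(neg_partial_loglik(0.5, x, time, event))
print(batched_neg_pll(0.5, x, time, event))
```

The within-batch risk sets make stochastic gradients easy but change the objective; handling this tension cleanly is exactly the kind of issue the talk addresses.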

Streamlined empirical Bayes estimation for contextual bandits with applications in mobile health

Frontiers in Biostatistics Webinar

Marianne Menictas
Postdoctoral Fellow, Department of Statistics
Harvard University

Mobile health (mHealth) technologies are increasingly being used to deliver interventions to users in their natural environments. With the advent of increasingly sophisticated sensing devices (e.g., GPS) and phone-based ecological momentary assessment (EMA), it is becoming possible to deliver interventions at moments when they can most readily influence a person’s behavior. For example, for someone trying to increase physical activity, moments when the person can be active are critical decision points at which a well-timed intervention could make a difference. The promise of mHealth hinges on the ability to provide interventions at times when users need the support and are receptive to it. Our goal is thus to learn the optimal time and intervention for a given user and context.

A significant challenge to learning is that there are often only a few opportunities per day to provide treatment. Moreover, when there is limited time to engage users, a slow learning rate can pose problems, potentially raising the risk that users will abandon the intervention. To prevent disengagement, a learning algorithm should learn quickly in spite of noisy measurements. To accelerate learning, information may be pooled across users and time in a dynamic manner by combining a contextual bandit algorithm with a Bayesian random-effects model for the reward function. As information accumulates, however, tuning user- and time-specific hyperparameters becomes computationally intractable. In this talk, we focus on solving this computational bottleneck.
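For intuition about the contextual-bandit setup, the following is a toy Thompson-sampling sketch for a two-action Gaussian bandit with conjugate Bayesian linear-regression posteriors. It does not pool across users or implement the random-effects model from the talk; the environment, hyperparameters, and variable names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-action contextual bandit with a Bayesian linear model per action.
d = 2              # context dimension
sigma2 = 1.0       # assumed reward noise variance
prior_var = 10.0   # assumed prior variance on weights
true_w = {0: np.array([1.0, -0.5]), 1: np.array([-0.5, 1.0])}

# Gaussian posterior per action, tracked via precision matrix A and vector b:
# posterior mean = A^{-1} b, posterior covariance = sigma2 * A^{-1}.
A = {a: np.eye(d) * (sigma2 / prior_var) for a in (0, 1)}
b = {a: np.zeros(d) for a in (0, 1)}

total_reward = 0.0
for t in range(500):
    ctx = rng.normal(size=d)
    # Thompson sampling: draw weights from each posterior, act greedily.
    sampled = {}
    for a in (0, 1):
        mean = np.linalg.solve(A[a], b[a])
        cov = sigma2 * np.linalg.inv(A[a])
        sampled[a] = rng.multivariate_normal(mean, cov)
    action = max((0, 1), key=lambda a: ctx @ sampled[a])
    reward = ctx @ true_w[action] + rng.normal(scale=np.sqrt(sigma2))
    total_reward += reward
    # Conjugate Bayesian linear-regression update for the chosen action.
    A[action] += np.outer(ctx, ctx)
    b[action] += reward * ctx

print(total_reward)
```

In the mHealth setting described above, each draw would correspond to a decision point for one user; the talk's contribution concerns making the hyperparameter tuning for the pooled, random-effects version of such a model computationally tractable.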