Top 10 Challenges in Data Sciences Seminar

What are the Top Ten Challenges in Data Science? The newest Department of Data Sciences seminar series explores big questions in this growing field. Over 10 months, you'll gain the tools and knowledge to address the most common challenges. These seminars are open to all.

All of the seminars will take place in the Center for Life Sciences Building, 3 Blackfan Circle, 11th floor. 

Coming Up

In 2020, the Department of Data Sciences will merge our "Top 10 Challenges in Data Science" and "Data Sciences Training Sessions" seminar series. The Training Sessions will not only cover the basics of data science but also explore the challenges that we face in this growing field.

Click here to see our 2020 offerings.

Past Seminars

Slides and notes available at

October 1, 2019
Introduction to R and RStudio
Patrick Kimes, PhD
Research Fellow, Department of Data Sciences
Dana-Farber Cancer Institute

This tutorial will be interactive. Please remember to bring your laptop! To follow along, attendees should install R and RStudio *BEFORE* the session.

October 29, 2019
Data visualization with ggplot2
Rafael Irizarry, PhD
Professor and Chair, Department of Data Sciences
Dana-Farber Cancer Institute

Previous Seminars

January 24, 2018
Discovering correlated structure between multiple data sets using matrix factorization
Aedin Culhane, PhD - Senior Research Scientist
Center for Cancer Computational Biology (CCCB)
Dana-Farber Cancer Institute

February 27, 2018
Introduction to Technology Transfer and Intellectual Property
Jenna Matheny - Licensing Associate II
Belfer Office for Dana-Farber Innovations
Dana-Farber Cancer Institute

April 12, 2018
Wrangling cloud scale genomic data
Vincent Carey, Professor of Medicine, HMS
Associate Biostatistician, Channing Laboratory, Brigham and Women's Hospital


Samuela Pollack
Software Engineer, Dana-Farber Cancer Institute

May 30, 2018
Characterizing hematopoiesis in patients with myeloproliferative neoplasms using single-cell sequencing
Sahand Hormoz, PhD
Assistant Professor, Department of Systems Biology, Harvard Medical School
Department of Biostatistics and Computational Biology, Dana-Farber
Associate Member, Broad Institute of MIT and Harvard

January 25   Implications of mobile-platform respondents for design and development of web-surveys, Daniel Gundersen

February 28    Statistical Designs for Platform Trials, Steffen Ventz

March 28    Grant Management within BCB, Natalie Venskus

June 7   Methods and Tools for Adaptive Trials, Cyrus Mehta

June 20   Techniques for Matched Randomization in Sequential Enrollment Trials, Jonathan Chipman 

October 25   How many mice? What on earth do I do with their data?, Donna Neuberg

September 20   Introduction to R, RStudio and the “tidyverse”, Stephanie Hicks

January 22        IPython: Big Data, Black Magic and Parallel Computing, Luca Pinello

February 12      Oncology Data Retrieval System (OncDRS), Caitlin Fontes

March 5              cBio Cancer Genomics Portal: how it works, how it's built, and where it's going next, Ethan Cerami

April 30              Bioconductor II, Michael Love

May 28               Graphs and Networks in R, Mehmet Samur

January 23             Metabolomics, Svitlana Tyekucheva

February 13           Decision Curve Analysis, Giovanni Parmigiani

March 6                   R Bioconductor, Rafael Irizarry
                                 - Installing Bioconductor
                                 - Basic Structure of Bioconductor         

April 3                     Next Generation Sequencing Pipelines, Naim Rashid

May 1                      Gene Set Enrichment Analysis, William Barry

June 12                   Propensity Score, Cory Zigler

July 24                     RStudio, Emanuele Mazzola & Su-Chun Cheng
                                   - Example      

September 25        Statistical Designs with Expansion Cohorts, Suzanne Dahlberg

October 9                Spade Clustering, Guo-Cheng Yuan

November 13          Modeling Cancer Resistance, Phillip Altrock

December 4            Beyond Proportional Hazards, Hajime Uno

March 11        Radiomics: There is more than meets the eye in Medical Imaging, Hugo Aerts, Ph.D.
Department of Radiation Oncology, Dana-Farber Cancer Institute

January 19     Functionally annotating non-protein coding risk alleles from genome wide association studies, Matthew Freedman, Assistant Professor, Department of Medicine, Harvard Medical School, Associate Physician, Dana-Farber Cancer Institute

March 1          Comparative Effectiveness Research,  Constantine Gatsonis, Henry Ledyard Goddard University Professor of Biostatistics, Brown University

March 8          Distance in gene expression from stem cells as prognostic marker, Markus Riester, Research Fellow, Dana-Farber Cancer Institute

April 4            17 months in 45 minutes: three aspects of cancer biostatistics, Emanuele Mazzola, Research Fellow, Dana-Farber Cancer Institute

April 19          Tips to reduce potential pain and fear in data analysis and reporting (with ECOG examples), Hajime Uno, Research Scientist, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute

July 9             Discovering associations that replicate from a primary study of high dimension to a follow-up study, Ruth Heller, Senior Lecturer, Industrial Engineering and Management, Technion, Israel

August 29      Inference from whole-genome random regression models in stratified populations, Daniel Sorensen, Department of Molecular Biology and Genetics, Aarhus University, Denmark

August 31       Evaluating risk models with prospective cohorts: analyses and designs, Yingye Zheng, Fred Hutchinson Cancer Center

January 27, Dissecting and Targeting Pathologic Protein Interactions with Stapled Peptides, Loren Walensky, Dana-Farber Cancer Institute

March 31, Using Federal and State Databases for Cancer Outcomes Research, John Ayanian, Harvard Medical School

April 14, Statistical Methods for Analyzing Length-Biased Right Censored Data, Yu Shen, MD Anderson Cancer Center

January 7         Automated High-dimensional Flow Cytometric Data Analysis; Saumyadipta Pyne, Department of Medical Oncology, Dana-Farber Cancer Institute

February 25     The use of biologic, genetic, clinical and morbidity factors to determine treatment for children with cancer; Wendy B. London, Division of Hematology/Oncology, Department of Pediatrics, DFHCC/CHB, Harvard Medical School and Children’s Oncology Group (COG) Statistics and Data Center

March 17          The Challenge of Animal Studies, Donna Neuberg, Senior Lecturer, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute

April 8              Nuts and Bolts:
                               Interactive SAS (Julie Aldridge)
                               Making Figures in R (Judi Manola)
                               Sample Size for Cox Models in Stata (Shari Gelber)
                               Cool Stuff with ODS Proc Freq in SAS (Paul Catalano)

April 22             Computational Genomics of Gene Regulation;  Xiaole (Shirley) Liu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute

May 12              Developing an Information Infrastructure Focused on Cancer Comparative Effectiveness Research; David A. Fenstermacher, PhD, Chair and Executive Director, Department of Biomedical Informatics, H. Lee Moffitt Cancer Center & Research Institute

June 17             Integrative model system analysis of gene expression; David Weiss, Ph.D., Bioengineer, IRIDIA-CoDE, Belgium

July 1                Gene expression profiling and clinical decision making with breast cancer patients, Hal Burstein, Associate Professor of Medicine, Harvard Medical School and the Breast Oncology Center, Dana-Farber Cancer Institute

October 21        Estimating the risk of acquiring a blood borne pathogen as a result of hemodialysis machine contamination, Nathan Taback, University of Toronto

December 16    Evolutionary Dynamics of Cancer, Franziska Michor, Dana-Farber Cancer Institute


January 8           Analyses of Cumulative Incidence Function via Non-parametric Multiple Imputation, Ping Ruan

January 22         Issues with the design, monitoring and analysis of cancer treatment studies when most patients are “cured”, James Anderson

February 12        Evaluating Prediction and Risk Stratification in Right Censored Time-to-Event Data, Keith Betts

February 26        Confirming Non-Adherence in Clinical Trials, Robert Gray and Meredith Regan  

March 26              Age-Adjusted Median Differences in Nutritional Intake Between Black and White Males Presenting for Prostate Cancer Screening, Stuart Lipsitz

April 2                  Multiple Imputation in a Large-scale Complex Survey, Yulei He  

June 25                A Propensity Score Approach for the Analysis of Population-Based Genetic Association Studies, Nandita Mitra

July 16                 Make life easier with LyX and R: An efficient way for making an analysis report, Hajime Uno 

October 8             Breast cancer tumour growth estimated through Norwegian mammography screening data, including a short presentation of the Norwegian Cancer Registry, Harald Weedon-Fekjær, Ph.D.

October 22           Modeling risk in families with cancer, Giovanni Parmigiani


January 17           Anti-angiogenic Potential of GIPC1 Silencing in Human Disease, Tom Chittenden, PhD.

February 28         Letrozole Compared with Tamoxifen for Elderly Patients with Endocrine Responsive Early Breast Cancer in BIG 1-98: Efficacy, Treatment, and Adverse Events, Zhuoxin Sun, PhD

                             Outcomes for Elderly, Advanced-Stage Non-Small Cell Lung Cancer Patients Treated with Bevacizumab in Combination with Carboplatin and Paclitaxel: Analysis of E4599, Suzanne Dahlberg, ScD

                             Reporting of Subgroup Analyses in Clinical Trials: A Discussion, Steve Lagakos, PhD

June 5                   An Example of SNP Analysis, Haesook Kim, Ph.D.

September 25      The Role of Hormone Receptors in Cancer, Myles Brown, M.D.

November 20        BCB Speed-Dating Journal Club
                              One hour: 4 recent publications by your BCB colleagues!
                              Gefitinib for Recurrent NSCLC: All things are not created equal, Suzanne Dahlberg
                              Premenopausal endocrine-responsive early breast cancer: who receives chemotherapy?, Meredith Regan
                              Dissociation of its opposing immunologic effects is critical for the optimization of antitumor CD8+ T-cell responses induced by interleukin 21, Kristen Stevenson
                              Model-based analysis of ChIP-Seq (MACS), Yong Zhang

December 11        Research and Progress of the Department of Cancer Biology: Division of Metabolism and Chronic Diseases, Dr. Bruce Spiegelman


January 18         Putting microarray data in context: Multivariate approaches for exploratory analysis of multiple biological datasets, Aedin Culhane, Ph.D.

February 22       Global View of Chromatin, Guo-Cheng Yuan, Ph.D.

April 12              Prediction of U.S. Mortality Counts Using Semiparametric Bayesian Techniques, Ram Tiwari

May 21               Where Do We Stand in Microarray Data Analysis? Lessons of the Past and Hopes for the Future, Andrei Yakovlev

June 13              Clinical Research-Strategies to Fulfill our Mission, Phillip Kantoff

September 20    Electronic Data Capture (eDC) implementation for PI Initiated Studies at the DF/HCC, Marina Nillni

October 11         Spatial analysis of cancer data incorporating residential history, Al Ozonoff, Ph.D.

October 23         PLASQ: A Generalized Linear Model-Based Procedure to Determine Allelic Dosage in Cancer Cells from SNP Array Data, David Harrington, Ph.D.

November 8       Understanding Racial Disparities in Cancer Cures: Results from Large Population Science Studies, Yi Li

December 6       Phase III Intergroup Study of Adriamycin/Taxotere vs. Adriamycin/Cytoxan in the Adjuvant Treatment of Node Positive and High-Risk Node Negative Breast Cancer: Statistical Challenge, Robert Gray