December 18 |
10-12pm |
Filippo Gambarota, Introduction to Multiverse Meta-Analysis
In this session we will introduce the concept of multiverse analysis applied to meta-analysis from both an exploratory and an inferential point of view. Multiverse analysis is a recently developed approach in which, given a research question and a dataset, authors conduct and report all plausible statistical analyses. Usually only one analysis is reported, and the impact of other plausible alternatives (researcher degrees of freedom) on the final results is often neglected. We will cover exploratory statistics and plots and their implementation in R. The added complexity of reporting multiple analysis results on the same dataset requires not only descriptive methods but also valid inferential approaches. We will present methods for statistical inference in a multiverse analysis, such as the specification curve and PIMA (post-selection inference in multiverse analysis), with the related R code. Knowledge of R, multiple testing, and p-value adjustment methods is helpful but not required to attend these sessions. Attendees with a biostatistics or computational biology background may find this applicable to their work. |
Hybrid
Register. |
|
1-3pm |
Jeremy Simon, Introduction to scRNA-seq and Data Preprocessing
There are many useful applications of single-cell or single-nucleus RNA-seq technologies to better understand cancer biology. In this introductory workshop, we will discuss some of the advantages of adopting single-cell approaches, frequently used technologies and platforms, and best practices for experimental design, and we will provide an overview of algorithms used for data preprocessing. We will then give an interactive demonstration of single-cell/single-nucleus RNA-seq data preprocessing with alevin-fry, resulting in a counts matrix ready for downstream analysis. |
Virtual
Register through HBC |
October 3 |
12:00-1:30pm |
Anne O’Neill, INFORM Clinical Trials Database Part II (what happens before, during, after authorization from ODQ)
This INFORM session is a continuation of the session from September 12, 2024. (Attending the September 12th session is helpful but not required.) You have requested or obtained authorization to access data from ODQ for the trial for which you are the assigned statistician. Now what? This session covers examples of the collaborative process with the medical side of the research team before, during, and after obtaining the biostat authorization/access from ODQ: reasons a statistician might pull data, things to keep in mind during the collaborative process related to those reasons, and examples of standard emails sent by ODQ (when the ‘Data Request’ is submitted by the research team), reviewed for context. This training is relevant to you if you are assigned to a clinical trial whose corresponding clinical data are in the INFORM database. Please reach out to Anne O’Neill if you have questions about the training or whether it would be applicable to the work that you do.
|
Virtual
Email Anne O’Neill for Zoom link |
September 12 |
11:30-1:30pm |
Anne O’Neill, INFORM Clinical Trials Database (Introduction to Access and Authorization)
INFORM houses data from DFHCC clinical trials that use electronic data capture (eDC). This session gives a general overview of the INFORM Clinical Trials Database from a statistical perspective and reviews the set-up needed to access data and how to obtain authorization to access data from the INFORM database. Documentation for the available programs to interact with and pull data from the INFORM database is also provided. This training is relevant to you if you are assigned to a clinical trial whose corresponding clinical data are in INFORM and you have not yet verified your access or authorization. Please reach out to Anne O’Neill if you have questions about the training or whether it would be applicable to the work that you do.
|
Virtual
Evaluation |
May 22 |
10-12pm |
Jared Brown, Deeper Differential Expression Analysis with Shrinkage Correction
Differential expression (or differential abundance) analysis is a core tool for identifying and quantifying differences between and across groups in -omics data. In this workshop session, with follow-along analysis scripts, we will take a deeper look at the models underlying differential expression analysis, using the DESeq2 framework as the main example. We will examine questions around design specification, the proper use of pre-computed offsets such as normalization corrections, parameter estimation and testing, and robust false discovery rate correction through post-hoc shrinkage. Examples highlighting how these approaches differ across datasets will be drawn from bulk RNA-seq, single-cell RNA-seq, and ChIP-seq. |
Hybrid, CLSB 11081
Evaluation
|
April 24 |
10-12pm |
Shawn Mims and Love Nickerson, Research Administration and You
How research administrators support you in your research. Presenters will discuss the nuts and bolts of grants, compliance, and beyond. This session is intended for all who collaborate: faculty, postdocs, statisticians, and computational biologists. |
Hybrid, CLSB 11081
Evaluation |
April 17 |
10-12pm |
Mark Soliman, Intro to Git and GitHub
We’ll go over the motivation behind version control systems (like Git) as well as some basic commands. We’ll also go over using GitHub as a repository. |
Virtual
Evaluation |
April 10 |
10-12pm |
Selvi Guharaj, DFCI – Data Science Centralized Data Access and Sharing
The session topics include:
- Overview of the DFCI – Data Science (DS) Data Catalog, i.e. genomic/bioinformatics annotation databases that are downloaded and organized (e.g. GDC, GTEx, AnVIL/Terra, GEO, HCA, HTAN, NCBI, IGSR, dbGaP, 10x, GENCODE, ENSEMBL and such).
- Forms and procedures to request access to in-house open-/controlled-access datasets under our DFCI – DS systems and to download new open-/controlled-access datasets to DFCI – DS centralized directories or individual lab file shares
- Forms and procedures for controlled-access data: sharing of primary data files and sharing of genomic summary data files
|
Hybrid, CLSB 11081
Evaluation |
April 3 |
10:30-12pm |
Anne O’Neill, Introduction to Clinical Trials – General Overview
Topics reviewed include (but are not limited to): the various types and phases of clinical trials, how clinical trials are funded, the clinical trials development process, ‘positive’ and ‘negative’ clinical trials, which types of trials are most common, the roles of the different divisions within Data Science in clinical trials and related cancer projects, databases for clinical trials data, and collaborations with other departments inside and outside DFCI on clinical trials and related cancer projects. |
Hybrid, CLSB 11081
Evaluation |
March 27 |
10-12pm |
Erica Holdridge, Statistics for Computational Biology Projects
Statistics is an important tool for computational biologists because it helps us quantitatively understand and analyze biological data. This interactive training session will cover an introduction to statistical concepts. Topics will include: experimental design, data cleaning, common analysis methods (logistic regression, ANOVA, multiple comparisons), and interpreting results.
|
Virtual (option of watch party)
Evaluation
Video |
March 20 |
10-12pm |
Sandra Lee, Phase II Trial Designs
We will review general study considerations for phase II trials and evaluate the study designs. Examples from ECOG-ACRIN phase II trials will be presented and discussed.
|
Hybrid, CLSB 11081
Evaluation
|
March 13 |
10-12pm |
Yujie Guo, Introduction to scRNA-seq Data Analysis and Interpretation Using Seurat
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. It is widely used in both dry lab and wet lab settings for its robustness and scalability. In this coding workshop, we will demonstrate the Seurat workflow on published matched tumor-normal data, and attendees will gain hands-on experience analyzing and interpreting such data.
|
Hybrid, CLSB 11081
Evaluation
Video |
March 6 |
10-12pm |
Giovanni Parmigiani, Basics of Replicability
In this session, we will discuss the data emerging from the “manyLabs” study of replicability of cancer biology investigations, and review remedies and good statistical practices to prevent lapses of replicability. We will revisit Simpson’s paradox, batch effects, shrinkage, and p-hacking from a replicability perspective. You will not necessarily learn new techniques, but hopefully you will learn to be on the alert for pitfalls that can do a lot of damage to the quality of your work.
|
Hybrid, CLSB 11081
Evaluation |
February 29 |
10:30-12:30pm |
Nabihah Tayob, Biomarkers in Cancer Research
In this session, we will cover important foundational topics in the study of biomarkers in cancer research, including how to evaluate a biomarker, the importance of the clinical context, the design of a retrospective biomarker evaluation study, and biomarker-driven clinical trial design. |
Hybrid, CLSB 11081
Evaluation
Video |
February 28 |
10-12pm |
Jeremy Simon, Introduction to scRNA-seq and Data Preprocessing
There are many useful applications of single-cell or single-nucleus RNA-seq technologies to better understand cancer biology. In this introductory workshop, we will discuss some of the advantages of adopting single-cell approaches, frequently used technologies and platforms, and best practices for experimental design, and we will provide an overview of algorithms used for data preprocessing. We will then give an interactive demonstration of single-cell/single-nucleus RNA-seq data preprocessing with alevin-fry, resulting in a counts matrix ready for downstream analysis. |
Hybrid, CLSB 11081
Evaluation
Video |
February 21 |
10-12pm |
Robert Shear, R and RStudio Quickstart for Data Science Professionals
For biostatisticians, computational biologists, and data scientists familiar with R who wish to improve their R language skills more rapidly. This two-hour workshop, with an optional additional hour for lab and discussion, will go a step beyond the introductory level into R’s essential features and functions. The session will also address debugging strategies, basic best practices, and valuable tips.
Please refer to the syllabus for further information. |
Hybrid, CLSB 11081
Evaluation
Video |
January 17 |
10-12pm |
Nikos George, Introduction to Unix and DS-Computing
In the first part, we will cover basic Unix commands, how to work with directories and files, how to navigate the Unix filesystem, and how to review and change file permissions. In the second part, we will give an overview of our computing systems and show how to work with our high-performance cluster using Slurm.
|
Hybrid, CLSB 11081
Evaluation |
January 16 |
2:30-4pm |
Anne O’Neill, INFORM Clinical Trials Database (Introduction to Access and Authorization)
INFORM houses data from DFHCC clinical trials that use electronic data capture (eDC). This session gives a general overview of the INFORM Clinical Trials Database from a statistical perspective and reviews the set-up needed to access data and how to obtain authorization to access data from the INFORM database. Documentation for the available programs to interact with and pull data from the INFORM database is also provided. This training is relevant to you if you are assigned to a clinical trial whose corresponding clinical data are in INFORM and you have not yet verified your access or authorization. Please reach out to Anne O’Neill if you have questions about the training or whether it would be applicable to the work that you do.
|
Hybrid, CLSB 11065
Evaluation |