BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Dana-Farber Cancer Institute - ECPv6.15.20//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Dana-Farber Cancer Institute
X-ORIGINAL-URL:https://ds.dfci.harvard.edu
X-WR-CALDESC:Events for Dana-Farber Cancer Institute
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20210314T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20211107T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20220313T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20221106T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20220426T130000
DTEND;TZID=America/New_York:20220426T140000
DTSTAMP:20260421T220024Z
CREATED:20220105T135143Z
LAST-MODIFIED:20220429T134618Z
UID:3248-1650978000-1650981600@ds.dfci.harvard.edu
SUMMARY:Frontiers in Biostatistics: Tree-based Ensembling Strategies for Handling Heterogeneous Data
DESCRIPTION:Maya Ramchandran\nData Scientist\, ZephyrAI \nAbstract: Adapting machine learning algorithms to better handle clustering or other partition structure within training data sets is important across a wide variety of biological applications. We first consider the task of learning prediction models when multiple training studies are available. We present a novel weighting approach for constructing tree-based ensemble learners in this setting\, showing that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor and achieves superior performance to Random Forest. Next\, we broaden the scope of the problem to consider the effect of ensembling forest-based learners trained on clusters within a single data set with heterogeneity in the distribution of the features. We show that constructing ensembles of forests trained on estimated clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm. We denote our novel approach as the Cross-Cluster Weighted Forest\, and demonstrate its robustness and accuracy across simulations and on cancer molecular profiling and gene expression data sets that are naturally divisible into clusters. Finally\, we provide theoretical support for these empirical observations by asymptotically analyzing linear least squares and random forest regressions under a linear model. In particular\, for random forest regression under fixed dimensional linear models\, our bounds imply a strict benefit of our ensembling strategy over classic Random Forest. \nYouTube Video. \nMaya Ramchandran recently completed her PhD at the Harvard Biostatistics department under the supervision of Dr. Giovanni Parmigiani\, where she developed machine learning ensembling strategies with applications to cancer prediction problems. 
 She holds a BS in Applied Mathematics-Biology from Brown University and a Master of Music in Violin Performance from the New England Conservatory. She currently works as a data scientist at ZephyrAI\, a biotechnology startup that develops novel drug combination and repurposing treatments for oncology.
URL:https://ds.dfci.harvard.edu/event/frontiers-in-biostatistics-tree-based-ensembling-strategies-for-handling-heterogeneous-data/
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/jpeg:https://ds.dfci.harvard.edu/wp-content/uploads/2022/01/1595653826867.jpeg
END:VEVENT
END:VCALENDAR