As we have discussed previously, our collaborators in the Sabeti Lab at the Broad Institute have been analyzing SARS-CoV-2 viral genomes from COVID-19 cases in the Boston area, in partnership with the MA Department of Public Health and Massachusetts General Hospital. From the beginning of this work, they have shared their methods and data with the research community in a public workspace in Terra.
Today, the team led by Bronwyn MacInnis shared a manuscript preprint on medRxiv detailing a comprehensive investigation of SARS-CoV-2 emergence and spread in the Boston area, including superspreading events at an international professional conference and in a skilled nursing facility, as well as outbreaks in homeless communities. Based on sequencing and phylogenetic analysis of 772 genomes across the first wave of the pandemic, the study is also one of the largest and most detailed genomic epidemiological surveys of SARS-CoV-2 to date.
The public workspace contains data from every viral sample that was sequenced in this effort, all of which were processed and analyzed in Terra. The analytical workflows are included in the workspace and preconfigured with the analysis parameters used in the paper, so anyone can clone the workspace and reproduce the analysis as described in the preprint. Note that for tree building, which involves GISAID data (subject to restrictions outside of our control), the team substituted public data from GenBank to demonstrate the methodology. The data is also available in NCBI GenBank and visualized on NextStrain.org, as well as on the Broad’s own visualization portal.
You can read more about the details and context of this work in our earlier blog post, the workspace dashboard, and of course the preprint itself. For a layperson’s explanation of the findings reported in the paper, please see the Broad Institute’s blog.