This blog post is part of our Paper Spotlight series, which features peer-reviewed research publications involving work done in Terra and highlights how the analysis methods were applied.
Transmission from vaccinated individuals in a large SARS-CoV-2 Delta variant outbreak
By Katherine J. Siddle, Lydia A. Krasilnikova, Gage K. Moreno, Stephen F. Schaffner et al., 2022
Cell, Volume 185, Issue 3, 485 – 492.e10 https://doi.org/10.1016/j.cell.2021.12.027
Abstract: An outbreak of over 1,000 COVID-19 cases in Provincetown, Massachusetts (MA), in July 2021—the first large outbreak mostly in vaccinated individuals in the US—prompted a comprehensive public health response, motivating changes to national masking recommendations and raising questions about infection and transmission among vaccinated individuals. To address these questions, we combined viral genomic and epidemiological data from 467 individuals, including 40% of outbreak-associated cases. The Delta variant accounted for 99% of cases in this dataset; it was introduced from at least 40 sources, but 83% of cases derived from a single source, likely through transmission across multiple settings over a short time rather than a single event. Genomic and epidemiological data supported multiple transmissions of Delta from and between fully vaccinated individuals. However, despite its magnitude, the outbreak had limited onward impact in MA and the US overall, likely due to high vaccination rates and a robust public health response.
What part of the work was done in Terra?
Excerpts from the paper’s Methods section:
SARS-CoV-2 genome assembly and analysis
For genomes generated at the Broad Institute, we conducted all analyses using viral-ngs 2.1.28 on the Terra platform (app.terra.bio). All of the workflows named below are publicly available via the Dockstore Tool Registry Service (dockstore.org/organizations/BroadInstitute/collections/pgs). Briefly, samples were demultiplexed, reads were filtered for known sequencing contaminants, and SARS-CoV-2 reads were assembled using a reference-based assembly approach with the SARS-CoV-2 isolate Wuhan-Hu-1 reference genome GenBank: NC_045512.2 (sarscov2_illumina_full.wdl).
Phylogenetic tree construction
We constructed a maximum-likelihood (ML) phylogenetic tree (Sagulenko et al., 2018) with associated visualizations using a SARS-CoV-2-tailored Augur pipeline (Huddleston et al., 2021) (sarscov2_nextstrain_aligned_input), part of the Nextstrain project (Hadfield et al., 2018), adapted from github.com/nextstrain/ncov, with the entirety of ARTICv3 amplicons 64, 72, and 73 (Delta dropout regions) masked from tree construction.
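To make the masking step more concrete, here is a minimal Python sketch of what it means to mask genomic regions before building a tree: every base inside a masked interval is replaced with N so that it cannot influence the phylogeny. This is an illustration only, not the authors' code; the actual masking happens inside their Augur-based workflow. The function name `mask_regions`, the toy sequence, and the toy coordinates are all hypothetical, and the real ARTICv3 amplicon coordinates are not given in the excerpt, so they are not reproduced here.

```python
from typing import Iterable, Tuple


def mask_regions(
    sequence: str,
    regions: Iterable[Tuple[int, int]],
    mask_char: str = "N",
) -> str:
    """Return a copy of `sequence` with each region replaced by `mask_char`.

    Regions are (start, end) pairs using 1-based, inclusive coordinates.
    Regions that run past the end of the sequence are clipped.
    """
    masked = list(sequence)
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"Invalid region: ({start}, {end})")
        # Convert 1-based inclusive coordinates to a 0-based half-open range.
        for i in range(start - 1, min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


# Toy data (hypothetical): a 20-base sequence and two made-up intervals.
toy_genome = "ACGTACGTACGTACGTACGT"
toy_regions = [(3, 6), (15, 18)]

masked_toy = mask_regions(toy_genome, toy_regions)
print(masked_toy)
```

In a real analysis, the intervals would come from the primer scheme for the amplicons being masked, and the function would be applied to every sequence in the alignment so that all genomes are masked at the same positions.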
How did they do it?
The authors implemented these analyses as WDL workflows, which they deposited in Dockstore and then imported into Terra. They ran the workflows at scale using Terra’s workflow execution service. You can see these workflows in action in the public COVID-19 workspace maintained by the authors as a related resource.
To try your hand at running a workflow in Terra, check out this Quickstart Tutorial Workspace.
Appendix: Data and code availability
- All SARS-CoV-2 genomes, patient metadata, and raw sequencing reads have been deposited to NCBI under BioProject: PRJNA715749 or BioProject: PRJNA686883 in GenBank, BioSample, and SRA databases, respectively. All genomes produced in the present study are also available on GISAID. All data is publicly available as of the date of publication. Accession numbers of additional publicly available data analyzed in this paper are available in Table S1.
- All code used for sequence data processing, genome assembly, and phylogenetic analysis is publicly available either via the Dockstore Tool Registry Service (dockstore.org/organizations/BroadInstitute/collections/pgs) or on GitHub (github.com/AndrewLangvt/genomic_analyses/blob/main/workflows/wf_viral_refbased_assembly.wdl).
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request (firstname.lastname@example.org).