Like you, we are adapting to a different way of living and working as the 2019 novel coronavirus (COVID-19) spreads, claiming many lives, sickening many more, and upending daily life around the world. We are heartened to see that the scientific community is mobilizing (in labs and remotely) to find and exploit any potential avenue for tracking, slowing, or stopping this virus.
As our scientific collaborators at the Broad and around the world knuckle down to analyze the data that is starting to stream in, the Terra team is prioritizing work to support their efforts. In collaboration with Dr. Danny Park, Group Leader for Viral Computational Genomics at the Broad Institute of MIT and Harvard, and his colleagues, we are excited to release a first set of resources for COVID-19 analysis in Terra.
Best practices for analyzing COVID-19 genomic data
The COVID-19 workspace contains best-practices workflows for processing and analyzing viral genomic data that Dr. Park has been developing and teaching to public health lab scientists for the past six years. Dr. Park and colleagues will be using these workflows in Terra for processing COVID-19 research data generated internally at the Broad. These same workflows are now available in Terra, making it possible for anyone to analyze the publicly available data as well as any data they are generating themselves.
More specifically, the COVID-19 workspace contains:
- Raw COVID-19 sequencing data (FASTQ and BAM files) available from the NCBI Sequence Read Archive (SRA), which will be regularly updated as more data becomes available
- Workflows for genome assembly, quality control, metagenomic classification, and aggregate statistics
- A Jupyter Notebook that produces quality control plots from the data output by the workflows
We expect that this workspace will be most useful to scientists in public health labs and departments of health who need robust, best-practices workflows for analyzing COVID-19 genomic data from their jurisdiction. We do of course encourage anyone interested to check out the workspace, run their own analyses with these tools, and suggest improvements.
The COVID-19 workspace is a growing and evolving resource
Today is the release of the first version of this COVID-19 workspace, but this is only the beginning. As more data and tools become available, we will continue to develop the workspace and expand its usefulness to the community. Here are some of our priorities for the immediate future:
- Add COVID-19 sequences generated at the Broad Institute and from the SRA as they become available to enable more robust and comprehensive analysis
- Incorporate additional tools for phylogenetic analysis that will make use of a growing library of COVID-19 sequences
- Include modules that will help you prepare your assembled sequences for submission to sequence repositories like GenBank, SRA, and GISAID, and that will make it easier to integrate with community dashboards like Nextstrain
- Solicit and incorporate feedback, contributions, tools, and data from the community
Update: Read about the next version of this COVID-19 workspace here.
Additional help for learning how to use COVID-19 resources in Terra
If you are a researcher or public health lab scientist who is coming to Terra for the first time, we recommend you start by reading the COVID-19 article in the Terra support center, which will orient you to relevant workspaces and tools, as well as to resources in the support center for learning the basics of Terra. We will continue to update that article as we develop new learning materials around this workspace and any others as they are updated or released.