Late last week, the Broad Institute announced a new partnership with the Centers for Disease Control and Prevention (CDC) and Theiagen Genomics that will help public health laboratories in the USA adopt Terra for pathogen genomics and biosurveillance. The partnership is primarily focused on the SARS-CoV-2 virus at the moment, but its long-term scope includes any other infectious disease agents where a genomics approach is appropriate. The Broad Institute and Theiagen Genomics have already been supporting thirty public health labs using Terra in several states, so we’re excited to see this new partnership with the CDC expand support to all 50 states.
Scaling up viral genomics for public health
The current pandemic is showing us how impactful genomic analysis can be for monitoring viral outbreaks. This technology gives epidemiologists the ability to study new variants with far greater resolution than what was possible in the past; they can monitor patterns of spread to evaluate the rates of transmission of different variants, track the relative severity of symptoms in infected individuals, and even assess differences in the degree of protection offered by vaccines. These lines of information are helpful for public health officials at all levels, from local to national, who must continually update their policy response to protect their population in light of the latest developments.
The good news is that this technology has become widely available and affordable in the USA, so it has the potential to be a powerful tool for the public health laboratories that are distributed across the country, providing services to their local communities and reporting back to the CDC. But there’s a sticking point: this is the first opportunity that most public health labs have had to start using genomic analysis at scale. There is wide variability in the type and scale of the computing infrastructure they have access to, and their staff members typically don’t have the advanced bioinformatics training needed to develop their own protocols for genomic data processing and analysis.
The new partnership announced by the Broad Institute seeks to address this problem by providing both a technology solution — the use of Terra as a scalable platform for managing and analyzing data — and additional services, delivered by Theiagen Genomics in collaboration with the Terra support team, to ensure that public health lab staff have everything they need to work effectively on the platform.
Supporting public health labs in practice
We’ve written previously about resources made available by the Broad Institute’s Viral Genomics group, such as the public workspace that contains analysis workflows and example data recapitulating their 2020 paper on the outbreak of SARS-CoV-2 infection that took place early last year in New England. You might wonder, can’t public health labs simply use those workflows out of the box?
The reality is that there is a fairly large gap between the workflows used in a research paper and the needs of public health labs. They need workflows that are engineered to be straightforward and very robust, for use by non-specialists, but that can also process data generated with different types of technologies, like Oxford Nanopore long-read sequencing, that were not used in the original study. They also typically need a variety of utility workflows to integrate with their specific ways of working, preferred formats and so on.
To address this gap, members of the Theiagen team have been developing standardized bioinformatics workflows that automate all the data processing and analysis steps involved: assembling genomes from the raw sequencing data, building phylogenetic trees, even submitting the sequences to the NCBI’s public data repository. Importantly, they publish these workflows in open-source Dockstore collections that anyone can use on Terra or their preferred platform, and they maintain these workflows and develop new ones as the technology evolves.
As a complement to the workflow development work, the Theiagen team has also been helping public health labs get set up to use Terra. For most labs, this is their organization’s first venture into the cloud, so having someone liaise with their IT departments and help them navigate the initial account setup can speed up the process significantly.
And finally, they’ve been training the laboratory staff to manage and analyze their data with the Theiagen workflows. This training includes live sessions as well as pre-recorded videos such as this one, which demonstrates how to use their SARS-CoV-2 analysis workflows on Terra.
Based on feedback we’ve heard so far from public health labs involved in the pilot phase, this model has been really effective at empowering them to manage complex data processing and analysis tasks, and to do so at a scale that would have been extremely challenging otherwise.
This is hugely gratifying for all the teams involved, and we all take pride in the thought that our work is helping deliver critical information in the fight against COVID-19. We’re also very excited that so many of the resources created to support the US public health labs are publicly accessible and can be used by any lab in the world, not just those in the USA, to tackle this challenge effectively.