The Path of Genomes: Expediting the way to actionable public health data

Frank Ambrosio is a bioinformatics scientist at Theiagen Genomics, a company whose mission is to transform public health and infectious disease surveillance through the innovative implementation of NGS and bioinformatics technologies. In this guest blog post, Frank recounts how Terra has come to serve as a shared platform for public health labs, and to foster cross-cutting collaboration among members of the public health community.


Every public health scientist remembers the sequence of events and conversations that occurred leading up to their realization that we were facing a legitimate pandemic threat to our species in the form of a novel viral pathogen. We all remember the first “two weeks to flatten the curve”, and then the realization that this would not be nearly enough to stop the disease from reaching pandemic proportions. Filled with trepidation, we all watched the spread of the disease even as we helped produce the data and interpretations that provided situational awareness to our public health and government leaders. 

Eventually we began to settle into the “new normal” of social distancing and quarantines, but with the fabric of society starting to fray from the effects of prolonged fear of exposure and social isolation, the world looked to the scientific community for answers. Our public health community was thus thrust into the limelight while facing its greatest challenge since 1918: COVID-19 would quickly become the worst pandemic to plague humanity since it became possible to sequence the genomic material of pathogens.  

The way forward was clear: we had to accelerate sequencing efforts to monitor viral variants, but the magnitude of the undertaking was profound. Simply generating all that sequencing data would be a monumental task involving purchasing new equipment, training personnel, and validating new assays, all while adapting our extant pathogen surveillance systems to the new disease.

None of this was easy, and it could not happen overnight, but our community rose to the occasion. Sequencing efforts expanded at unprecedented rates, contact tracing teams worked around the clock, and, crucially, we developed a new collaborative model for developing and sharing public health bioinformatics resources.


Opportunity and challenges of the genomic era

In the context of a modern globalized society, infectious disease pandemics, by their very nature, require a collaborative effort from the global public health community to mitigate the breadth and severity of their impact.

For the first time in history, we had the tools to sequence and analyze pathogen genomes from the onset of the pandemic, giving us the opportunity to track the evolution of a virus as it propagates throughout the world. Yet we lacked standardized approaches for processing the data, extracting actionable insights and sharing them across public health jurisdictions, both nationally and internationally. Individual laboratories were initially using their own custom pipelines to assemble SARS-CoV-2 genomes. They experienced challenges even discussing their outputs with other labs, let alone performing the aggregation required for longitudinal analyses. Additionally, labs that did not have a bioinformatics expert to develop and run these pipelines were unable to analyze the sequence data they worked so hard to generate.

It quickly became clear that the public health community would benefit from a focused effort to enhance our ability to collaborate on the development and distribution of analytical pipelines. Ideally, we would be able to distribute these pipelines through a medium that made them accessible to scientists of diverse technical backgrounds, and provide a forum for discussing the nuances of these pipelines and providing feedback to the developers.

For much of the public health community in the USA, that medium was the Terra platform. 


Terra as a public health conduit

When our team at Theiagen Genomics was given the opportunity to help public health laboratories across the country tackle their newfound wealth of viral genomics data, we started using Terra as the common platform that all our partner labs could use to host data, perform genomic analyses and share results. 

We worked with our public health partners to develop standardized workflows that automate all the data processing and analysis steps involved, such as assembling genomes from the raw sequencing data, building phylogenetic trees, and submitting the sequences to the NCBI’s public data repository.


Screenshot from the Genomic Analysis of SC2 introductory video by Theiagen Genomics


We used a shared analysis platform to help bring together different public health partners (local, state, and federal) and create a community of practice for distributable workflows and harmonized results. On a regular basis, this community of practice compares outputs, discusses interpretation of results, and contributes to the codebase. These discussions are facilitated by the Terra Training and Office Hours sessions hosted by Theiagen, which provide a unique opportunity for public health scientists to discuss the analytical workflows they use on a daily basis with the developers and fellow end-users. Terra is the nexus of all SARS-CoV-2 sequencing data for this community, and these office hours sessions serve as the nexus of ideas. In these sessions, updates to the platform and workflows are announced, solutions to common issues are demonstrated, and public health scientists from around the country discuss the latest situational developments. The sessions are recorded, but off-record time is saved at the end of the call to allow laboratories to freely discuss laboratory-specific issues like the detection of a new variant, a particular outbreak investigation, or the development of a new approach.

As we’ve grown from working with just two public health laboratories to over 40 (in part through a partnership with the Broad Institute and the CDC), this community engagement model has scaled very well, with office hours meetings now approaching 100 participants. And as participation in these events continues to rise, so too does the value of the discussions. 


Success through practitioner-led research and development

From a technical standpoint, this practice of involving the public health community closely in the development of our analytical workflows has allowed us to avoid the pitfalls that are typically associated with siloed development, and to identify and resolve interoperability issues before they have the opportunity to cause fissures in the community. Through consistent communication with participating labs, we were able to develop analytical procedures, sets of parameter values and reporting standards that are both scientifically sound and well understood among the community members. 

As a result, we’ve seen wide adoption of the workflows by public health labs. That in turn has led to a considerable rise in the number of researchers who are able to analyze their own sequencing data. Perhaps most importantly, the amount of data made available in a usable, understandable form to public health policy makers has skyrocketed.



A model for public health bioinformatics beyond the current crisis

Our story is ultimately one of successfully combining a cutting-edge technological solution with an old-school community building approach. Bringing molecular epidemiology experts together on a shared data platform and providing a venue for collaborative engagement with our bioinformatics scientists has changed the game and led to substantive benefits for public health at large.

The value of the community of practice that has emerged from this initiative stretches far beyond simply developing better workflows and educating end-users. By engaging with this passionate community of scientists who deeply understand the complexities of pandemic genomics, we can ensure that the information reported to policymakers is clear and, most importantly, actionable. 

Moreover, the impacts that this community has on the tools and procedures developed to combat SARS-CoV-2 will reverberate through public health for years to come. Even as we are dealing with renewed levels of COVID-19, new challenges are rising on the horizon. We are hopeful that our experience will contribute to establishing a modern model of outbreak genomics, and drive the field of global public health in the direction of openness, collaboration, and community.