Sara Salahi is a Senior Product Manager in the Data Sciences Platform at the Broad Institute, where she is responsible for Terra’s Batch Analysis product direction and strategy. In this guest blog post, Sara shares her plans for providing a wider range of workflow tools and capabilities in Terra.
The ability to run data processing and analysis workflows at scale, conveniently and securely, is a cornerstone of the Terra platform.
In its current state, Terra’s workflow execution capabilities rest on a centrally managed instance of the Cromwell engine, which supports a domain-specific language called the Workflow Description Language (WDL). To date this system has served a large number of research projects ranging widely in scale, including the Telomere-to-Telomere Consortium’s variant calling analysis (a few thousand WGS genomes) and the All of Us Research Program’s first genomic data release (nearly 100,000 WGS genomes).
On the strength of these successes and many lessons learned along the way, we have charted a path toward the next generation of Terra’s workflow execution capabilities.
Support for more workflow languages
Over the last few years, several workflow languages other than WDL have become popular in the bioinformatics community, have been adopted by large research consortia, or both. In accordance with Terra’s mission, we aim to enable you to use whatever tooling makes the most sense for a given project, so we are laying the groundwork for Terra to support additional workflow languages such as Nextflow and CWL.
Our goal is to provide a convenient, full-featured user experience that is accessible to a wide range of people (without requiring extensive computing experience) and that feels familiar to people who are already using those languages through other systems. To that end, we plan to integrate established tools that are widely used by the community, selected in consultation with community members including developers, project maintainers and end users.
New app model for integrating external workflow tools
We have been developing a new model for integrating external workflow execution tools into Terra as “apps”, in a way that’s conceptually similar to how apps work on mobile devices.
An early version of this app model already supports Terra’s interactive analysis applications, allowing you to run Jupyter Notebooks, RStudio and Galaxy in Terra. That system allows you to launch a private Cloud Environment where all the relevant software is pre-installed, with the option to install additional packages as needed. In the case of Jupyter and RStudio environments, you also have the option to tweak the compute resources to fit your analysis requirements.
The new workflow app model will enable you to launch a private Cloud Environment running the workflow app of choice, preconfigured to submit workflows to the relevant cloud backend. Managing inputs and outputs, submitting workflows, and monitoring progress will all be done through the tool’s native interface.
A major advantage of the app model compared to the “centralized Cromwell” workflow system is that the private Cloud Environment setup will offer data isolation and data residency options for projects where those are compliance requirements. There will also be more opportunity for tweaking configuration settings and increasing scalability on an individual basis.
First up: Cromwell-as-an-app
The very first workflow app to be integrated through this new model will actually be Cromwell itself. We’ve been working on creating an app version of Cromwell that will initially be deployed on the Azure cloud (adding support for Azure in Terra is part of our partnership with Microsoft).
Next, we will also make the Cromwell app available on Google Cloud, as a complementary option for running WDL workflows that will coexist with the current “centralized Cromwell” system. We expect this will primarily appeal to people and projects who stand to benefit from using a single-tenant system — e.g. if they require data isolation and data residency, or need greater control over scalability and configuration options. We will continue to operate the centralized Cromwell for the foreseeable future.
As part of this work, we are planning to streamline Cromwell’s inner workings, which may involve deprecating some current functionality. For example, a few years ago we started adding CWL support to Cromwell in an effort to support additional languages. As noted above, we now believe it will be more productive to have Cromwell focus on supporting WDL as that language continues to evolve, and to adopt tools that are already widely used by the community for running workflows in other languages. Accordingly, CWL support in Cromwell will enter a deprecation phase starting May 2022 (see the GitHub repository for updates). This approach will enable us to focus our development efforts on building capabilities that offer the greatest benefits to the research community.
Tell us which workflow languages you want to use in Terra
I’m excited to share these plans with you for the first time, and I would love to get your feedback on all of this. In particular, the team and I are very interested to hear which workflow languages and tools you would like us to prioritize once Cromwell-as-an-app is up and running. Both CWL and Nextflow have already been requested in the Active Feature Requests forum, so feel free to upvote either CWL support or Nextflow support — or add your own feature request if what you’d like to see is not represented there.