The freedom of portable workflows

One of the foundational principles of Terra is that it’s designed to be an open ecosystem, not a walled garden — there are no lock-in mechanisms. If after a while you decide to leave, you can take all the analysis tooling you’ve been using here and expect it to still be functional wherever you’re going.

Applied specifically to analysis workflows, that means you can migrate any pipeline you’ve been running in Terra to either an on-premises server/cluster or to any of the major cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and of course Google Cloud outside of Terra. You can also share your workflow with collaborators or publish it with the expectation that anyone should be able to run it wherever they normally work — without being required to come to Terra.

How does that work? 

First, some basics. In Terra, we use the Workflow Description Language (WDL, pronounced “widdle”), which is stewarded by the OpenWDL community. Terra includes a built-in server running Cromwell, a workflow management system that interprets your WDL workflow script and transforms it into batches of individual analysis instructions, which it then dispatches to a Google Cloud service called Pipelines API* for execution. Then the magic happens and you get results.

[Figure: cromwell-gcp-overview.png]

* The Pipelines API was recently rebranded as Life Sciences API by Google Cloud; I’m using the older name here for consistency pending documentation updates.

The beauty of this setup is that you don’t have to configure Google Cloud resources yourself, such as manually spinning up virtual machines (VMs); Cromwell does it all for you through the Pipelines API. You just need to give Cromwell your workflow and the list of inputs you want to run on, which you can do either through the web interface or through an API if you’re into that sort of thing.
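To make "your workflow and the list of inputs" a bit more concrete, here is a minimal sketch in Python of what those two pieces can look like. The `hello` workflow, its `say_hello` task, the `ubuntu:20.04` image and the `make_inputs` helper are all made up for illustration; the one convention it relies on is that Cromwell expects each key in the inputs JSON to be prefixed with the workflow name.

```python
import json

# Hypothetical minimal WDL 1.0 workflow, used purely for illustration.
HELLO_WDL = """\
version 1.0

workflow hello {
  input {
    String name
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.greeting
  }
}

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    String greeting = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:20.04"
  }
}
"""


def make_inputs(workflow_name, **values):
    """Build an inputs mapping with keys of the form <workflow>.<input>."""
    return {f"{workflow_name}.{key}": value for key, value in values.items()}


if __name__ == "__main__":
    # Print the inputs JSON that would accompany the workflow above.
    print(json.dumps(make_inputs("hello", name="Terra"), indent=2))
```

Saving the WDL text to a file and the printed JSON to a second file gives you the pair of artifacts that a workflow runner needs, whether you supply them through a web form or on a command line.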

If you haven’t run workflows in Terra before, check out this Workflows Quickstart video to see how it’s done.

Importantly, you can also run Cromwell outside of Terra, on any Linux machine with Java installed (yes, that’s a very low bar). So technically you could run your WDL workflows anywhere with just that. But where true* portability kicks in is that Cromwell comes equipped with a set of plugin components called “backends” that enable it to communicate with other platforms just like it does with Google Cloud. That is how you can take a WDL from Terra and run it on your local cluster or on another cloud; it’s a matter of setting up a copy of Cromwell with the appropriate backend configuration. You can find example configurations for various backends in the Cromwell documentation.

* This is a form of workflow portability where the definition of “ability to run” includes the orchestration of computing resources as something that is managed for you, rather than something you have to do yourself. This is quite different from what I’ll call the “Linux-level of portability”, by which it’s enough for the software to be runnable on a basic Linux system. 
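As a rough sketch of what "a copy of Cromwell with the appropriate backend configuration" can mean in practice, the Python snippet below assembles the command line for Cromwell's one-off `run` mode. The file names (`cromwell.jar`, `hello.wdl`, `hello.inputs.json`, `aws.conf`) and the `cromwell_run_command` helper are placeholders of my own; the contents of the backend configuration file are platform-specific and not shown here.

```python
import shlex


def cromwell_run_command(cromwell_jar, workflow, inputs, backend_conf=None):
    """Return the argument list for running one workflow with Cromwell.

    backend_conf is an optional path to a configuration file that selects
    and configures a backend; without it, Cromwell uses its defaults.
    """
    cmd = ["java"]
    if backend_conf is not None:
        # JVM system properties must come before -jar.
        cmd.append(f"-Dconfig.file={backend_conf}")
    cmd += ["-jar", str(cromwell_jar), "run", str(workflow),
            "--inputs", str(inputs)]
    return cmd


if __name__ == "__main__":
    # Same workflow and inputs, two different execution environments.
    local = cromwell_run_command("cromwell.jar", "hello.wdl", "hello.inputs.json")
    cloud = cromwell_run_command("cromwell.jar", "hello.wdl", "hello.inputs.json",
                                 backend_conf="aws.conf")
    print(shlex.join(local))
    print(shlex.join(cloud))
```

The point of the sketch is that the workflow and inputs stay the same in both commands; only the configuration file passed to Cromwell changes. You could hand either list to `subprocess.run` once Java and the Cromwell jar are in place.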

Workflow portability is an evolving landscape

Now that I’ve painted this rosy picture, I want to acknowledge that there are some challenges to the portability of workflows across platforms. One is that Cromwell’s plugin backends are mostly developed by external contributors who are experts in the mechanics of a particular platform; for example, the AWS Batch backend is entirely developed and maintained by an engineering team at Amazon, and the situation is similar for Cromwell’s Microsoft Azure backend. As Cromwell gains new features and the cloud platforms themselves evolve, it takes a lot of effort across the board to make all the updates needed for workflows to “just work”.

But I’m happy to report that this work is happening. In fact, the AWS team recently announced the release of an updated version of the AWS Batch backend that supports more Cromwell features than the earlier version. We’re thrilled to see this development since it improves the portability of WDL workflows to AWS, an important platform in the life sciences.

It’s also worth noting that WDL is not the only workflow language to enjoy such portability; a close analog is the Common Workflow Language, or CWL, which Cromwell also supports, although Terra itself currently does not. Another popular workflow system called Nextflow also supports connecting to Google Cloud and AWS Batch through the same services that Cromwell connects to.

Conversely, Cromwell is not the only workflow management system capable of running WDL workflows. The DNAnexus platform has its own WDL interpreter, which converts the workflow to the platform’s native workflow language. There is also the excellent miniWDL runner developed by Mike Lin, which is designed less for scale than for end-user usability; it is generally considered more accessible for newcomers and more flexible for development purposes.

As you can see, this is an active and growing ecosystem; I plan to write more about this in the near future.

Finally, you can see a form of workflow portability in action by trying out the export functionality in the Dockstore tool repository, which allows you to export a workflow from the repository to several platforms that are participating in ongoing interoperability development efforts. For example, go to this workflow page and select one of the platforms. You can find a screencast video walking through this process from Dockstore to Terra here, which we recorded for our upcoming workshop at ASHG 2020, the annual meeting of the American Society of Human Genetics.

I hope that this has shed some light on how Terra’s workflow system is designed to increase the portability of your analysis workflows, and by extension, your freedom to move and collaborate across computing platforms.

As always, let us know what you think, and don’t hesitate to reach out to support@terra.bio if you have any questions.
