Whether you’re processing ten data files or ten thousand, making your workflows run faster and cost less is always a goal. The Terra Workflow (aka “Batch”) team has been working on some cost and performance improvements. These aren’t available quite yet, but we wanted to give you a preview of two of the improvements that are in progress:
Less wait time to load Broad’s public genome references
One performance improvement that’s almost ready will speed things up if your workflow uses any of the Broad Institute’s public genome references. Until now, copying gigabytes of reference data to the virtual machine meant waiting a few minutes before a task could start. To reduce that wait, we’re adding a reference disk that is automatically mounted whenever a workflow uses one or more of the Broad references (gs://gcp-public-data--broad-references/). Since most references are quite large, it’s usually much faster to read a file from an attached disk than to copy it onto the machine’s own drive. Best of all, you don’t need to change anything to take advantage of this improvement.
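If you’re curious whether one of your workflows would benefit, a quick check is to see which inputs point into the Broad public reference bucket. The snippet below is only a rough sketch of that idea, not how Terra decides when to mount the disk: the helper name `find_broad_reference_inputs` and the example input names are made up for illustration, and the bucket prefix is written with the double hyphen that appears in the bucket’s name.

```python
# Hypothetical helper for illustration only; Terra's own detection logic
# is not described in this post and may work differently.
BROAD_REFERENCE_PREFIX = "gs://gcp-public-data--broad-references/"


def find_broad_reference_inputs(inputs):
    """Return the inputs whose values are paths inside the Broad reference bucket."""
    matches = {}
    for name, value in inputs.items():
        # Only string inputs can be bucket paths; skip numbers, booleans, etc.
        if isinstance(value, str) and value.startswith(BROAD_REFERENCE_PREFIX):
            matches[name] = value
    return matches


# Example inputs (names and the non-Broad path are invented for this sketch).
example_inputs = {
    "MyWorkflow.ref_fasta": BROAD_REFERENCE_PREFIX + "hg38/v0/Homo_sapiens_assembly38.fasta",
    "MyWorkflow.input_bam": "gs://my-bucket/sample.bam",
    "MyWorkflow.threads": 4,
}

for input_name in sorted(find_broad_reference_inputs(example_inputs)):
    print(input_name)
```

Any input this lists is a file that would previously have been copied to the virtual machine before the task started.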
Run workflows optimized with Google Pipelines API V1 on Terra
Using pipelines developed or optimized with Google Pipelines API V1? You’ll soon be able to use these pipelines in Terra. With Google Pipelines API V1, there were fewer machine options available, and some pipelines were developed and optimized with this in mind. A new option is in the works that will allow you to leverage Google Pipelines API V1 machine optimization within V2 – Terra’s current version.
Beyond these two improvements, the team is also looking at Docker image caching. We hope you enjoyed this sneak peek of what’s coming next for workflows. Look for more on these improvements in future updates.