Summary: We know that the 4 minutes it takes to create a cloud environment to perform a Jupyter Notebook analysis can feel like a long time. To reduce this time and lower your costs, Terra has added support for using standard Google Compute Engine Virtual Machines (GCE VMs) as the underlying compute/runtime.
Terra researchers frequently use Jupyter Notebooks for genomic analyses, but may not often think about the virtual machine (VM) used for the underlying Notebook compute. However, the VM can be a very important factor in not only how fast you can create a Notebook environment but also which types of applications you can use in Terra. When Terra was originally created, we chose a Spark VM for the underlying Notebook compute so you could perform GWAS analysis with the Hail Python library. But as Terra has expanded, researchers now use Jupyter Notebooks for many use cases beyond Hail, and others require new tools like RStudio and Galaxy, which don’t run as efficiently on Spark. That’s why we are thrilled to introduce support for GCE VMs, a first step toward faster cloud environment creation and the integration of new Terra applications.
There are multiple advantages to using GCE VMs. Compared to the current Spark VM in Terra, GCE VMs are created in half the time because they require fewer installation steps, saving you precious time for your analyses. They also reduce the costs for running and paused VMs by $0.01 per CPU. But perhaps the best part of the GCE VMs is that they provide support for detachable persistent disks, a more durable storage solution that will be required to support new applications like RStudio. Persistent disks will also allow you to keep your analysis environment setup and your output files even after you delete your VM (you can read more about them in an upcoming blog).
Based on these benefits, we are now providing GCE VMs as a “Standard” option for your Jupyter Notebook’s underlying compute while continuing to support Hail and Spark. When you create a new cloud environment for Jupyter in Terra, you will have the choice of either a “Standard VM” or a “Spark” VM/cluster. There are two ways to make the selection:
- You can choose the appropriate environment from the “Runtime Configuration” window’s dropdown menu. Depending on the environment you choose, Terra will recommend which VM type to select. For example, if you choose the Default environment, Terra will automatically select “Standard VM”. If you select the Hail environment, Terra will automatically select Spark; you can then choose to create either a single “Spark master node”, or a Spark cluster with worker nodes.
- You can choose the VM by simply toggling the “Runtime type” dropdown; select either Standard or Spark VMs.
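The selection behavior described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Terra's actual code or API: the names `RECOMMENDED_RUNTIME` and `choose_runtime` are hypothetical, and falling back to “Standard VM” for environments not mentioned in this post is an assumption.

```python
from typing import Optional

# Hypothetical mapping from environment to the runtime type Terra preselects.
# Only the "Default" and "Hail" pairings come from this post.
RECOMMENDED_RUNTIME = {
    "Default": "Standard VM",
    "Hail": "Spark",
}


def choose_runtime(environment: str, runtime_override: Optional[str] = None) -> str:
    """Return the runtime type for a new Jupyter cloud environment.

    An explicit choice from the "Runtime type" dropdown takes precedence;
    otherwise the recommendation for the selected environment is used.
    """
    if runtime_override is not None:
        return runtime_override
    # Assumption: environments not listed above fall back to the Standard VM.
    return RECOMMENDED_RUNTIME.get(environment, "Standard VM")
```

For example, `choose_runtime("Hail")` yields the Spark recommendation, while `choose_runtime("Default", runtime_override="Spark")` mirrors toggling the “Runtime type” dropdown by hand.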
When you choose the Standard VM option, you will find that cloud environments for Jupyter are created in ~2 minutes, as opposed to the ~4 minutes required to create a Spark VM/cluster (although this time can fluctuate depending on Google).
We’re currently working on providing support for detachable persistent disks as the next step to improve the analysis experience on Terra.