Terra notebooks

Update your Terra Notebook Utilities for continued access to data via DRS/DOS

TL;DR: If you have been using the Terra Notebook Utilities (TNU) to access data through DRS/DOS URIs, you need to update TNU to version 0.5.0 or later before December 1, 2020, as described below.

Some of the many data repositories that are accessible through Terra use systems called Data Repository Service (DRS, pronounced “duhrs”) and Data Object Service (DOS) to manage file locations in a way that allows you to get access to data without having to know exactly where it is stored. In other words, you can run an analysis on the data without actually knowing the exact path to where it lives. Without going into the details of how this sorcery works, the basic idea is that you give the system a unique identifier, and it gives whatever tool you’re using access to the data.

You can learn more about the DRS/DOS system and how to use it in Terra in this documentation article.

If you’ve already been using a dataset that’s accessed through DRS/DOS, you’ve probably had to use identifiers to work with some or all of the data.

For workflows this is pretty transparent; you just point to the identifiers listed in the data table as your inputs, and the Cromwell workflow manager will work with Terra’s DRS/DOS processing system, which is called Martha, to get the files localized at runtime. All the relevant components are managed for you behind the scenes so you don’t need to do anything to stay up to date.

For notebooks there’s an extra layer involved; you have to use a Python package called Terra Notebook Utilities (TNU) to connect to Martha and access the files. TNU is a package you install yourself in your notebook environment, so you may occasionally need to update your version of the package to keep up with system updates.

Which is where today’s ACTION ITEM comes in. Our engineering team has recently updated the Martha service to provide new functionality, including better error messages! This is valuable progress, but it involves some functional changes, so the Terra Notebook Utilities must be updated to work with the new version of Martha. Importantly, the old version of Martha will stop working on December 1, 2020, so you must update your installed TNU package to version 0.5.0 or later if you want to continue accessing DRS/DOS-mediated datasets.

The good news is that the update process is fairly straightforward; you just need to use the command corresponding to the environment you’re working in:

From any Jupyter notebook in Terra (be sure to include the leading “%”):

%pip install --upgrade --no-cache-dir terra-notebook-utils

From the CLI on standard Terra-provided Notebook environments:

/usr/local/bin/pip install --upgrade --no-cache-dir terra-notebook-utils

Note that all standard Notebook environments on Terra are based on this Docker image.

For other environments, the following should be enough:

pip install --upgrade --no-cache-dir terra-notebook-utils
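
After upgrading, you may want to confirm that the installed version meets the 0.5.0 minimum. The following is a minimal sketch, not an official Terra tool: it assumes Python 3.8 or later (for the standard library's importlib.metadata), and the helper names parse_version and tnu_is_current are hypothetical, introduced here only for illustration. In a notebook, you may need to restart the kernel first so that the upgraded package is picked up.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum TNU version named in this post.
MINIMUM_TNU_VERSION = (0, 5, 0)


def parse_version(text):
    """Turn a version string such as '0.5.0' into a tuple of ints, e.g. (0, 5, 0).

    Simplification: any non-numeric suffix within a component (such as
    'rc1') is ignored, so '0.5.0rc1' is treated the same as '0.5.0'.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.0' out to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def tnu_is_current(installed=None):
    """Return True if the given (or installed) TNU version is 0.5.0 or later.

    Returns False when the package is not installed at all.
    """
    if installed is None:
        try:
            installed = version("terra-notebook-utils")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MINIMUM_TNU_VERSION


if __name__ == "__main__":
    if tnu_is_current():
        print("terra-notebook-utils is at version 0.5.0 or later.")
    else:
        print("terra-notebook-utils is missing or older than 0.5.0; please upgrade.")
```

The versions are compared as tuples of integers rather than as strings, so that a release such as 0.10.0 is correctly treated as newer than 0.5.0.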

If you run into any trouble, please reach out to the Terra support team through the Helpdesk form or the community forum. For more information on how to use the Terra Notebook Utilities to access data through DRS/DOS, see this documentation article.
