Jonathan Lawson is a Senior Software Product Manager in the Broad Institute Data Sciences Platform, overseeing data management products including the Terra Data Repository and the Data Use Oversight System. In this guest blog post, Jonathan announces the public preview phase of the Terra Data Repository, a new component of the Terra platform designed to provide data storage and access management capabilities tailored for the life sciences.
Life sciences research has entered an age of extraordinary opportunity thanks to the rapid technological developments of the past decade. We are now able to generate vast amounts of molecular information, such as genomic sequence data, and we can put that molecular data in the context of phenotypes and clinical history to probe the biology of both health and disease in unprecedented detail. These capabilities are already starting to revolutionize how we approach everything from fundamental research into population genetics to diagnostics and drug development.
Yet these technological advances also bring new technical challenges. The resulting datasets are complex, combining enormous files of molecular data with structured information, such as phenotypic data, that is best stored in database form. In addition, data assets collected from human participants are subject to various constraints with regard to how they can be shared, and with whom.
Solving these challenges calls for data storage and sharing solutions that empower data owners and custodians to make their datasets available for analysis to the research community securely, responsibly and effectively.
Today, we are excited to introduce the Terra Data Repository (TDR), a new component of the Terra platform designed to provide data storage and access management capabilities tailored for the life sciences. It is already actively being used for large collaborative projects including the Human Cell Atlas and the NHGRI’s AnVIL.
The system supports using formal schemas to represent relationships between different data entities, and generating versioned snapshots that can be used to grant collaborators access to specific subsets of data depending on research purpose and authorizations. Data snapshots are immutable, making it possible to release continuous updates to datasets while ensuring reproducibility of analyses over time.
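To make the snapshot idea concrete, here is a minimal conceptual sketch in Python. It is not the Terra Data Repository API: the `Dataset` and `Snapshot` classes, their methods, and the field names such as `consent` are hypothetical and exist only to illustrate how a dataset can keep receiving updates while earlier snapshots stay frozen and readable only by the collaborators they were shared with.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable, versioned view over a subset of rows."""
    dataset: str
    version: int
    rows: tuple          # read-only copies of the selected rows
    readers: frozenset   # collaborators allowed to read this snapshot

    def read(self, user):
        # Access is granted per snapshot, not per dataset.
        if user not in self.readers:
            raise PermissionError(f"{user} may not read {self.dataset} v{self.version}")
        return self.rows


class Dataset:
    """Hypothetical mutable dataset that keeps receiving updates."""

    def __init__(self, name):
        self.name = name
        self._rows = []
        self._next_version = 1

    def add_row(self, row):
        self._rows.append(dict(row))

    def create_snapshot(self, predicate, readers):
        # Copy the matching rows and wrap each copy in a read-only mapping,
        # so later changes to the dataset cannot alter this snapshot.
        selected = tuple(
            MappingProxyType(dict(row)) for row in self._rows if predicate(row)
        )
        snapshot = Snapshot(self.name, self._next_version, selected, frozenset(readers))
        self._next_version += 1
        return snapshot


def general_research(row):
    # Example selection rule based on an assumed "consent" column.
    return row["consent"] == "general_research"


samples = Dataset("samples")
samples.add_row({"sample_id": "S1", "consent": "general_research"})
samples.add_row({"sample_id": "S2", "consent": "disease_specific"})
v1 = samples.create_snapshot(general_research, readers={"alice"})

# The dataset keeps growing, but v1 still reflects what was released.
samples.add_row({"sample_id": "S3", "consent": "general_research"})
v2 = samples.create_snapshot(general_research, readers={"alice", "bob"})
```

In this sketch, an analysis that recorded "samples, version 1" can be rerun later against exactly the same rows, while new collaborators can be pointed at version 2. A real repository would also enforce the schema and persist the data, which this example leaves out.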
For a complete overview of features, usage instructions and detailed technical information, please visit the TDR documentation in the Terra knowledge base.
The Terra Data Repository is available as a public preview to all registered users of Terra. Please note that the graphical user interface is still under active development, and many operations can currently only be performed through API calls. During this time, we recommend reaching out to the Terra support team to discuss whether the Terra Data Repository might be a good fit for your project’s needs.