The following testimonials have been freely provided by researchers working in Terra or its earlier incarnation, FireCloud. Read on to see how specific features of the platform enabled their work.
Teaching and code sharing in a common environment made easy
Amelia Weber Hall, PhD
Postdoctoral Fellow, Computational Biology
Cardiovascular Research Center, Massachusetts General Hospital
Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard
My colleagues and I have been using Terra for our single-cell RNAseq analysis work. Some of our workflows are sensitive to which Python package is used, so for troubleshooting it’s very helpful that Terra provides a consistent shareable environment. Now we know exactly which base image we are all working from, and only the Python environment needs to be more actively managed. I really appreciate how this ensures computational reproducibility for collaboration and code sharing, while still providing plenty of flexibility for customization.
These benefits also apply to teaching and training. I informally teach wet lab biologists in my group how to do basic analysis with command line tools and intro-level coding in R and Python. Initially, I struggled to get us all onto the same computing environment, because some of us are at the Broad Institute, and some at Massachusetts General Hospital, which have different on-premises computing systems that don’t support all packages equally. Terra effectively solved that problem for me, since now I can simply share with the group a workspace containing a notebook and any necessary data. They can each clone the workspace or copy the notebook over to their own workspace, spin up the same predefined environment and we can focus on understanding the code and how it works.
Enabling secure collaborations within the Broad Institute and with national and international partners
Tim Majarian
Computational Associate
Manning Lab
Broad Institute of MIT and Harvard
Over the past three years, Terra (and formerly FireCloud) has become an integral part of nearly all of our lab’s work, from genome-wide association analyses within the TOPMed consortium to biobank-scale longitudinal studies of diabetes progression. Terra has enabled collaborations both within the Broad Institute and with our national and international partners while ensuring data security and access to on-demand compute resources. The platform is intuitive, particularly to those who may be unfamiliar with cloud computing and reproducible workflow development; multiple trainees and post-docs within our lab have been onboarded to the platform and subsequently developed their own tools and workflows within a short period of time, quickly facilitating large-scale analyses of complex, sensitive data, like metabolomics and whole genome sequence data.
Terra empowers our lab to conduct impactful biomedical research and allows us to showcase our methods and workflows as a model for the broader scientific community. Through the platform, we have been able to present our work in analyzing whole genome sequence data to a variety of audiences, and provide interactive tutorials on statistical genetics within a cloud computing framework.
Managing and analyzing data at scale on the cloud
James Gatter
Computational Technician
Shalek Lab
IMES, Massachusetts Institute of Technology
With the ease and accessibility it affords, Terra has enabled our lab to leverage the power of the cloud to process, organize, and analyze our data on a larger scale.
Easy access to the Broad’s latest Best Practices Pipelines
Matt Bashton, PhD
Vice Chancellor’s Senior Fellow
Faculty of Health and Life Sciences, Northumbria University
Newcastle, UK
Terra is a key resource for us, as it gives us immediate access to the latest pipelines from the Broad in a form that is maintained and updated by the experts behind the methods in question, and so always reflects the current Best Practices. This takes away a lot of overhead that we would normally incur in keeping pipelines updated and implemented locally. We’re also impressed by the pipelines’ highly optimized use of Google Cloud infrastructure, which keeps running costs in the cloud to a surprising minimum.
The best of both worlds: testing locally, running on the Cloud
Jessica Hekman, DVM, PhD
Postdoctoral scientist
Karlsson Lab
Broad Institute of MIT and Harvard
Our lab is moving to Terra for a project that involves calling variants in large datasets (thousands of whole genomes). It’s going to require a lot of parallel compute for a short amount of time, so it makes a lot of sense to do it on the cloud. However, we still wanted to do the pipeline development work on our local infrastructure, since it’s free and allows us to iterate very quickly. Fortunately, Cromwell, the workflow management system built into Terra, can also be used locally as a standalone tool. So we can write our pipelines in WDL, use Cromwell on premises to test the pipeline on small numbers of samples, and then move the result onto Terra for running at large scale. The only change we need to make is to adjust the inputs to point to where the data lives in cloud storage. This allows us to develop without having to worry about paying for compute when we make the inevitable development mistakes, and move onto Terra only when we are confident that our pipeline is robust. The combination of Cromwell and Terra lets us take advantage of the best of both worlds: fast and free development on premises and large-scale execution on the cloud. On top of that, we can take advantage of the WDL pipelines available from other labs, and when we publish our results we’ll be able to share our methods with others very easily.
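As an illustration of the "adjust the inputs" step described above, here is a minimal Python sketch, not the lab's actual tooling. It assumes the workflow inputs are held as a JSON-style mapping and that local files share a common path prefix. The function name `retarget_inputs`, the workflow name `CallVariants`, the local paths, and the bucket `gs://example-bucket` are all hypothetical.

```python
import json


def retarget_inputs(inputs, local_prefix, cloud_prefix):
    """Return a copy of a workflow inputs mapping in which every string
    starting with local_prefix has that prefix replaced by cloud_prefix.
    Non-path values (numbers, booleans, other strings) pass through unchanged."""

    def convert(value):
        if isinstance(value, str) and value.startswith(local_prefix):
            return cloud_prefix + value[len(local_prefix):]
        if isinstance(value, list):
            return [convert(item) for item in value]
        if isinstance(value, dict):
            return {key: convert(item) for key, item in value.items()}
        return value

    return {key: convert(value) for key, value in inputs.items()}


# Hypothetical inputs used for a small on-premises test run.
local_inputs = {
    "CallVariants.sample_bams": [
        "/data/dogs/sample1.bam",
        "/data/dogs/sample2.bam",
    ],
    "CallVariants.reference_fasta": "/data/dogs/ref/genome.fa",
    "CallVariants.threads": 4,
}

# Same inputs, now pointing at a (hypothetical) cloud storage bucket.
cloud_inputs = retarget_inputs(
    local_inputs, "/data/dogs/", "gs://example-bucket/dogs/"
)
print(json.dumps(cloud_inputs, indent=2))
```

Under these assumptions the WDL itself stays untouched between the local test and the large-scale run; only the inputs mapping differs.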
Recording and running reproducible analyses
Biomedical Software Engineer
Icahn School of Medicine at Mount Sinai
Terra is a great platform for recording and running reproducible analyses. I often run into a situation where I’ve read a paper and I can’t figure out where the data is. If I somehow find the data, the preprocessing and filtering methods applied to the data are opaque. Let’s say I have miraculously scraped together the data and all of the code applied to this data set: will I be able to run this code on my machine? There is a very slim chance I have the right versions of software and all of its expected dependencies. What a headache! So I think it’s really important that we all take responsibility for sharing our work in a way that supports computational reproducibility. I like that Terra lets me maintain my data in one centralized location, record my code and workflows with Jupyter notebooks and WDL, and resolve software compatibility issues with Docker. With these tools, I can easily run an analysis from start to finish, over and over again, and I can enable others to do the same without any additional effort.
Bo Li, PhD
Director, Bioinformatics and Computational Biology Program, CIID, MGH
Member of the Faculty, Harvard Medical School
Associate Scientist, Klarman Cell Observatory, Broad Institute of MIT and Harvard
Moving to the Terra platform has been transformative for my team’s research. Terra provides us with a convenient and flexible way to run analysis workflows on the cloud and helps us ensure the computational reproducibility of our analyses. I recommend Terra as a great cloud-native option for anyone who needs to run bioinformatics workflows, especially on large datasets.
Answering the challenges of big data analysis and collaborations
Alisa Manning, PhD
Assistant Investigator
Clinical and Translational Epidemiology Unit, Mongan Institute
Massachusetts General Hospital
I have been involved in efforts to link rare non-coding variants to complex diseases since 2011. In the early days, we performed our analysis with whole genome sequence data on our local cluster. The Precision Medicine Initiative made large-scale WGS in epidemiological cohorts available and it became clear that retrieving, storing and analyzing these massive data files would be problematic. Another challenge was the collaborative model that our funding agencies were encouraging — we have collaborators at many other institutions with whom we wanted to share resources, code and analysis results.
We found that FireCloud provided an optimized solution to our challenges. We had to learn a new iterative style of workflow development (involving Cromwell, WDL, and Docker in addition to FireCloud itself), but as soon as we had our development cycle in place, we were able to develop and deploy our analysis workflows with the engagement and assistance of our collaborators. The platform provides a model for our work to be open-source, with excellent tools to manage user access and cloud computing costs. The development team has been extremely responsive to the needs of the research community, with a series of enhancements that have enabled us to perform more sophisticated analyses. I’m excited to start taking advantage of the further improvements in Terra.
Kathleen Morrill
Ph.D. Candidate
Dog Aging Project Team Member
University of Massachusetts Medical School
I love the philosophy behind Terra most of all: unifying the way we all work together on a highly collaborative project and handling shared data in a very clean way. It’s super sleek, well-implemented, and well-supported.
Managing and sharing analysis workflows
Ryan Collins
Regarding the gnomAD-SV project: An open resource of structural variation for medical and population genetics. Ryan Collins et al., 2019
I couldn’t have done this project without FireCloud. Two FireCloud features were particularly important. First is the abstraction of methods across projects and the ability to share them. Second is all of the workflow management features. We have thirty-six workflows so being able to see the status of each workflow, the cost, the time it is taking to execute, and having our metadata and outputs organized in the data model was very useful.
Collaborating effectively and working towards fully reproducible science
Claire Margolis
FireCloud has enabled our group to collaborate more effectively and work towards fully reproducible science. Especially in academia, an environment in which people frequently come in and out at various stages of training and tend not to stay in one place for too long, the capacity for the FireCloud infrastructure to act as a knowledge repository has been incredibly useful in longer-running projects as well as for onboarding new members. Access to an open methods repository where WDLs and method evolution can be viewed transparently and methods can be easily exported to individual workspaces lowers the barrier to entry for conducting analyses in the cloud, allows us to leverage the broader Broad community, and reduces duplication of effort.
One of my first projects in FireCloud was to re-implement a couple of methods I’d built locally, with the focus of making them more user-friendly so that others in our group and beyond could use them. After getting over the initial hurdle associated with learning how to work with dockers and use WDL, I was able to get them up and running relatively quickly. Now, the methods are well-documented and publicly available for others to clone, and I’ve additionally created a workspace containing successful runs of both methods on example data so that members of our lab can use it as a template for their own analyses moving forward. Thanks for creating such a useful platform!
Democratizing computational bio, enabling reproducible research
Matthieu J. Miossec, PhD
Bioinformatics scientist
Centre for Bioinformatics and Integrative Biology
Universidad Andrés Bello
I was first introduced to FireCloud in 2018 when my workshop proposal for a conference was merged with one from the Broad Institute’s GATK team. For the purposes of the workshop, we reproduced one of the studies I worked on as a doctoral student and research associate (see Page et al., 2018), which was an opportunity for me to discover the platform in detail. The cloud-computing aspect of FireCloud, in itself tremendously useful, particularly for laboratories that don’t always have direct access to high-performance computer clusters, is only the tip of the iceberg.
FireCloud’s method repository was tremendously helpful in building up the pipeline in a brief amount of time. The repository contains both well-documented featured methods created by the GATK team and public methods contributed by other FireCloud users. In both these repositories we found methods that corresponded closely to what my previous team had implemented on local machines and we were therefore able to clone these instead of starting over from scratch. Crucially, once cloned we could make all the small tweaks necessary for the methods to fit our specifications. This mass sharing of both methods and entire workspaces is surely the future of bioinformatics at a time when reproducibility is strongly needed. I will certainly continue working with FireCloud and Terra going forward.
See the project description and check out the Terra workspace here:
https://app.terra.bio/#workspaces/workshop-ashg18/ASHG18-ToF-Reproducible-Paper
Enabling population-scale polygenic association studies
Denis Bauer, PhD
The VariantSpark team
Our first notebook on the Terra platform showcases our machine learning software, VariantSpark, which uses Apache Spark and takes advantage of Terra’s support for custom environment configurations. We are excited about enabling the global Terra research community to perform population-scale polygenic association analyses.
Check out the Terra workspace here:
https://app.terra.bio/#workspaces/fccredits-copper-blue-3695/TOSC19-VariantSpark