For additional background, see Launching seqr in Terra.
By Lynn Pais
Lynn Pais is a senior clinical genomic variant analyst and product owner for seqr in the Medical and Population Genetics group at the Broad Institute. In this guest blog post, Lynn introduces seqr, an open-source genomic analysis platform powered by Terra.
Identifying a genetic diagnosis for individuals with rare monogenic diseases often requires sifting through mountains of genomic data. To help researchers tackle this challenge, seqr offers an open-source, web-based platform housed within the NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL). The platform is described in our flagship paper, "seqr: A web-based analysis and collaboration tool for rare disease genomics," published in Human Mutation. Key features of seqr include:
- Advanced Annotations & Filtering: seqr provides rich gene- and variant-level annotations and powerful filtering tools for running variant searches within a family or across projects.
- User-Friendly Interface: seqr is designed to be accessible to a variety of users, including researchers, clinicians, and project managers.
- Collaboration: seqr enables researchers around the world to work together on analyzing genetic data from families with rare diseases. Through its Matchmaker Exchange interface, seqr also supports submitting candidate variants/genes and phenotypic information to an international network for gene discovery.
- Data Management: seqr offers a central location for storing large amounts of genetic data alongside de-identified patient information, which simplifies project management.
- Improved Diagnosis: With the continual addition of genomic research tools and support for analyzing new data types, seqr aims to accelerate diagnoses for rare diseases.
seqr is usable out of the box and hosts over 46,500 exome and genome (WES/WGS) samples, of which 12,000 were submitted by external users through AnVIL. At the Broad, seqr has supported the diagnosis of more than 4,000 individuals with rare diseases and the discovery of over 300 novel disease genes.
The Broad Institute operates a public instance of seqr for the research community, accessible through Terra. You can move seamlessly from secondary analysis (calling variants with GATK workflows in Terra) to tertiary analysis of your callset in seqr.
Analyzing data with seqr in Terra
Broad's seqr instance is connected to Terra (on Google Cloud Platform) and set up to run on data in your workspace bucket. It expects a joint-called VCF file, such as one produced by the GATK Whole-Genome Joint-Calling workflow. On the Data page, click the Files icon at the left, then use the seqr link at the top right to create a project and load your data into seqr. Once a project has been created, this link takes you directly to the seqr interface.
[Note: Data import involves QC/validation and annotation steps that optimize the data for querying. Depending on factors such as the size of the joint-called VCF, this can take up to a week, but once it is complete, searches run efficiently even on large datasets.]
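Because a seqr load can take days, it may be worth a quick sanity check on a local copy of your callset before you start. The sketch below is an illustration only, not part of seqr or Terra: it uses Python's standard library to read the VCF header, pull the sample IDs from the `#CHROM` line (samples follow the nine fixed columns, `#CHROM` through `FORMAT`), and flag duplicate IDs. The function names are hypothetical, and the "looks multi-sample" flag is a rough heuristic; seqr's own QC/validation during import remains the authoritative check.

```python
import gzip
from collections import Counter
from typing import Iterable, List


def samples_from_header(lines: Iterable[str]) -> List[str]:
    """Return sample IDs from the #CHROM header line of VCF text lines."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (#CHROM..FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            # Reached variant records without seeing a #CHROM line.
            break
    raise ValueError("No #CHROM header line found")


def read_vcf_samples(path: str) -> List[str]:
    """Read sample IDs from a plain or gzip/bgzip-compressed VCF file."""
    # bgzip output is multi-member gzip, which the gzip module can read.
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    with opener(path, "rt") as handle:
        return samples_from_header(handle)


def summarize_samples(samples: List[str]) -> dict:
    """Count samples and report duplicated IDs (hypothetical helper)."""
    counts = Counter(samples)
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    return {
        "n_samples": len(samples),
        "duplicates": duplicates,
        # Heuristic only: a joint-called cohort VCF normally has >1 unique sample.
        "looks_multisample": len(samples) > 1 and not duplicates,
    }
```

For example, after copying the callset from your workspace bucket to your machine, `summarize_samples(read_vcf_samples("cohort.vcf.gz"))` would report the sample count and any duplicated IDs (the filename is hypothetical). Separating header parsing from file I/O keeps the parser easy to reuse on other line sources.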
Once the data is loaded (you'll get an email notification), you can start your analysis in seqr. seqr provides standardized inheritance-based searches for filtering variants, along with options to customize a search, look up a specific variant across your cohort and other seqr projects, and many more features to support your analysis. See our video playlist for a demonstration of some of these features.
[Figure legend: The seqr platform features a user-friendly interface that facilitates exome and genome sequence analysis through filtering and display of extensive gene- and variant-level annotations. Users can add shared notes and tags to provide further context and mark variants for follow-up. The platform also supports visualization of read data through its IGV integration and data sharing through the Matchmaker Exchange. See the video playlist/tutorial for more details on these features.]
Even without loading case data into seqr, AnVIL-registered users can look up a variant of interest and review it with seqr's informative annotations. This feature is only available for variants already present in seqr, but the display also includes de-identified information about the cases in which the variant was found, such as genotype, affected status, and high-level phenotype categories.
Try seqr on Terra today
We invite you to try it out for yourself. To get started, check out our video tutorials, including one on loading your data into seqr (the last video in the playlist). When you create a new account, you get a demo project you can use to try seqr right away, even if you don't have your own data yet.
If you run into any issues using seqr in Terra, don't hesitate to reach out to the Terra helpdesk (open the main menu at the top left of any page in Terra, go to the Support section, and choose "Contact Us"). For other questions about seqr, see the FAQ page. If your question is not answered there, contact the seqr team at seqr@broadinstitute.org.