If you missed the ASHG workshop on human pangenome analysis last month, you’ll be thrilled to hear that the UCSC team uploaded their presentation videos to a YouTube playlist, so anyone can now learn to analyze pangenomes on the cloud!
The workshop was developed by Dr. Karen Miga’s team at the UCSC Genomics Institute. Dr. Miga leads the Human Pangenome Reference Consortium (HPRC), an exciting collaborative effort that aims to construct a human pangenome reference in order to improve the genomic representation of diverse human populations.
In mid-2021, the HPRC released a first crop of data generated from 30 samples using a combination of several sequencing technologies, including long reads. The data is available through several repositories, including the NHGRI’s AnVIL, and can be accessed directly from Terra for analysis. To facilitate computational reproducibility and reuse of the pangenome resources they produce, the HPRC team shares their data processing workflows via GitHub and Dockstore. They also provide a Terra/AnVIL workspace that bundles the “Year 1” data along with preconfigured workflows that can be used to reproduce the full analysis.
The six presentations in the workshop playlist provide a comprehensive overview of the project goals and scientific underpinnings, then dive into a series of hands-on demos that show how to use the workflows and workspace in practice (see summaries below). These videos are designed to be accessible to a wide audience, so we encourage you to check them out!
Note that accessing the Terra/AnVIL workspace requires a (free) Terra account. Cloning the workspace and working through the demo for yourself requires a billing account to cover Google Cloud compute costs; free credits are available to fund this. See https://anvilproject.org/learn for more information on getting started with AnVIL resources.
Summaries of the pangenome workshop playlist videos:
Introduction to the Human Pangenome Reference
Introduction to the goals of the project, how the consortium is sharing their results early and often, and the currently available data from Year 1. — Karen Miga, Director of the Human Pangenome Reference Consortium
Introduction to AnVIL and how it supports HPRC
A brief introduction to the NHGRI AnVIL ecosystem and what it offers to data scientists, including an example of how the Human Pangenome Reference Consortium is leveraging AnVIL for transparent science. — Beth Sheets, Program Manager
Exploring and Accessing HPRC Data Resources in AnVIL
Demo of how to find, explore, and access the Human Pangenome Reference Consortium's Year 1 data on AnVIL. Data can currently be accessed in this AnVIL/Terra workspace (requires a Terra account). — Julian Lucas, Bioinformatics Systems Analyst
Introduction to Genomic Workflows from HPRC
Introduction to Genomic Workflows written in the Workflow Description Language (WDL) that can be launched in the NHGRI AnVIL cloud ecosystem. Covers the following topics: Why use containers and workflow languages; How to build a container; How to write a workflow in the Workflow Description Language (WDL); and Tools for testing your WDL workflow. — Trevor Pesout, PhD Candidate
A Demo of Running The HPRC Assembly Workflow in AnVIL
Introduction and demo of an assembly workflow that has been developed and used to build a reference that will be included in the Human Pangenome. The video demonstrates: Where to find the workflow in Dockstore; How to launch the workflow from Dockstore into AnVIL’s Terra environment; The types of data needed and how to set up data input for the workflow; and How to find, view, and download output files from the workflow. — Mobin Asri, PhD Candidate
Calling Variants with a Pangenome in AnVIL
Introduction to pangenomes and demo of variant calling with a pangenome reference in AnVIL, using the Giraffe and DeepVariant tools (see this AnVIL/Terra workspace). — Julian Lucas, Bioinformatics Systems Analyst