AnVIL in the Classroom: Cloud-scale educational resources for modern genomics

Genomics has become mainstream enough to be introduced in undergraduate classes, and even in high school. There are lots of courses and online resources to help educators teach genomics, and heaps of literature about teaching methodologies.

In my recent webinar hosted by the American Society of Human Genetics (ASHG), I discussed the exciting opportunities that the move to cloud-based research infrastructure offers educators who want to deliver practical instruction in genomics through hands-on exercises.

Big thanks to my colleagues Liz Kiernan and Anton Kovalsky, both Lead Science Educators in the Data Sciences Platform at the Broad Institute, for their invaluable contributions to the development and execution of this webinar. 

Bridging the gap between teaching and research

Computational infrastructure has always been a challenge for hands-on teaching in scientific computing. Even at teaching institutions that are well equipped with sophisticated computer labs, there often remains a deep divide between teaching environments and research environments, which end up siloed from each other. So as an educator, every time you want to bring data and tools from the research environment into a teaching setting, you have to cross that gap, which takes effort. You can easily end up with teaching examples that are out of date or oversimplified, so learners only encounter toy versions of what researchers really do.

[Figure: a canyon with a classroom setting inset on the left and a lab setting inset on the right, illustrating the great divide between teaching and research environments]

It’s critical that we work toward bridging that gap, to reduce the distance between teaching examples and real research as much as possible. Developing skills through more realistic examples (closer to “real” scientific investigation) is both more exciting for learners and more productive in terms of achieving educational goals. I would also expect that when learners are ready to take the next step in their educational journey, they will transition more smoothly to complex projects if they can build on prior experience rather than having to be retrained on different tooling.

The solution is standing right in front of us: take advantage of what’s happening on the research infrastructure side, where for the past few years we’ve seen a big shift toward cloud infrastructure.

Traditionally, every group would use its own siloed computing infrastructure, and if multiple groups were working with the same dataset, each would store and work on its own local copy. So we’d end up with gaps between research groups similar to the one we just discussed between researchers and educators. With the shift to the cloud, the idea is that everyone can access a single, centrally stored copy of the data, and run whatever computation they need on hardware that’s right there, colocated with the data.

[Figure: cartoon of cloud computing infrastructure acting as a bridge between researchers]

This is not a new idea as such, but the reason it’s been gaining so much traction recently is the scale of the datasets being generated in genomics and related disciplines: it’s just not feasible (or fair) to expect everyone to download a copy of the data and work on it locally. Hence the big commitments we’re seeing from various federal agencies to support infrastructure projects like the NHGRI AnVIL, which aims to enable genomics researchers to make effective use of the cloud.

Putting the cloud to work for educators

An added benefit of the cloud model is that everyone has access to the same types of machines, and you can package analysis tools so that anyone in the world can use them in the same way, without having to put a ton of effort into figuring out how to configure their local computing environment.

And that’s where we get to the key point I made in the webinar: this new cloud model is great for research, but it’s also a big opportunity for educators to be empowered to move in and use the same environments, datasets and tools that researchers use.

In support of this point, I gave a live demonstration of how an instructor could use AnVIL resources, specifically Jupyter Notebooks in Terra workspaces, to develop and deliver practical instruction in genomics to a class of students. To see for yourself how this works, you can view the full recording of the webinar, which is available on demand from the ASHG learning portal.

The portal requires an account login, but the account registration is free and does not require a paid membership with ASHG. 

We would love to hear from any educators who might be interested in trying out this model in their own teaching practice, so feel free to reach out to me specifically (geraldine@broadinstitute.org), or to the Terra support team through the community forum or helpdesk. We can help you identify resources and mechanisms to fit your needs and audience level. 

Resources

Presentation materials

Referenced in the live demo:

Configuration to launch a custom notebook environment that supports embedded IGV

  • Custom environment container

    gcr.io/broad-dsde-outreach/terra-base:ipyigv1
  • Startup script

    gs://genomics-in-the-cloud/v1/scripts/install_GATK_4130_with_igv.sh

Workspaces and notebooks

Referenced in the slide deck: 
