
Paper Spotlight: Molecular map of chronic lymphocytic leukemia and its impact on outcome

This blog is part of our Paper Spotlight series, which features peer-reviewed research publications involving work done in Terra and highlights how the analysis methods were applied.

Molecular map of chronic lymphocytic leukemia and its impact on outcome

By Binyamin A. Knisbacher, Ziao Lin, Cynthia K. Hahn, Ferran Nadeu, Martí Duran-Ferrer, Gad Getz, Chip Stewart, Catherine J. Wu et al.

Nature Genetics (2022)

Abstract: Recent advances in cancer characterization have consistently revealed marked heterogeneity, impeding the completion of integrated molecular and clinical maps for each malignancy. Here, we focus on chronic lymphocytic leukemia (CLL), a B cell neoplasm with variable natural history that is conventionally categorized into two subtypes distinguished by extent of somatic mutations in the heavy-chain variable region of immunoglobulin genes (IGHV). To build the ‘CLL map,’ we integrated genomic, transcriptomic and epigenomic data from 1,148 patients. We identified 202 candidate genetic drivers of CLL (109 new) and refined the characterization of IGHV subtypes, which revealed distinct genomic landscapes and leukemogenic trajectories. Discovery of new gene expression subtypes further subcategorized this neoplasm and proved to be independent prognostic factors. Clinical outcomes were associated with a combination of genetic, epigenetic and gene expression features, further advancing our prognostic paradigm. Overall, this work reveals fresh insights into CLL oncogenesis and prognostication.


What part of the work was done in Terra?

Excerpts from the paper’s Methods section:

Sequence data processing and analysis

All sequencing data (WES, WGS, RNA-seq, RRBS, and targeted NOTCH1 sequencing) were processed and analyzed using methods implemented in the Terra platform. The main Terra methods are available at […]

RNA-seq analysis

RNA-seq data were processed in Terra using the GTEx V7 pipeline. Briefly, reads were aligned with STAR (v2.6.1d) to hg19 (b37) using the GENCODE v19 annotation, and quality control metrics and gene expression were computed with RNA-SeQC v2.3.6. A collapsed version of the GENCODE annotation was used to quantify gene-level expression (available at gs://gtex-resources/GENCODE/gencode.v19.genes.v7.collapsed_only.patched_contigs.gtf). TPMs were used for sample clustering, whereas gene counts were used for differential gene expression, as required […]
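To make the distinction between TPMs and raw gene counts concrete, here is a minimal Python sketch of the standard TPM calculation. It is an illustration only, not the code used by RNA-SeQC or by the authors; the function name `counts_to_tpm`, the gene names, and the toy numbers are all hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw gene counts to TPM, given gene lengths in base pairs.

    Illustrative helper (hypothetical name); not part of RNA-SeQC.
    """
    # Step 1: reads per kilobase, which corrects each count for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: rescale so the values in one sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Toy data, made up for illustration only.
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}
tpm = counts_to_tpm(counts, lengths)
```

Because TPM values are normalized for gene length and sum to the same total in every sample, they are convenient for comparing expression profiles across samples, as in clustering. Differential expression tools, by contrast, generally model the raw counts directly, which is consistent with the excerpt's note that counts were used "as required."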

DNA methylation data processing

DNA methylome data was analyzed for a total of 1,037 samples, including 490 samples profiled with Illumina 450K array previously analyzed (ref. 52) (European Genome-phenome Archive (EGA) accession EGAD00010001975), and 547 samples profiled using RRBS with either single-end or paired-end approaches. A pipeline was developed in Terra to obtain the CpG methylation estimates from RRBS data (Supplementary Note). The epitype classifier and the epiCMIT mitotic clock were previously developed for Illumina 450K and EPIC array data […]
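For readers new to bisulfite sequencing, a per-CpG methylation estimate is commonly computed as the fraction of reads that support methylation at that site. The Python sketch below shows that generic calculation. It is not the authors' pipeline, which is described in the paper's Supplementary Note; the function name `cpg_methylation` and the `min_coverage` threshold are assumptions made for illustration.

```python
def cpg_methylation(methylated, unmethylated, min_coverage=5):
    """Estimate the methylation level at one CpG site from read counts.

    Returns a value between 0 and 1, or None when coverage is too low.
    The min_coverage default is an illustrative assumption, not a value
    taken from the paper.
    """
    total = methylated + unmethylated
    if total < min_coverage:
        # Too few reads to make a reliable estimate at this site.
        return None
    return methylated / total


# Toy example: 8 reads support methylation and 2 do not.
level = cpg_methylation(8, 2)
```

Sites with very few reads give noisy estimates, which is why the sketch returns None below a coverage threshold rather than reporting a fraction.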


How did they do it?

The authors processed and analyzed all sequencing data (WES, WGS, RNA-seq, RRBS, and targeted NOTCH1 sequencing) using WDL workflows. The methods used in this study can be found in this Terra workspace.

If you are a new Terra user, try your hand at running a workflow in Terra with this Quickstart Tutorial Workspace.


Appendix: Data and code availability