Samuel Terkper Ahuno is a student at the Tri-Institutional PhD Program in Computational Biology and Medicine, NYC. In this guest blog post, he describes his published work on African cancer genomics, which evaluated the feasibility of using liquid biopsies to detect breast cancer in a Ghanaian clinic, and shares his vision of a cloud-powered future for computational research in Africa.
Cancer is becoming more common in sub-Saharan Africa, and mortality is rising. Current efforts to mitigate this focus on raising public awareness, diagnosing cancer earlier, increasing access to treatment and care, and researching the lifestyle, environmental and genetic risk factors that might be more prevalent for African women.
One major obstacle we face is that the current standard of testing for most cancers, including breast cancer, is the traditional biopsy: surgically extracting a small piece of the tumor using a needle. Because this invasive procedure is usually performed only once a tumor is already suspected, many cancers are diagnosed only after considerable growth has occurred. Technologies for earlier detection could therefore make a big difference to patient outcomes. Additionally, less invasive procedures would be better accepted by the population and could enable repeated sampling and improved treatment monitoring.
Using liquid biopsies to detect breast cancer in Ghanaian patients
Liquid biopsy techniques address these challenges by using readily available biological fluids such as urine, blood, or saliva for diagnosis. These fluids contain circulating or “cell-free” DNA (cfDNA), some of which may come from tumor cells, in which case it is called circulating tumor DNA (ctDNA). A liquid biopsy consists of sampling the relevant fluids and testing for the presence of ctDNA or other such markers of cancer.
Our research group recently tested whether liquid biopsies could be used to detect breast cancer in Ghana, as part of the Ghana Breast Health Study. From each patient who was recruited into the study at one of three hospitals, a small amount of blood was collected, and the DNA extracted from it was sequenced. This enabled us to estimate how much of the cfDNA in the blood had been shed by the patient’s tumor and what sort of DNA damage was associated with that tumor.
We found encouraging results, suggesting that liquid biopsies could be a viable way to detect cancer markers such as copy number alteration (CNA) status for many selected breast cancer genes in Ghanaian patients. A copy number alteration is a type of cancer-associated mutation in which one or more segments of the DNA are either lost or duplicated. Yet adopting this diagnostic approach would require developing genomic and bioinformatics capacity within the country, both to pursue this research further and ultimately to empower clinics to offer these tests in a sustainable and cost-effective way, while also strengthening basic health care services to make sure women can gain access to the treatment they need.
The computational requirements of cancer genomics
Going from raw cfDNA sequence data to biological insights about each patient’s tumor involves complex bioinformatics procedures that we can divide into two main phases of analysis, with very different computational requirements.
The first phase consists of pre-processing the data to ensure we have high-quality information in a suitable format for identifying tumor DNA. In practice, this involves mapping each individual sequence read to a standard genome reference and applying stringent quality control measures (see GATK Best Practices for more details). This is the most computationally intensive phase of the analysis pipeline; for whole genomes with billions of reads, you can imagine how complicated it can get.
The second phase consists of estimating what fraction of the circulating DNA is likely to have originated from a tumor, and identifying CNAs (see ichorCNA documentation for more details).
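To give a feel for why tumor fraction and CNAs are estimated together, here is a minimal Python sketch of the underlying mixture arithmetic. It is not how ichorCNA works internally (that tool fits a probabilistic model across the whole genome); it only assumes that normal cells carry two copies of a region, that the tumor is otherwise diploid, and that read depth is proportional to copy number. The function names and the `NORMAL_COPIES` constant are hypothetical, chosen for illustration.

```python
import math

# Assumption: normal cells carry two copies of the region (diploid autosome).
NORMAL_COPIES = 2


def expected_log2_ratio(tumor_fraction, tumor_copies):
    """Expected log2 read-depth ratio for a region, relative to a diploid baseline.

    The cfDNA sample is treated as a simple mixture: a share `tumor_fraction`
    of the DNA carries `tumor_copies` copies, and the rest carries two.
    """
    mixed = tumor_fraction * tumor_copies + (1 - tumor_fraction) * NORMAL_COPIES
    if mixed <= 0:
        raise ValueError("mixture has no copies of this region; log2 ratio is undefined")
    return math.log2(mixed / NORMAL_COPIES)


def tumor_fraction_from_log2_ratio(log2_ratio, tumor_copies):
    """Invert the mixture formula for a region whose tumor copy number is known."""
    if tumor_copies == NORMAL_COPIES:
        raise ValueError("a copy-neutral region carries no information about tumor fraction")
    ratio = 2 ** log2_ratio
    return NORMAL_COPIES * (ratio - 1) / (tumor_copies - NORMAL_COPIES)


if __name__ == "__main__":
    # Illustrative tumor fractions only; these are not values from the study.
    for tf in (0.05, 0.10, 0.30):
        gain = expected_log2_ratio(tf, tumor_copies=3)
        loss = expected_log2_ratio(tf, tumor_copies=1)
        print(f"tumor fraction {tf:.2f}: gain {gain:+.3f}, loss {loss:+.3f}")
```

Under these assumptions, a single-copy gain at a tumor fraction of 5% shifts the log2 ratio by only a few hundredths, which is one reason careful pre-processing and a genome-wide statistical model matter for cfDNA analysis.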
A hybrid approach to achieve scalability without changing everything
Given the computationally intensive nature of the pre-processing phase, we performed that part of the work using cloud-optimized workflows that we ran on the Terra platform. This allowed us to scale execution very easily and not have to worry about managing high-performance computing resources directly.
For the second phase of the work, which did not pose any scaling challenges, we chose to use our existing tools on the Mount Sinai Hospital servers. It was easy enough to download the pre-processed outputs from Terra onto our local filesystem.
This hybrid approach allowed us to take advantage of Terra’s scalable batch processing capabilities without having to change our familiar environment for the more exploratory part of the work. If we were to do this again with a larger dataset, downloading the pre-processed outputs would probably be less feasible, and it might be worth it for us to look into Terra’s interactive cloud environments for doing the rest of the work on the platform as well.
The bigger picture of cloud computing in Africa
The study I presented here was the result of an international collaboration between multiple research and clinical institutions in Ghana, as well as in Canada, the UK and the United States. Strengthening global partnerships plays an important role in economic development and is part of the United Nations Sustainable Development Goals. Yet for approaches like liquid biopsies to become the standard of care in Ghana and many other African countries, we must ultimately develop the bioinformatics capacity to perform the relevant research and testing autonomously in-country.
One of the major challenges for bioinformatics and computational biology in many African countries is limited infrastructure, such as computing resources, even though computing is becoming cheaper and more efficient.
Cloud-powered platforms like Terra could play a huge role in increasing access to computing resources to enable scalable genomics research in Africa, by Africans.
In addition to providing access to powerful hardware resources, such platforms also make it possible to leverage publicly available workflows and pre-installed software tools and environments. This helps newcomers overcome initial learning curves and empowers seasoned researchers to use best-in-class tooling without having to spend time installing anything. Once familiar with the infrastructure, they can also develop their own workflows and tools to innovate in the pursuit of their preferred research questions.
Organizations such as H3Africa have over the years been building bioinformatics capacity in affiliated institutions in the region. Building on that work, the DSI-Africa consortium recently launched the eLwazi platform, an African-led open data science project powered by Terra.
Moving forward, however, it would be valuable to have data centers within Africa to enable regional processing, storage and control of genomic data, for privacy and ethical reasons.
There are still many practical, ethical and technological challenges to implementing genomic technologies in Africa, yet it is encouraging to see such progress toward a future where African countries such as Ghana can access the resources they need to chart their own course.
I would like to thank Paz Polak, PhD; Jonine Figueroa, PhD; Geraldine Van der Auwera, PhD, and Kofi Johnson, PhD, for helpful comments.