EXPERT REACTION: First draft of human 'pangenome' reference sequence captures more human diversity

Publicly released:
Australia; International; NSW; QLD; SA; ACT

International researchers have released the first draft of a reference 'pangenome' - a collection of DNA sequences from 47 people - that better reflects the diversity of the human population and can be used as a comparison to study genetic disorders and other human DNA sequences. To understand the differences that make us unique, scientists create a reference genome sequence that other DNA sequences can be compared to. But until now, that standard reference genome was limited in its ability to reflect human diversity, as it was based on the DNA of only 20 people and most of it came from only one person. The new 'pangenome' reference includes genome sequences of 47 people from diverse ancestries, with the researchers hoping to increase that number to 350 by mid-2024.

Media release

From: Springer Nature

Genetics: First draft of a human pangenome

The first draft of a human pangenome reference — a collection that aims to eventually represent as many as possible of the DNA sequences found across our species — is published in Nature this week. The research combines genetic material from a population of 47 genetically diverse individuals to provide a more complete image of the human genome.

The human reference genome has been the backbone of human genomics since its draft release in 2001. However, a single genome cannot represent the genetic diversity present within the human species, due to the presence of structural variants and alternative alleles, some of which were not present in the original reference genome.

In a collection of three papers, the Human Pangenome Reference Consortium presents the first draft human pangenome reference and findings from two studies that use this reference as a basis for new genetic research. The pangenome was developed from a cohort of 47 ancestrally diverse individuals and adds 119 million base pairs and 1,115 gene duplications (mutations in which a region of DNA containing a gene is duplicated) to the current reference human genome (GRCh38). Use of this draft increased the number of structural variants detected by 104% compared to GRCh38, providing a more complete picture of genetic diversity within the human genome.

Two companion papers present associated findings using the human pangenome draft. In the first companion paper, Evan Eichler and colleagues developed a map of single-nucleotide variations (SNVs) within segmental duplications (blocks of DNA that occur at more than one site in a genome and share a high level of sequence identity), characterising millions of previously unmapped SNVs and mutational properties that differ from unique DNA. Erik Garrison and colleagues observe patterns of recombination between the short arms of heterologous acrocentric (where the centromere is located near one end of the chromosome) chromosomes, providing observational evidence for a mechanism of DNA exchange between these chromosomes that had previously been speculated on but not observed, due to the lack of suitable data.

These results are only an interim stage of the envisioned human pangenome, which aims to capture genetic diversity of 350 individuals. Arya Massarat and Melissa Gymrek highlight the importance of these advancements in an accompanying News & Views Forum but note that continued improvements are needed to overcome some remaining challenges, such as the need for even more diverse sampling. “This will ultimately make it easier to discover genetic variants that mediate physical and clinical traits, and — it is to be hoped — will eventually lead to better health outcomes for many people”, they write.

A fourth paper will also be published in Nature Biotechnology.

Expert Reaction

These comments have been collated by the Science Media Centre to provide a variety of expert perspectives on this issue. Feel free to use these quotes in your stories. Views expressed are the personal opinions of the experts named. They do not represent the views of the SMC or any other organisation unless specifically stated.

Distinguished Professor Tuan Nguyen is Director of the Centre for Health Technologies at the University of Technology Sydney

It is important to recognize that a solitary reference sequence cannot fully capture the genomic diversity of global populations. It is evident that Asian populations remain underrepresented in the project. For instance, the inclusion of only one Vietnamese individual is inadequate when considering the genetic variation within Vietnam's more than 50 ethnic groups, comprising approximately 100 million people. However, I am hopeful that in the future, collaborations can be established between the project and my team in Vietnam, as well as other locations, to enhance the representativeness of the Pangenome Reference database.

Last updated:  21 Jul 2023 5:01pm
Declared conflicts of interest None declared.

Professor David Adelson is Chair of Bioinformatics and Computational Genetics in the School of Biological Sciences at The University of Adelaide

Why is a pangenome so exciting? It includes all the differences between the genomes of the individuals that have been sequenced. All you have to do is look around you to see how different people are. These differences reflect differences in our genomes. Up until now we have used a single genome sequence as a reference for the detection of genetic changes that cause disease. That reference did not include differences between people or populations.

With the pangenome we can now look for genetic changes across many individuals and ultimately the pangenome will grow to include information from thousands and perhaps millions of genome sequences. This means our ability to use genetic information for diagnosis will increase enormously. With the current pangenome from only 47 humans, the accuracy of detection to find genetic changes has gone up by 34% and the number of large, difficult-to-detect changes we now know about has gone up by over 100%! This paper heralds a new age of genetic diagnosis that will benefit people from all ancestries, unlike our current reference genome that does not reflect all the diversity of humanity.

Last updated:  21 Jul 2023 5:01pm
Declared conflicts of interest David states that he has no conflicts of interest to declare.

Associate Professor Michael Gabbett is a Chief Investigator in the Centre for Genomics and Personalised Health at Queensland University of Technology (QUT)

The publication of the pangenome is a huge step forward in documenting the diversity of the human genetic code in a single map. It aims to record normal variations in the sequence and structure of our genetic code, helping to ensure all races and ethnicities are represented. Such a tool is invaluable for a multicultural society such as Australia. Once finished, the pangenome will be a powerful instrument that will help us diagnose genetic diseases more accurately and discover abnormal genetic variations that can lead to ill-health.

The entire human genome was first published in the early 2000s. This was a significant event that provided genetic researchers and medical doctors a template of ‘normal’ genes to compare other individuals’ DNA to, enabling the discovery of genetic variants that could be causing disease or physical differences.

Of course, there is no such thing as ‘normal’ genes. Humans are wonderfully diverse creatures, with each individual having their own unique collection of genetic variants. A single reference sequence cannot and does not capture the extraordinary genetic diversity of people around the globe. This fact has placed barriers to our ability to fully interpret a person’s genetic sequence. How do we know if a genetic variation is normal or not? How do we know if a variation causes disease, or is commonly found in a particular ethnic group? The pangenome project is helping to answer these important questions to better our understanding of genes and improve the health of all people.

Last updated:  21 Jul 2023 5:02pm
Declared conflicts of interest Michael states that he has no conflicts of interest to declare.

Associate Professor Benjamin Schwessinger is from the Research School of Biology at The Australian National University

A recent publication in Nature describes the extension of the human genome beyond a single reference to capture more of the variation inherent in our human population. By adding 47 new genomes, they discover over 5% of additional sequence. Researchers also implemented a new combined representation of these human genomes, called a “pan-genome”.

You can imagine this as a new road map to drop off your kids at school. While you only take the same road every day, your neighbour might take a slightly different road on a side street. While you were never aware of this side road, this new approach of a “pan-genome” maps out these alternative routes that make us humans so distinct from each other.

Last updated:  10 May 2023 4:21pm
Declared conflicts of interest Benjamin states that he has no conflicts of interest to declare.

Mr Andreas Bachler is a PhD candidate at The Australian National University and CSIRO

The release of a new pan-genome reference marks a paradigm shift in genetic research. For more than two decades, much of human genetic work has relied on the use of a single ‘reference’ through which we then interpret genetic data we collect from individuals. Problems with the reference are well known: for example, the initial reference was hoped to be a composite of 20 diverse donors, but approximately 70% came from a single individual (please see the 2019 article “Is it time to change the reference genome?” for more great idiosyncrasies about the human reference!). The pan-genome paper utilises recent advances in sequencing technology to generate ‘references’ from individuals, allowing us to start to break free from the constraints of a single reference and interrogate data with less bias.

The current pan-genome utilises publicly available samples from broad populations and does not include First Australians. Sovereignty over samples and genetic data is a crucially important aspect for First Australians and current work at the National Centre for Indigenous Genomics (NCIG) is committed to providing and enabling genetic references and data that align with First Australian interests.

While a substantial boon for human genetics, the demonstration of this approach will also allow those of us working in non-human organisms to implement it in our own research and workflows. Particularly in genetics, not accounting for diversity produces biased and incorrect research outcomes.

Last updated:  10 May 2023 4:20pm
Declared conflicts of interest Andreas states that he has no conflicts of interest to declare.

Multimedia

How to Sequence a Human Genome in 7 'Easy' Steps!
The Human Pangenome

Attachments

Research Springer Nature, Web page Please link to the article in online versions of your report (the URL will go live after the embargo ends).
Journal/conference: Nature
Research: Paper
Organisation/s: University of California, USA
Funder: National Human Genome Research Institute (NHGRI), part of the National Institutes of Health. See papers for full list of funders and conflict of interest