New benchmark report provides accurate ID of variants in medically relevant genes
Researchers from the National Institute of Standards and Technology (NIST), Baylor College of Medicine and DNAnexus, Inc., together with other members of the Genome in a Bottle (GIAB) consortium, announce the publication of a comprehensive benchmark dataset comprising 273 challenging medically relevant autosomal genes that are associated with the development of diseases such as homocystinuria and spinal muscular atrophy. The dataset and a corresponding benchmark report were published in Nature Biotechnology.
Co-corresponding authors Dr. Fritz Sedlazeck of Baylor, Dr. Justin Zook of NIST and Dr. Jason Chin of DNAnexus led the team of researchers in focusing on a set of medically relevant genes that had been excluded from previous benchmarks due to their complexity. Using HiFi sequencing reads, the GIAB team identified thousands of single nucleotide variants, structural variants and large insertions and deletions in those genes relative to the two most commonly used human reference genomes. They also corrected errors in several medically relevant genes, improving variant recall in those genes to 100%.
“This benchmark will lay the groundwork for novel methods to improve our understanding of variability across these challenging but medically important genes,” said Sedlazeck, associate professor at the Human Genome Sequencing Center at Baylor. “Our findings can enable insights into new diseases’ gene candidates and treatments for a multitude of diseases.”
“This benchmark was made possible by a team of researchers who are committed to supporting research into disease-associated variation across the genome, including diseases like spinal muscular atrophy caused by variation that is challenging to detect,” said Zook, biomarker and genomic sciences group leader at NIST. “But there is still work to be done. Future studies will create benchmarks for complex regions of variation in the genome that are still challenging to characterize even with long reads.”
“To date, common bioinformatics tools have been unable to characterize many clinically relevant genes because they are difficult to assess with current short-read sequencing technologies due to their complexity and repetitive nature,” said John Ellithorpe, president at DNAnexus. “We are proud that DNAnexus researchers are able to contribute to important scientific advancements and benchmarking reports such as this one, and we hope that these efforts will contribute to the advancement of the diagnosis and understanding of various genetic diseases and their heritability.”
For a full list of authors and funding, see the publication.