Global initiative aims to identify disease-linked genome variations
31 Oct 2022
A newly formed international scientific collaboration that aims to identify variations in the human genome linked with disease is calling for dataset contributions.
The goal of the Consortium for Long Read Sequencing (CoLoRS) is to create a publicly available database of long-read human genome sequences.
Work on entries for the database is expected to start this year, and the consortium has invited investigators with “raw or summary level” human genome datasets to contribute.
CoLoRS describes itself as an open coalition of international researchers focused on creating a comprehensive database of frequency information for all classes of human variation identified using long-read human whole-genome sequencing.
Long-read sequencing accesses regions of the genome that are inaccessible to other technologies and can detect up to 15,000 more structural variants and 300,000 more small variants than those technologies.
CoLoRS plans to complement existing databases, help improve the discovery of pathogenic variation, and advance the understanding of rare disease, for which more than half of cases remain unexplained after short-read genome sequencing.
Edd Lee, Director of Human Genomics Segment Marketing at sequencing solutions developer PacBio, which has played a lead role in driving the consortium, described the initiative as a “much-needed resource for the genomics research community”.
“Population frequency is a key tool for interpreting genetic variation. CoLoRS will extend this tool to the variation uniquely detected by HiFi sequencing, particularly structural variants, tandem repeats, and small variants in regions of the genome that are difficult to sequence using other technologies,” added Lee.
CoLoRS’ founding members, which represent research hospitals, universities, and laboratories around the world, will provide datasets for the initial set of genomes.
Data will be accessible via the National Human Genome Research Institute’s (NHGRI) Analysis, Visualization and Informatics Lab-space (AnVIL) – a cloud-based genomic data sharing and analysis platform.
Supporting funds have been provided by the US National Institutes of Health Office of Data Science Strategy and NHGRI.
Michael Schatz, Bloomberg Distinguished Professor at Johns Hopkins University, USA, said the new database would mean “we will finally be able to consider all types of variation across the entire human genome.”