How to explore the human genome
10 Jan 2012 by Evoluted New Media
It’s been over 10 years since the first draft of the human genome was published, and since then the race has been on to identify disease-causing genetic variations. As we move ever closer to the goal of affordable personal genome sequencing, a new whitepaper by Oxford Gene Technology (OGT) examines how balancing microarray-based approaches with genome sequencing might offer the best way to explore the human genome.
The human genome comprises more than 3 billion base pairs, yet the genomes of two unrelated people are over 99% identical. The remaining 1% contains a mixture of sequence variants that range in size from a single base (single nucleotide polymorphisms or SNPs) to small indels (insertions or deletions less than 1kb in size) and larger copy number variations (CNVs)1. Many sequence variants have no associated disease phenotype, whilst others, including both inherited and de novo changes, can predispose people to diseases such as autoimmune disease2, asthma3, schizophrenia4 and obesity5, as well as a variety of cancers6-8.
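The size classes in the paragraph above can be expressed as a simple rule. The sketch below is illustrative only: it assumes the 1 kb boundary between indels and CNVs given in the text, and the function `classify_variant` and its parameters are hypothetical rather than part of any genomics library.

```python
# Boundary between small indels and CNVs, as given in the text (1 kb).
# This is a working convention, not a sharp biological cut-off.
INDEL_MAX_BP = 1_000


def classify_variant(size_bp: int, is_single_base_substitution: bool = False) -> str:
    """Assign a variant to one of the size classes described in the text.

    size_bp: length of the affected sequence in base pairs.
    is_single_base_substitution: True for a one-base change that neither
    inserts nor deletes sequence (a SNP); insertions, deletions and
    duplications should leave this False.
    """
    if size_bp < 1:
        raise ValueError("size_bp must be at least 1")
    if is_single_base_substitution:
        if size_bp != 1:
            raise ValueError("a single-base substitution has size_bp == 1")
        return "SNP"
    # Insertions, deletions and duplications are classified by length.
    if size_bp < INDEL_MAX_BP:
        return "indel"
    return "CNV"


if __name__ == "__main__":
    for size, is_snp in [(1, True), (1, False), (250, False), (50_000, False)]:
        print(size, is_snp, classify_variant(size, is_snp))
```

Note that a one-base insertion or deletion is still an indel; only a substitution is treated as a SNP.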
The development of genome analysis technologies such as DNA microarrays and next generation sequencing (NGS) has given researchers the ability to screen for clinically relevant sequence variants. Although DNA microarrays and NGS might be viewed as competing platforms, many research questions may be easier to answer if the two are used to complement one another.
Array comparative genomic hybridisation (aCGH) platforms, which allow the detection of known and de novo CNVs present in a cell or tissue, play an important role in genome analysis and have had a major impact on the diagnosis of genetic disorders, accelerating CNV discovery for many diseases9. In addition, oligonucleotide aCGH is considered the gold standard for CNV detection in genome-wide association studies (GWAS)10. In a clinical setting, aCGH has been key to identifying novel disease loci11, and recent data suggest that it will play a pivotal role in prenatal diagnosis12.
NGS offers an alternative approach to genome analysis, providing single base resolution that has permitted the successful identification of causal mutations for a number of monogenic disorders13-15 as well as for cancer16,17.
Thus far, microarrays have proven to be the preferred solution for genomic analysis. Microarrays are an established technology and can routinely detect aneuploidy, unbalanced chromosomal rearrangements, subchromosomal deletions or duplications, loss of heterozygosity and SNPs (Table 1). Their power to detect such variants comes from the density, coverage and genomic distribution of the oligonucleotide probes on the array. Probe coverage is particularly important in a clinical setting, and can be optimised by using high-density genome-wide array designs, or designs that combine dense probes over specific focus regions with lower-density probes across the genomic backbone.
More recently, NGS has begun to have a widespread impact on genomic research, as the cost of this type of analysis has dropped significantly, putting it within reach of a broader range of researchers. Sequencing can detect the variants outlined above for microarray analysis, but can also be used to screen for copy-neutral variants (e.g. balanced chromosomal inversions or translocations), indels and single base variants (e.g. point mutations) (Table 1). NGS also allows users to scan for disease-causing variants without a priori sequence information.
Table 1: DNA sequence variants detected by microarrays and NGS
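Because Table 1 itself is not reproduced here, the sketch below encodes the detection capabilities as described in the text above, not the original table, so its contents are an assumption drawn from the prose. The names `DETECTABLE_BY` and `needs_sequencing` are hypothetical.

```python
MICROARRAY = "microarray"
NGS = "NGS"

BOTH = frozenset({MICROARRAY, NGS})
NGS_ONLY = frozenset({NGS})

# Variant types and the platforms the text says can detect them.
DETECTABLE_BY = {
    "aneuploidy": BOTH,
    "unbalanced chromosomal rearrangement": BOTH,
    "subchromosomal deletion or duplication": BOTH,
    "loss of heterozygosity": BOTH,
    "SNP": BOTH,
    # Copy-neutral and small-scale variants: sequencing only, per the text.
    "balanced inversion or translocation": NGS_ONLY,
    "indel": NGS_ONLY,
    "point mutation": NGS_ONLY,
}


def needs_sequencing(variant_types) -> bool:
    """Return True if any requested variant type cannot be detected by microarray.

    Raises KeyError for a variant type not listed in DETECTABLE_BY.
    """
    return any(MICROARRAY not in DETECTABLE_BY[v] for v in variant_types)


if __name__ == "__main__":
    print(needs_sequencing(["aneuploidy", "loss of heterozygosity"]))  # False
    print(needs_sequencing(["aneuploidy", "balanced inversion or translocation"]))  # True
```

A lookup like this makes the asymmetry explicit: every variant class a microarray covers is also reachable by sequencing, but not the reverse.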
Microarrays enable the parallel analysis of large numbers of samples and so offer the potential to classify patient cohorts relatively quickly and cost-effectively. For example, OGT used its high-throughput technology to process 20,000 samples in 20 weeks as part of the Wellcome Trust Case Control Consortium (WTCCC) CNV study18. Because multiple samples can be processed in parallel on a single microarray slide, significant time and cost savings can be made. Having been in use since the early 1990s, microarray analysis is now well established, making it quick and easy to draw meaningful biological inferences from the data generated.
In contrast, whole genome sequencing can be more costly and have a longer turnaround time, and it generates large amounts of data that require significant expertise, processing power and time to analyse. To address these shortcomings, a more focussed approach to sequencing can be adopted:
- Whole exome sequencing focuses on just the 1.5% of the human genome corresponding to protein-coding regions, which contain approximately 85% of disease-causing mutations13.
- Custom sequencing targets specific regions of interest (ROI) ranging from 0.2 to 34 Mb. Focusing on one or more ROI increases the depth of coverage for those regions and the confidence with which causal mutations can be detected.
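The gain in depth from targeting can be estimated with the usual mean-coverage formula, C = N × L / G (number of reads × read length / target size). This sketch assumes a hypothetical run of 300 million 100 bp reads (30 Gb) and, unrealistically, that every read falls on target; real capture experiments lose some reads off target, which the optional `on_target_fraction` parameter allows for. The genome and exome sizes are the rounded figures from the text, and `mean_depth` is a hypothetical helper.

```python
GENOME_BP = 3_000_000_000          # "more than 3 billion base pairs", rounded
EXOME_BP = GENOME_BP * 15 // 1000  # about 1.5% of the genome, i.e. ~45 Mb


def mean_depth(read_count: int, read_length_bp: int, target_bp: int,
               on_target_fraction: float = 1.0) -> float:
    """Mean fold coverage C = N * L * f / G.

    N = number of reads, L = read length, f = fraction of sequenced bases
    that land in the target, G = target size in base pairs.
    """
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    if not 0.0 < on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return read_count * read_length_bp * on_target_fraction / target_bp


if __name__ == "__main__":
    reads, read_len = 300_000_000, 100  # hypothetical run: 30 Gb of sequence
    targets = [
        ("whole genome", GENOME_BP),
        ("whole exome", EXOME_BP),
        ("custom ROI, 34 Mb", 34_000_000),
        ("custom ROI, 0.2 Mb", 200_000),
    ]
    for name, size in targets:
        print(f"{name}: ~{mean_depth(reads, read_len, size):,.0f}x")
```

With a fixed sequencing yield, shrinking the target from the whole genome to the exome raises mean depth roughly 67-fold (3 Gb / 45 Mb). In practice much of that spare capacity would be used to multiplex more samples rather than to sequence one small region ever more deeply.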
At present no single platform, microarray or NGS, can identify all sequence variants within the genome. Although both platforms function well in isolation, their complementary strengths can be combined to identify and screen for known or de novo sequence variants. The order in which the platforms are used depends on the questions that need to be answered.
If the requirement is to screen a large number of samples to identify a particular subset or genomic region for more comprehensive analysis7, it will be more effective to screen with microarrays and follow up with sequencing. If the goal is discovery, sequencing could be used to identify sequence variants of biomedical relevance15, and this information could then be used to design new diagnostic arrays or to add content to existing ones. Choosing the right combination ensures that the most information is obtained, but requires a careful balance between cost and the information needed. In the future, making informed decisions at the planning stage of a genomic analysis will help to ensure that the data generated are accurate, relevant and biologically meaningful.
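The cost side of this trade-off can be made concrete with a simple two-stage model. In the sketch below, all prices are hypothetical, unitless placeholders rather than figures from the whitepaper: every sample is screened on a microarray, and only the flagged fraction f goes on to sequencing. Per sample, screening first is cheaper when c_array + f × c_seq < c_seq, that is, when c_array < (1 - f) × c_seq. The `Costs` class and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-sample costs in arbitrary units."""
    array_per_sample: float
    sequencing_per_sample: float


def sequence_all(n_samples: int, costs: Costs) -> float:
    """Cost of sequencing every sample directly."""
    return n_samples * costs.sequencing_per_sample


def screen_then_sequence(n_samples: int, flagged_fraction: float, costs: Costs) -> float:
    """Cost of screening every sample by microarray, then sequencing only the flagged ones."""
    if not 0.0 <= flagged_fraction <= 1.0:
        raise ValueError("flagged_fraction must be between 0 and 1")
    n_followed_up = round(n_samples * flagged_fraction)
    return n_samples * costs.array_per_sample + n_followed_up * costs.sequencing_per_sample


def screening_pays_off(flagged_fraction: float, costs: Costs) -> bool:
    """Per sample: c_array + f * c_seq < c_seq, i.e. c_array < (1 - f) * c_seq."""
    return costs.array_per_sample < (1.0 - flagged_fraction) * costs.sequencing_per_sample


if __name__ == "__main__":
    costs = Costs(array_per_sample=1.0, sequencing_per_sample=10.0)  # illustrative only
    n, f = 1_000, 0.05
    print(sequence_all(n, costs), screen_then_sequence(n, f, costs), screening_pays_off(f, costs))
```

For example, with an array cost of 1, a sequencing cost of 10 and 5% of 1,000 samples flagged, screening first costs 1,500 units against 10,000 for sequencing everything. The model deliberately ignores the information side of the balance: variants that only sequencing can detect (such as balanced rearrangements) will be missed in samples that the array screen does not flag.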
- This article is based on a whitepaper entitled ‘Sequencing and microarrays for genome analysis: complementary rather than competing?’ by OGT’s Simon Hughes, Sandra Lam and Nicole Sparkes. To read the full whitepaper, visit the OGT website at www.ogt.co.uk/resources.
1. Gökçümen, O. and Lee, C. (2009) Copy number variants (CNVs) in primate species using array-based comparative genomic hybridization. Methods 49, 18-25
2. Fanciulli, M. et al. (2007) FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity. Nature Genetics 39, 721-723
3. Brasch-Andersen, C. et al. (2004) Possible gene dosage effect of glutathione-S-transferases on atopic asthma: using real-time PCR for quantification of GSTM1 and GSTT1 gene copy numbers. Human Mutation 24, 208-214
4. Walsh, T. et al. (2008) Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320, 539-543
5. Walters, R.G. et al. (2010) A new highly penetrant form of obesity due to deletions on chromosome 16p11.2. Nature 463, 671-675
6. Hughes, S. et al. (2006) The use of whole genome amplification to study chromosomal changes in prostate cancer: insights into genome-wide signature of preneoplasia associated with cancer progression. BMC Genomics 7, 65
7. Ernst, T. et al. (2010) Transcription factor mutations in myelodysplastic/myeloproliferative neoplasms. Haematologica 95, 1473-1480
8. Dyrsø, T. et al. (2011) Identification of chromosome aberrations in sporadic microsatellite stable and unstable colorectal cancers using array comparative genomic hybridization. Cancer Genetics 204, 84-95
9. Shaffer, L.G. et al. (2007) The identification of microdeletion syndromes and other chromosome abnormalities: cytogenetic methods of the past, new technologies for the future. American Journal of Medical Genetics 145C, 335-345
10. Carter, N.P. (2007) Methods and strategies for analyzing copy number variation using DNA microarrays. Nature Genetics 39 (7 suppl), S16-S21
11. Slavotinek, A.M. (2008) Novel microdeletion syndromes detected by chromosome microarrays. Human Genetics 124, 1-17
12. Kleeman, L. et al. (2009) Use of array comparative genomic hybridization for prenatal diagnosis of fetuses with sonographic anomalies and normal metaphase karyotype. Prenatal Diagnosis 29, 1213-1217
13. Choi, M. et al. (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences of the United States of America 106, 19096-19101
14. Ng, S.B. et al. (2010) Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30-35
15. Ng, S.B. et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-276
16. Wei, X. et al. (2011) Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nature Genetics 43, 442-446
17. Yan, X.J. et al. (2011) Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nature Genetics 43, 309-315
18. Conrad, D.F. et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464, 704-712
19. McPherson, J.D. (2009) Next-generation gap. Nature Methods 6 (suppl), S2-S5
Author:
By Stephen Archibald, PhD, Marketing Communications Manager, Oxford Gene Technology