Unique array data available to all
22 Mar 2005 by Evoluted New Media
As the second phase of the DTI’s Measurements for Biotechnology (MfB) programme gets underway, a large DNA microarray dataset generated by an LGC-led consortium during the first phase is now freely available online at ArrayExpress.
The arrayer used within the BioMolecular Innovation Laboratory at LGC
DNA microarray technology first emerged in the mid-1990s and has since developed into a major tool for investigating global gene expression across all aspects of human disease and biomedical research, underpinning a multi-billion dollar industry. Specific current and future applications include identifying and validating novel targets early in the drug development process, pharmacogenomics in pre-clinical and clinical trials, identifying predictive signatures of drug toxicity, identifying biomarkers for complex diseases, disease diagnostics, stratification, prognostics, personalised medicine and monitoring of therapy responsiveness.
Array close-up
Standardisation and comparability
DNA microarray platforms and applications are developing rapidly, but the scientific and regulatory communities widely recognise that generating reproducible data, with a high level of consistency across experiments and platforms, is a major problem. The field lacks standardisation, and there is little information on how data produced to different standards, or on different platforms, compare. This is hindering the transition of microarrays from a purely research tool to a valid measurement tool that can be used in diagnostics and in support of the regulatory data submissions needed for the approval of new drugs.
The National Measurement System (NMS) has identified the comparability of microarray measurements as a major issue affecting full-scale exploitation of data obtained from gene expression experiments. To gain regulatory acceptance, the tools used to measure gene expression must be assessed for scientific value, robustness and consistency of results across competing platforms and technologies. In particular, the U.S. Food and Drug Administration (FDA) has highlighted the need for consistent, scientifically based analytical guidelines on using these new tools appropriately, so that genomic expression data can be assessed irrespective of the platform. A major obstacle to achieving these goals is the current lack of publicly accessible array datasets, which are needed for wide-scale validation of the data produced in such experiments and of their biological significance.
LGC-led consortium
LGC is currently working to improve the comparability and quality of array-based measurements through a number of MfB-funded initiatives. Under the first programme, LGC led a consortium* of collaborators from UK industry, academia and National Measurement Institutes to investigate the comparability of gene expression measurements on different microarray platforms. Five commercially available microarray platforms that probe the human genome were used to compare the differential expression status of a brain sample against a universal reference sample (Figure 1).
Figure 1. Comparison of differential expression status of brain sample against a universal reference sample
Twenty-four replicate arrays were run for each manufacturer's platform, and each array probed between 4,000 and 30,000 genes.
The main objectives of this project were to determine the accuracy and consistency of gene expression measurements made on different microarray systems, and to provide an evaluation of some of the popular normalisation strategies currently in use for array-based experiments. The effect of using different software packages on image processing and downstream analysis was also assessed. The large, controlled dataset produced in this project enabled a detailed statistical assessment of data comparability both within and between array platforms to be undertaken.
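Within-platform repeatability, one part of the statistical assessment described above, can be summarised by pooling the variance of replicate log2 ratios gene by gene. The sketch below is illustrative only: it assumes a hypothetical layout in which each gene ID maps to its log2 ratios from replicate arrays on one platform, and it computes a standard pooled standard deviation rather than reproducing the consortium's actual analysis.

```python
import math
import statistics


def pooled_repeatability_sd(replicates):
    """Pooled within-platform repeatability standard deviation.

    replicates: dict mapping a gene ID to the list of log2 ratios measured
    for that gene on replicate arrays from one manufacturer (hypothetical
    layout). Each gene's sample variance is weighted by its degrees of
    freedom (n - 1), so genes with more replicates count for more.
    """
    weighted_sum = 0.0
    dof = 0
    for values in replicates.values():
        n = len(values)
        if n < 2:
            continue  # a single replicate carries no information on spread
        weighted_sum += (n - 1) * statistics.variance(values)
        dof += n - 1
    if dof == 0:
        raise ValueError("need at least one gene with two or more replicates")
    return math.sqrt(weighted_sum / dof)


# Made-up values for three replicate arrays, purely for illustration.
example = {"gene_a": [1.0, 1.2, 0.8], "gene_b": [-0.5, -0.3, -0.7]}
print(pooled_repeatability_sd(example))  # approximately 0.2
```

A larger pooled standard deviation means poorer precision between replicate arrays; computing it separately for each platform gives a simple within-platform comparison.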
Data concordance between platforms was poor: of the several hundred genes common to all five platforms, only a handful were consistently identified on every platform as up-, down- or neutrally expressed in the brain sample compared with the reference sample. Extensive statistical analysis showed that, within a platform, the normalisation strategy (performed to account for numerous sources of systematic variation, such as differences in labelling efficiency between the two fluorescent dyes) had one of the greatest impacts on data comparability.
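The concordance test described above can be sketched as follows: each gene's log2 ratio is called up, down or neutral, and a gene counts as concordant only if it is present on every platform and receives the same call on each. The two-fold cut-off (|log2 ratio| of at least 1) and the input layout are hypothetical choices for illustration, not the criteria used in the study.

```python
def call_expression(log2_ratio, threshold=1.0):
    """Classify one log2 ratio (sample vs reference); threshold is hypothetical."""
    if log2_ratio >= threshold:
        return "up"
    if log2_ratio <= -threshold:
        return "down"
    return "neutral"


def concordant_genes(platforms, threshold=1.0):
    """Return {gene: call} for genes given the same call on every platform.

    platforms: dict mapping a platform name to a dict of gene -> log2 ratio.
    Only genes measured on all platforms are considered.
    """
    if not platforms:
        raise ValueError("need at least one platform")
    common = set.intersection(*(set(genes) for genes in platforms.values()))
    result = {}
    for gene in sorted(common):
        calls = {call_expression(p[gene], threshold) for p in platforms.values()}
        if len(calls) == 1:  # every platform agrees
            result[gene] = calls.pop()
    return result


# Made-up ratios for three platforms.
platforms = {
    "platform_1": {"gene_a": 3.1, "gene_b": -2.4, "gene_c": 0.1, "gene_d": 1.5},
    "platform_2": {"gene_a": 2.7, "gene_b": -0.6, "gene_c": -0.2},
    "platform_3": {"gene_a": 1.9, "gene_b": -1.8, "gene_c": 0.3},
}
print(concordant_genes(platforms))  # {'gene_a': 'up', 'gene_c': 'neutral'}
```

Here gene_b is dropped because one platform calls it neutral, and gene_d is excluded because it is not measured on every platform.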
An example can be seen in Figure 2, which shows the distribution of log2 ratios generated from the same array but normalised by two different methods. Repeatability estimates obtained from replicate arrays from the same manufacturer were large, indicating relatively poor precision. The choice of image-processing software appeared to have the least impact on the data.
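To illustrate why two normalisation methods can give different log2 ratio distributions from the same array, the sketch below compares global median centring with a simple intensity-dependent normalisation that subtracts the median log ratio within bins of spot intensity. The binned method is a crude stand-in for loess-style normalisation, and the intensities and function names are hypothetical; neither method is claimed to be one of those used in the study.

```python
import math
import statistics


def ma_values(red, green):
    """Convert paired two-channel intensities to M = log2(R/G) and A = mean log2 intensity."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_median_normalise(m):
    """Shift all log ratios so that their overall median is zero."""
    centre = statistics.median(m)
    return [x - centre for x in m]


def binned_intensity_normalise(m, a, n_bins=4):
    """Subtract the median M within each intensity (A) bin of roughly equal size."""
    if len(m) != len(a):
        raise ValueError("m and a must have the same length")
    if not m:
        return []
    order = sorted(range(len(a)), key=lambda i: a[i])  # spots ordered by intensity
    bin_size = math.ceil(len(order) / n_bins)
    out = [0.0] * len(m)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        centre = statistics.median(m[i] for i in idx)
        for i in idx:
            out[i] = m[i] - centre
    return out


# Made-up array with a dye bias only at low intensity (red twice green).
green = [100.0, 200.0, 1000.0, 2000.0, 10000.0, 20000.0, 40000.0, 80000.0]
red = [200.0, 400.0, 2000.0, 4000.0, 10000.0, 20000.0, 40000.0, 80000.0]
m, a = ma_values(red, green)
print(global_median_normalise(m))            # [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
print(binned_intensity_normalise(m, a, 2))   # all 0.0
```

Because the made-up dye bias depends on intensity, global centring leaves low-intensity spots looking up-regulated and high-intensity spots looking down-regulated, whereas the binned method removes the bias. On real data the two methods would shift individual genes' calls in less tidy ways.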
Public repository
A major goal of the work was to make the vast quantities of unique array data produced in the project available to the scientific community. Towards this goal, nearly 700 text files, containing well over 8 million rows of microarray data from four of the five platforms, recently went live via ArrayExpress, one of the leading public repositories in this field, managed by the European Bioinformatics Institute (EBI). The fifth dataset will be added this month. The data can be freely accessed by the wider scientific community for further investigation. It is hoped that making these valuable datasets easily accessible will aid microarray validation and help determine the reproducibility of gene expression profiles between replicate arrays and across arrays produced by different manufacturers.
Toxicogenomics
LGC is now working with the EBI under the second MfB programme to gain a greater understanding of how experimental variables affect the conclusions drawn from toxicogenomic data, and to develop a framework for standardisation that maximises the potential of the technology. In toxicology, genomic approaches (toxicogenomics) using technologies such as microarrays promise a substantial impact across the entire drug discovery and development pipeline. Toxicity remains the major cause of failure in clinical trials, and there is therefore strong industry interest in improved models for predictive toxicology. Toxicogenomics may help to explain the complex pathways of toxicity, may identify biomarkers capable of predicting toxicity early in drug development, and may provide biomarkers of toxicity, efficacy and exposure in preclinical and clinical trials.
Building on initiatives started in the previous MfB programme to improve confidence in microarray-based measurements, work is underway to develop a panel of quality metrics that provide objective performance measurements for validating and standardising toxicogenomic array-based experiments. Array data from a model toxicology system using a chemical with a known mode of action will be used to develop and validate the quality metrics, and this second dataset will also be made publicly available via ArrayExpress. It is anticipated that the second dataset may also be developed into a training tool that allows users to validate data analysis approaches for identifying consistent and reproducible gene expression changes.
Quality and quantity
Other initiatives under the MfB programme to improve confidence in array-based measurements and quantification of gene expression include the development of panels of “spike-in” array performance indicators, which monitor the specificity and efficiency of hybridisation and act as an array QC tool. A further initiative is the development of reference methods and materials to enable greater standardisation of gene expression measurements, with the ultimate goal of a universally accepted unit for quantifying gene expression.
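One common way spike-in controls can serve as a QC check is to add them at known ratios between the two samples and compare the observed log2 ratios with the expected ones. The sketch below fits a least-squares line of observed against expected ratios; the spike-in IDs, ratios and function name are hypothetical, and this is not a description of the MfB performance indicators themselves. A slope near 1, an intercept near 0 and a small residual error suggest accurate ratio measurement, while a slope well below 1 would suggest compressed ratios.

```python
import math


def spike_in_ratio_check(expected, observed):
    """Least-squares fit of observed vs expected log2 ratios for spike-in controls.

    expected, observed: dicts mapping a spike-in ID to a log2 ratio
    (hypothetical layout). Only IDs present in both are used.
    Returns (slope, intercept, rmse).
    """
    ids = sorted(set(expected) & set(observed))
    if len(ids) < 2:
        raise ValueError("need at least two spike-ins present in both inputs")
    x = [expected[i] for i in ids]
    y = [observed[i] for i in ids]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("expected ratios must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rmse = math.sqrt(
        sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
    )
    return slope, intercept, rmse


# Made-up spike-ins whose observed ratios are compressed to half the expected value.
expected = {"spk1": -2.0, "spk2": -1.0, "spk3": 0.0, "spk4": 1.0, "spk5": 2.0}
observed = {"spk1": -0.9, "spk2": -0.4, "spk3": 0.1, "spk4": 0.6, "spk5": 1.1}
print(spike_in_ratio_check(expected, observed))  # approximately (0.5, 0.1, 0.0)
```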
Measurements for Biotechnology (MfB) programme
The Measurements for Biotechnology (MfB) programme is one of a portfolio of programmes supporting the development of the UK's National Measurement System (NMS), commissioned by the DTI (Department of Trade and Industry). Launched in 2001, the programme aims to enable better measurements for biotechnology by improving the accuracy, reliability and comparability of biomeasurements and by strengthening measurement science in areas and technologies of importance to the UK. Another major aim is to ensure that the UK biomeasurement system is co-ordinated and developed in harmony with those of other countries. Key outputs include measurement advice, good practice guides, standards and validation tools, and training materials. For more information on the MfB programme and the projects being undertaken, please visit the website: http://www.mfbprog.org.uk
Other useful links:
http://www.ebi.ac.uk/arrayexpress/
http://www.mged.org/
http://www.lgc.co.uk/
http://www.dti.gov.uk/nms
* Other consortium members: National Physical Laboratory (NPL), Oxford Biomedica Plc, Renovo, UK HGMP Microarray Resource Centre, Queen Mary's School of Medicine and Dentistry University of London (QMUL).
By Dan Hopkins and Carole Foy, BioMolecular Innovation, LGC, Teddington, UK
Corresponding author: Dr Carole Foy (Toxicogenomics Project Manager), LGC