How to avoid information overload
12 Feb 2008 by Evoluted New Media
The amount of data that biological researchers have access to, and indeed need access to in order to answer the questions they are asking, is ever increasing. One way to prevent information overload is to turn to network informatics, says Annette Adler
As advances in bio-analytical instruments provide scientists with increasing amounts of information, the network analysis functionality available through the open source software Cytoscape is helping biologists analyse large sets of heterogeneous data to achieve insights into the biological mechanisms and events under investigation.
Scientists are asking more complex biological questions than ever before. In previous decades it was possible to spend an entire career studying a single molecule. Now, however, advances in biological measurement platforms mean that a single experiment can generate thousands of data points. Specialised RNAi microarrays and new proteomic approaches that identify and quantify proteins generate vast amounts of data for scientists to analyse and screen.
Instead of a reductionist approach, scientists are required to take a more holistic, systems-based approach as they seek, for example, to understand the factors involved in a disease in order to develop appropriate diagnostics and therapeutics. This is systems medicine based on systems biology. It requires new types of network analysis tools that provide a meaningful biological context for vast amounts of data by assembling biological data into genetic and functional networks of cause-and-effect interactions, such as those between proteins and genes.
Figure 1: Cytoscape screen shot
Cytoscape enables researchers to create visual maps of complex biological networks, increasing their understanding of molecular pathways and the biological causes of disease. Cytoscape allows users to integrate, visualise and query biological networks to derive computational models, and to view, manipulate and analyse their data to reach biological insight. Cytoscape is becoming the standard, state-of-the-art tool for representing biological networks.
Cytoscape represents genes or proteins as nodes in a network. The connections between nodes represent interactions between them, such as signaling or regulatory interactions. Networks built in this way can model signaling pathways, protein complexes, cell structural components, regulatory circuitry and other cellular machinery.
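To make the node-and-connection idea concrete, here is a minimal sketch in plain Python. It does not use Cytoscape or its file formats. The gene names, the interaction labels and the helper `build_network` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical interactions as (source, interaction type, target).
# Gene names and interaction labels are illustrative only.
interactions = [
    ("GeneA", "regulates", "GeneB"),
    ("GeneA", "binds", "GeneC"),
    ("GeneB", "phosphorylates", "GeneC"),
]


def build_network(edges):
    """Return an undirected adjacency map: node -> {neighbour: interaction type}."""
    network = defaultdict(dict)
    for source, kind, target in edges:
        # Store the connection in both directions so either node can be queried.
        network[source][target] = kind
        network[target][source] = kind
    return dict(network)


network = build_network(interactions)
print(sorted(network["GeneC"]))  # neighbours of GeneC
```

Each gene or protein is a key in the map, and each connection carries a label describing the type of interaction, which is the same information a network diagram shows as nodes and edges.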
The software allows scientists to integrate data of different types into a single network and then filter and interpret it for different conditions such as disease or time courses. Scientists can study thousands of molecular interactions from one experiment and combine this data with the results of many other experiments to get a more holistic picture of what’s going on. Because Cytoscape lets them query their data, scientists can focus on the whole picture or on a particular question of interest.
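The integrate-then-filter workflow described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Cytoscape's own implementation. The edge lists, the expression values, the threshold and the helpers `merge_networks` and `filter_edges` are all assumptions made for the example.

```python
# Hypothetical edge lists from two experiments; all names are illustrative.
experiment_1 = {("GeneA", "GeneB"), ("GeneB", "GeneC")}
experiment_2 = {("GeneC", "GeneB"), ("GeneD", "GeneE")}

# Hypothetical per-gene annotation, e.g. an expression change in a disease sample.
expression = {"GeneA": 2.1, "GeneB": -0.2, "GeneC": -1.8, "GeneD": 0.1, "GeneE": 0.3}


def merge_networks(*edge_sets):
    """Combine edge sets into one network, treating each edge as undirected."""
    merged = set()
    for edges in edge_sets:
        for a, b in edges:
            # Sort the endpoints so (B, C) and (C, B) count as the same edge.
            merged.add(tuple(sorted((a, b))))
    return merged


def filter_edges(edges, values, threshold):
    """Keep edges where at least one endpoint changes by `threshold` or more."""
    return {
        (a, b)
        for a, b in edges
        if abs(values.get(a, 0.0)) >= threshold or abs(values.get(b, 0.0)) >= threshold
    }


combined = merge_networks(experiment_1, experiment_2)
focused = filter_edges(combined, expression, threshold=1.0)
print(sorted(focused))
```

Merging gives the whole picture, while the filter narrows the view to the part of the network touched by a particular condition.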
Cytoscape was one of the first tools for visualisation of protein networks. Started in 2001 at the Institute for Systems Biology in Seattle, Washington, it has grown to become the standard tool in academia and industry for biological network analysis. Its open-source model encourages many third-party developers and industrial partners worldwide to participate and innovate.
In 2005 the nonprofit Cytoscape Consortium was formed to oversee the collaborative development and evolution of Cytoscape. The consortium’s seven members are the University of California, San Diego, the Institute for Systems Biology, Memorial Sloan-Kettering Cancer Center, Institut Pasteur, the University of California, San Francisco, Unilever, and Agilent Technologies.
In addition, Cytoscape’s scientific advisory board includes prominent scientists who work with large-scale interaction data - including Marc Vidal of Dana-Farber, Joel Bader of Johns Hopkins University, Manuel Peitsch of Novartis, Ilya Shmulevich of the Institute for Systems Biology, David States of the University of Michigan and Nevan Krogan of the University of California, San Francisco - as well as leaders of other major bioinformatics databases and resources, such as Ewan Birney of the European Bioinformatics Institute. Cytoscape’s active user and developer community participates through its website (www.cytoscape.org), online help system and associated Google discussion groups. Cytoscape has reached a wide audience, with approximately 2,000 downloads per month, more than 40,000 downloads since its inception and more than 35 plugins.
Cytoscape is well represented in the scientific literature. A recent publication in Nature Protocols [1] provides a full tutorial, covering a specific workflow for expression analysis and visualisation. This tutorial is an easy way for new users to get started with Cytoscape (see figure 1).
In November, the Cytoscape Consortium hosted its 5th Annual Symposium in Amsterdam, the first in Europe. Speakers highlighted the strengths of network informatics and the ways it is helping scientists address the complexities of biology. Topics included neural networks in systems biology, use of networks to find biomarkers in cancer, personalised medicine, protein-centered networks in systems biology and information visualisation to represent biological networks more effectively.
Figure 2: Network of associations extracted from the scientific literature on cardiovascular disease
Scientists at Agilent Laboratories are enthusiastic users and supporters of Cytoscape. Agilent Laboratories is a world-leading industrial research center whose purpose is to power the growth of Agilent Technologies through breakthrough technologies.
Before we knew about Cytoscape, we faced the problem of integrating diverse biological data to answer biological questions. Although we built our own research tool in 2002-2003, we were limited to the solutions we could build ourselves, and we recognised that a proprietary tool could be a barrier to collaboration with external partners. In addition, Agilent as a company has always supported emerging industry standards. We needed to partner with other groups to obtain the technology we required.
When we found Cytoscape, we realised it provided most of the capabilities we needed and that we could contribute our research developments to Cytoscape, supporting this emerging standard in the process. In Cytoscape 2.0, we saw a wealth of functionality, both in the core and in plugins. Joining forces with Cytoscape enabled us to leverage our work and take on more ambitious collaborations with external partners.
We attended the second annual retreat in 2004 and joined the Consortium when it formed in 2005. We have used Cytoscape in our own research with external collaborators ever since.
Cytoscape is a significant resource for Agilent, a measurement company. Cytoscape increases the effectiveness of bio-analytical measurement solutions by giving customers a way to gain more understanding of their data. It enables them to gain biological insight faster and speed up their research.
One example of how Agilent gains biological insight from Cytoscape is a study of cardiovascular disease with Stanford University Medical School [2], which used the Agilent Literature Search plugin. We analysed gene expression with microarrays to identify genes that were differentially regulated under different conditions, for example diabetic vs. non-diabetic patients. We then used the plugin to search the scientific literature for associations between these differentiating genes and other genes/proteins. Figure 2 shows a network of associations extracted from the scientific literature on cardiovascular disease, annotated with gene expression data via node coloring and a ‘heatstrip’ visualisation we developed as a Cytoscape plugin. Nodes are colored on a gradient from bright red (up-regulated) through yellow (neutral) to bright green (down-regulated). Beneath each node is a ‘heatstrip’ showing the variation in expression of that gene over a set of experiments.
The blue and brown bars represent gene expression under two experimental conditions. The height and direction of the bars represent the magnitude of the measurement and whether it is positive or negative. This format allows us to simultaneously visualise multiple interactions, significance levels of gene expression changes, and raw data. Key genes can be identified and their connections explored. We have labeled such genes ‘nexus’ genes to indicate their fundamental role within a generated network. ‘Nexus’ genes may be potential targets for therapeutic treatment. They may themselves be only mildly regulated, but they are connected to many strongly regulated genes. Thus, these potential targets would not have been identified if the scientist had relied upon gene expression analysis alone.
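The ‘nexus’ idea, a mildly regulated gene connected to many strongly regulated ones, can be expressed as a simple search over a network. The sketch below is one possible reading of that description, not the method used in the study. The function `find_nexus_genes`, its thresholds and the example data are all hypothetical.

```python
def find_nexus_genes(network, expression, strong=1.5, mild=0.5, min_neighbours=3):
    """Return genes that are mildly regulated but linked to many strongly regulated genes.

    network:    dict mapping each gene to a set of neighbouring genes
    expression: dict mapping each gene to an expression change (sign = direction)
    The three thresholds are illustrative assumptions, not published values.
    """
    nexus = []
    for gene, neighbours in network.items():
        # Skip genes whose own expression change is more than mild.
        if abs(expression.get(gene, 0.0)) > mild:
            continue
        # Count neighbours that are strongly up- or down-regulated.
        strong_count = sum(
            1 for n in neighbours if abs(expression.get(n, 0.0)) >= strong
        )
        if strong_count >= min_neighbours:
            nexus.append(gene)
    return sorted(nexus)


# Hypothetical network and expression values, for illustration only.
network = {
    "Hub": {"G1", "G2", "G3"},
    "G1": {"Hub"},
    "G2": {"Hub", "G3"},
    "G3": {"Hub", "G2"},
}
expression = {"Hub": 0.2, "G1": 2.4, "G2": -1.9, "G3": 1.7}

print(find_nexus_genes(network, expression))
```

A ranking by expression change alone would put the gene called Hub last in this example, which is why the network context matters.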
Use of network informatics tools such as Cytoscape will continue to become increasingly widespread. As biologists around the world gather more and more experimental data and collaborate with other scientists, they will require informatics tools to speed analysis, insight and new understanding.
References
1. Cline, M. et al. Integration of Biological Networks and Gene Expression Data using Cytoscape. Nature Protocols 2, 2366-2382 (2007)
2. King, J.Y. et al. Pathway analysis of coronary atherosclerosis. Physiol Genomics 23, 103-118 (2005)
By Annette Adler. She is the systems biology program manager at Agilent Laboratories, the central research organisation of Agilent Technologies.