Collaborate to accumulate
3 Jul 2017 by Evoluted New Media
Pharmaceutical research and development has historically been shrouded in mystery, a secretive activity conducted behind closed doors to protect commercial advantage. But as big data continues to transform the industry, must we remain so reluctant to share data? Katharine Briggs looks at the benefits, challenges and considerations surrounding the sharing of proprietary data.
We know that one of the challenges in medical research is the scarcity of real-world data available to academic researchers and other interested parties to develop new and improved drugs.
According to a study conducted by Forbes, the average pharmaceutical company spends $350 million to get a single drug to market. A large proportion of that cost is spent on the research and discovery of new compounds, and the lengthy biological and chemical testing of their properties in the laboratory – both in vitro and in vivo. Consequently, every pharmaceutical company is sitting on a goldmine of big data, the analysis of which could significantly reduce the product development lifecycle, and yet there remains a reluctance to collaborate.
Data sharing does happen in the pharmaceutical industry, but it is not yet standard practice and remains the preserve of special projects. One such example is the ChEMBL database. Hosted by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), ChEMBL is a vast online database containing bioactivity data on more than 1.6 million drugs and drug-like small molecules and their targets. Originally developed as a private resource by a biotechnology firm, it was acquired by EMBL in 2008 and has become a valued public resource for virtual screening, drug design and product development.
Share and share alike
ChEMBL is used by academics and by companies of all sizes, supporting new research and the discovery of new treatments and drugs that benefit human health and agriculture. In the Strategic Vision for UK e-infrastructure report, Professor Dominic Tildesley of Unilever identified the ChEMBL database as a crucial part of the company’s development of antiperspirants: Unilever used the database to identify active components for antiperspirants and used ChEMBL data to build a model of their inhibition activity. Similarly, chemists at the agrochemicals business Syngenta use ChEMBL in their product development. Mark Forster from Syngenta says of the database: “ChEMBL has links between both chemistry and biology data which makes it searchable in ways that the underlying literature would not be. People at the EMBL-EBI do a fantastic job in making a vast amount of data of different types openly available to researchers, and without the EMBL-EBI resources in general I’m sure life science research would be greatly hindered.”
Collaboration between pharmaceutical companies could see breakthroughs in drug discovery.
Ethical imperative
Increased collaboration and dissemination of data are not only in the interest of public health but are also increasingly required by funding organisations, and they are a vital part of achieving a reduction in animal testing. Beyond the ethical benefits, reducing animal testing also saves time and money, and the knowledge gained by sharing data could enable more informed decisions about which substances to test and which tests to perform. An initiative led by the NC3Rs and the MHRA, involving 32 organisations sharing data for 137 compounds and 259 studies, found that the use of recovery animals could be reduced by up to 66%, saving thousands of animals globally each year.
Regulators recognise that animal testing needs to be kept to a minimum whilst still protecting human health and the environment. A fundamental aspect of the European Union’s Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation is the requirement to share data from studies involving vertebrate animal testing through Substance Information Exchange Fora (SIEFs), to avoid unnecessary duplication of tests. In cosmetics, meanwhile, the Cosmetics Regulation prohibits animal testing of products marketed in the EU and of their ingredients, yet also requires data on toxicological properties to be gathered as part of the product information file. In this context, data collaboration is vital to prevent innovation from stagnating.
A case for data sharing can also be made on the basis of the ethos of science described by Robert Merton, which holds that scientific findings should be made available to the entire scientific community so that other researchers can conduct their own analyses and verify the results. Independent replication of research findings is seen as the fundamental mechanism by which scientific evidence accumulates to support a hypothesis. The field of genomics is regarded as a leader in developing the infrastructure, resources and policies that promote data sharing, and this is cited as one of the main reasons for the rapid advance of genetic research compared with other areas of biomedicine.
Don’t be left out
A key obstacle to data collaboration is the perceived need within industry to protect proprietary information. However, organisations need to be clear about how much of a competitive advantage they will lose by sharing data versus the knowledge they will gain. How unique is the knowledge they hold versus the knowledge their competitors could bring to the table? Consideration should also be given to the risk of not taking part in data sharing, as those organisations that participate will have a competitive and economic advantage over those who do not.
Frustratingly, big data in pharma is often ‘locked’ inside PDFs sitting in individual company archives, where it is unavailable even for internal analysis, so companies are often ‘protecting’ data they cannot actually use themselves. Providing access to a larger pool of data can reveal patterns that are simply not visible in the smaller individual datasets, where a given relationship may be represented by only one or two chemicals.
It is often the case that only regulatory bodies have ready access to pooled datasets from multiple companies, and therefore the opportunity to identify these broader patterns through cross-company analyses. This can cause problems when a pharmaceutical company submits a new drug application: the regulator’s broader knowledge can lead to challenges and assertions that must be addressed, resulting in delays and the need for the company to generate additional data. Research data can be valuable many years after it has been generated, and fresh eyes can reveal new insights beyond those originally identified. In addition, new research topics and fields are emerging at the boundaries between traditional disciplines. By sharing data, companies can benefit from external expertise in the same or different fields, opening the data up to be explored and used in ways that may not originally have been envisioned.
Academics, small biotechs, SMEs (small and medium-sized enterprises) and contractors can be included as collaborators, broadening the range of skills and experience still further and creating relationships that can be built on in the future. There is also an opportunity to improve data quality, as giving other experts access will help identify errors and inconsistencies, similar to the crowdsourcing model used by ChemSpider. Because the costs of generating the data are also shared, data sharing opens up the possibility of exploratory research that might otherwise not be commercially viable.
Big data
Maximising the accessibility of data will become increasingly important as in silico systems move towards predicting more complex phenomena, for which datasets of an appropriate size, quality and coverage are limited. In a 2010 survey by the Publishing Research Consortium, access to ‘datasets, data models, algorithms and programs’ was ranked as important or highly important by 62% of the 3823 respondents, whereas only 38% rated these as very or fairly easy to access. Reflecting this growing recognition of the importance of in silico systems, the eTOX consortium was a seven-year public-private partnership within the framework of the European Innovative Medicines Initiative. The project aimed to develop innovative in silico strategies and novel software tools to better predict the toxicological profiles of small molecules in the early stages of the drug development pipeline.
Revisiting old data could prove lucrative for new co-collaborators.
The backbone of the project was a database hosted and curated by Lhasa Limited, which acted as the honest broker for the project. The database consisted of pre-clinical toxicity data for drug compounds and candidates, extracted from previously unpublished legacy reports from 13 European pharmaceutical companies. It was enhanced by the incorporation of publicly available, high-quality toxicology data collected by the European Bioinformatics Institute, and it also incorporates the RepDose database donated by Fraunhofer. Early eTOX use cases included investigating the relevance of specific histopathology findings (confirmed to be target related and species specific), identifying potential target-related effects (leading to the inclusion of specific target organs in early in vivo studies), and implementing a framework of four key approaches (similarity of structure, pharmacology or adverse effects, and the use of in silico prediction) as part of an early small molecule drug development pipeline.
The eTOX project has now ended, but its legacy has led to the formation of eTOXsys, a software solution that can deliver improved early safety assessment of drug candidates through access to proprietary toxicology data and predictive models.
Pharma karma
So how can pharmaceutical businesses overcome the challenges and concerns relating to data collaboration in order to reap the rewards of projects such as eTOX? Regulations to protect the privacy of personal health information are often seen as potential barriers to data sharing due to the risk of accidental, malicious or compelled disclosure. However, data can still be shared as long as privacy safeguards are in place. Redacting data to strip out individual identifiers, statistically altering data in ways which do not compromise secondary analysis and placing restrictions on access to data are all simple steps that can be taken to secure it.
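The safeguards described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a compliant de-identification procedure: the field names, the list of direct identifiers and the secret key are all hypothetical. Direct identifiers are dropped, the subject ID is replaced with a keyed HMAC-SHA256 pseudonym so that records can still be linked, and exact ages are coarsened into ten-year bands as one simple example of altering values while keeping them useful for secondary analysis (noise addition and other statistical disclosure-control methods are alternatives not shown here).

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; in practice it would come from the
# privacy rules that apply to the dataset being shared.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}


def pseudonymise_id(subject_id: str, secret_key: bytes) -> str:
    """Replace a subject ID with a keyed hash (HMAC-SHA256, truncated).

    The same ID always maps to the same token, so records can still be linked
    across tables, but the original ID cannot be recovered without the key.
    """
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def redact_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with identifiers removed or transformed."""
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    shared["subject_id"] = pseudonymise_id(record["subject_id"], secret_key)
    if "age" in shared:
        shared["age"] = age_band(shared["age"])
    return shared


if __name__ == "__main__":
    key = b"held-only-by-the-data-owner"  # hypothetical secret key
    record = {"subject_id": "S-0042", "name": "Jane Doe", "age": 47, "alt_u_per_l": 38}
    print(redact_record(record, key))  # name dropped, ID hashed, age banded
```

Using a keyed hash rather than a plain one means that someone without the key cannot simply hash candidate IDs to reverse the pseudonyms.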
A survey of 1329 scientists suggested that another concern in the pharmaceutical community is that shared data could be misused. However, this risk can easily be mitigated: an End User License can require users to agree to conditions of use, including specific authorisation from the data owner, and access can be limited to certain users.
Data stored in disparate repositories, in different formats and using potentially incompatible data types, presents another significant technical challenge, but not an insurmountable one, although the additional resource needed to convert the data to an agreed format will add to the cost of data sharing. It also makes sense to opt for platform-independent file formats for exporting and importing data, such as XML (Extensible Markup Language), CSV (comma-separated values) or SDF (structure-data file), which can be opened by several software applications. However, using the same format for exporting and importing data does not eliminate differences in what data are captured or how they are captured (for example, as a number or as text). Here, data standards such as SEND (the CDISC Standard for Exchange of Nonclinical Data) can help ensure that the data being captured are compatible.
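To illustrate why a shared file format is not enough on its own, the sketch below pairs CSV export and import (using Python's standard `csv` module) with an agreed column specification that fixes each field's type. The column names and the specification are hypothetical; a real project would take them from a data standard such as SEND rather than inventing its own. A value captured as text, such as `12.5 mg/kg` in a numeric dose column, is rejected on import instead of silently entering the pooled data.

```python
import csv
import io

# Hypothetical agreed column specification: column name -> expected type.
COLUMN_SPEC = {"compound_id": str, "species": str, "dose_mg_per_kg": float, "finding": str}


def export_csv(records: list[dict]) -> str:
    """Write records to CSV text using exactly the agreed columns, in order."""
    buffer = io.StringIO()
    # extrasaction="raise" rejects records that carry unexpected fields.
    writer = csv.DictWriter(buffer, fieldnames=list(COLUMN_SPEC), extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_csv(text: str) -> list[dict]:
    """Read CSV text and convert every field to its agreed type.

    CSV stores everything as text, so type checking has to happen here:
    a value such as '12.5 mg/kg' in a numeric column raises ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(COLUMN_SPEC):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({name: cast(row[name]) for name, cast in COLUMN_SPEC.items()})
        except ValueError as err:
            raise ValueError(f"row {line_no}: {err}") from err
    return rows


if __name__ == "__main__":
    text = export_csv([{"compound_id": "CPD-1", "species": "rat",
                        "dose_mg_per_kg": 30.0, "finding": "hepatocellular hypertrophy"}])
    print(import_csv(text))
```

Putting the unit in the column name (`dose_mg_per_kg`) rather than in each value is one simple way of keeping numeric fields numeric.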
A controlled vocabulary is preferred when capturing qualitative data, to avoid problems caused by differences in spelling and terminology. The use of ontologies offers the additional benefit that the relationships between terms can also be captured: synonyms, meronyms/holonyms (part and whole) and hyponyms/hypernyms (narrower and broader terms). Ontologies were developed as part of Lhasa’s eTOX data sharing project to help with cross-study data analysis, where pathology findings could be reported at different levels of granularity, e.g. gastrointestinal tract vs colon. Quantitative data should ideally be captured using standardised units to simplify data mining and analysis. However, this is not always practical, as recalculating values during data entry can introduce additional errors. When designing the schema, an assessment also needs to be made as to whether precise figures will always be given, or whether greater-than/less-than values and number ranges also need to be captured.
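How an ontology supports cross-study analysis can be sketched with a tiny, hypothetical fragment: spelling variants are mapped to a preferred term, and a part-of (meronym/holonym) hierarchy lets a finding reported against the colon also count towards the gastrointestinal tract. The terms and relationships below are illustrative only and are not taken from the eTOX ontologies.

```python
# Hypothetical ontology fragment; a real one would be far larger and curated.
SYNONYMS = {
    "esophagus": "oesophagus",
    "cecum": "caecum",
    "gi tract": "gastrointestinal tract",
}
# Part-of relationships: child structure -> the structure it is part of.
PART_OF = {
    "colon": "large intestine",
    "caecum": "large intestine",
    "large intestine": "gastrointestinal tract",
    "oesophagus": "gastrointestinal tract",
    "stomach": "gastrointestinal tract",
}


def normalise(term: str) -> str:
    """Map a reported term to its preferred label (case-insensitive)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def ancestors(term: str) -> list[str]:
    """Return the normalised term followed by every structure it is part of."""
    chain = [normalise(term)]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def count_at_level(findings: list[str], level: str) -> int:
    """Count findings located in `level` or in any of its parts."""
    target = normalise(level)
    return sum(target in ancestors(f) for f in findings)


if __name__ == "__main__":
    findings = ["colon", "Cecum", "GI tract", "liver"]
    print(count_at_level(findings, "gastrointestinal tract"))
    print(count_at_level(findings, "colon"))
```

With these example findings, three are counted at the gastrointestinal tract level and one at the colon level, so studies reported at different granularity can be compared at whichever level suits the question.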
Honest broker
Pharmaceutical companies vary in whether they consider data on marketed drugs to be sensitive. The sensitivity of data can also change as a result of the repurposing of drugs and drug candidates. One of the eTOX project participants developed a procedure for obtaining general permission for full or restricted sharing, depending on the status of the compound, i.e. whether it was marketed, terminated, under current development (excluding new formulations, new indications or combinations of marketed drugs) or subject to product liability claims.
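A status-based permission procedure like this can be expressed as a simple lookup. The article does not say which status leads to full or restricted sharing, so the policy table below is purely a hypothetical example, as are the enum names. The one rule taken from the text is that new formulations, new indications or combinations of marketed drugs are excluded from the "under current development" category; this sketch interprets that as treating them as marketed.

```python
from enum import Enum


class Status(Enum):
    MARKETED = "marketed"
    TERMINATED = "terminated"
    IN_DEVELOPMENT = "under current development"
    LIABILITY_CLAIM = "subject to product liability claims"


class Sharing(Enum):
    FULL = "full"
    RESTRICTED = "restricted"
    NONE = "none"


# Hypothetical policy: which outcome each status receives is an assumption.
POLICY = {
    Status.TERMINATED: Sharing.FULL,
    Status.MARKETED: Sharing.RESTRICTED,
    Status.IN_DEVELOPMENT: Sharing.NONE,
    Status.LIABILITY_CLAIM: Sharing.NONE,
}


def sharing_level(status: Status, derived_from_marketed_drug: bool = False) -> Sharing:
    """Decide how a compound's data may be shared under the hypothetical policy.

    New formulations, new indications or combinations of marketed drugs are
    excluded from the 'under current development' category, so here they
    are handled as marketed compounds.
    """
    if status is Status.IN_DEVELOPMENT and derived_from_marketed_drug:
        status = Status.MARKETED
    return POLICY[status]


if __name__ == "__main__":
    print(sharing_level(Status.TERMINATED))
    print(sharing_level(Status.IN_DEVELOPMENT, derived_from_marketed_drug=True))
```

Keeping the policy in a table rather than in branching logic makes it easy for legal and IP reviewers to read and amend.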
Responsibility for deciding whether data can be shared is often delegated to legal and IP departments. The disadvantage of this is that they see only the risks and not the benefits of data sharing and, being risk averse, say no by default. In addition, the utility of the data can be difficult to demonstrate before the data are donated. The eTOX project participants highlighted the need for a summary of the project that could be shared with upper management and with the departments involved in granting authorisation, in order to raise awareness and facilitate decision-making.
In the case of confidential data, an honest broker can be used to protect the security of sensitive data. This organisation needs to be trusted by all partners, as it will have access to all the data and be responsible for controlling the other partners’ access to it. A not-for-profit or academic organisation is therefore likely to be preferred over a commercial one.
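One way an honest broker might mediate access is sketched below, under assumed rules that are not taken from eTOX: each partner can retrieve its own raw records, but other partners only receive pooled counts, and only when enough companies have contributed that no single company's data can be inferred from the total. The class, its methods and the contributor threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Optional


class HonestBroker:
    """Hypothetical broker that holds every partner's records and mediates access."""

    def __init__(self, min_contributors: int = 3) -> None:
        # Assumed rule: pooled results are released only when at least this
        # many partners contributed, so no single company's data is exposed.
        self.min_contributors = min_contributors
        self._records: dict[str, list[dict]] = defaultdict(list)

    def deposit(self, partner: str, compound_id: str, finding: str) -> None:
        """Store a record; only the broker knows which partner it came from."""
        self._records[partner].append({"compound_id": compound_id, "finding": finding})

    def own_records(self, partner: str) -> list[dict]:
        """A partner may always retrieve its own raw records."""
        return list(self._records.get(partner, []))

    def pooled_count(self, finding: str) -> Optional[int]:
        """Count a finding across all partners without revealing who reported it.

        Returns None when too few partners reported the finding.
        """
        contributors = {
            partner
            for partner, records in self._records.items()
            if any(r["finding"] == finding for r in records)
        }
        if len(contributors) < self.min_contributors:
            return None
        return sum(
            r["finding"] == finding
            for records in self._records.values()
            for r in records
        )
```

A real broker would add authentication, licence checks and audit logging; the point here is only that raw data and pooled results are released under different rules.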
Evolution of sharing
Over the past decade, data sharing within the pharmaceutical industry has evolved from being virtually non-existent to a landscape where most companies will have gained experience through one or more initiatives. However, for the pharmaceutical sector to truly benefit, data collaboration needs to be incorporated into business as usual, rather than remaining the preserve of special projects. Data still exists within silos and the people who could do something useful with that data often don’t have access to it. There remains a fear in the sector that sharing data gives away commercial advantages when, in fact, sharing information could significantly reduce overheads and speed up the development of new drugs. With the rising cost of clinical trials and health data, the industry needs to look at collaboration as the way forward. Sharing data is not without its challenges, but with the right partners, the benefits far outweigh the risks.
Author: Katharine Briggs is Research Leader at Lhasa Limited, a not-for-profit organisation and educational charity that facilitates collaborative data sharing projects in the pharmaceutical, cosmetics and chemistry-related industries.