Shortcut to drug discovery success
24 Nov 2011 by Evoluted New Media
New software approaches can supplement the expertise of medicinal chemists to guide the rigorous exploration of chemistry around hit and lead compounds, and the key is predictive power.
Drug discovery teams across the global pharmaceutical industry combine information on potency, selectivity and safety with absorption, distribution, metabolism and elimination (ADME) data to design and select the compounds with the highest chance of success as drug candidates. Deciding which compounds to synthesise and test is crucial to the success of a drug discovery project. It is a continually evolving process of balancing the different properties required for success, sometimes described as multi-parameter optimisation. The traditional design, synthesise, test and redesign cycle that generates the data needed to guide these decisions is both time-consuming and expensive, which limits the diversity of options that can be explored in the search for high-quality compounds with a well-balanced set of properties.
In silico predictive models of the key properties of new compound ideas offer a fast and cost-efficient way to generate data on large numbers of potential drug molecules [1]. However, most in silico models carry a high degree of statistical uncertainty in the values they predict and apply only to a limited range of chemistry. This does not prevent their predictions from providing useful information: multi-parameter optimisation methods have emerged to overcome these shortcomings, enabling users to integrate all of the key data on a compound and assess new compound ideas against an optimal property profile [2,3]. Nevertheless, the efficiency of this approach is still limited by the time and experience required to generate appropriate compound ideas in the first place.
[caption id="attachment_25010" align="alignright" width="300" caption="Figure 1: An example scoring profile for a project aiming to identify compounds suitable as serotonin reuptake inhibitors, showing the properties of interest, the desired value ranges and the relative importance of each criterion. For example, the most important property was inhibition of the serotonin transporter, for which a predicted Ki of less than 10 nM (log Ki < 1) was required. This was followed by an aqueous solubility of greater than 10 µM (logS > 1) and a positive prediction for human intestinal absorption."][/caption]
New software platforms are available that can automatically generate new compound ideas, using a project’s desired property profile to guide the evolutionary process. This greatly accelerates the search: each idea is assessed against the project’s requirements, lines of enquiry that do not meet them are eliminated, and the best ideas are prioritised for detailed consideration by an expert. Using published data on the lead compounds designed during the development of a successful drug, we demonstrate the application of this process to a lead optimisation project.
An effective method for generating new compound ideas must be able to support a broad range of chemistries in order to offer many diverse solutions from which to choose the best. In addition, the method should generate ideas that are relevant and tend towards ‘drug-like’ compounds. A final, and perhaps obvious, requirement is that the method should be flexible enough for a wide variety of project teams to use, allowing users to customise the process to suit their project goals and to guide the evolution of new chemistry ideas in the way that best suits their capabilities and workflow.
Conventional computer-based de novo design of new compound structures has been performed by ‘growing’ a small fragment known to bind weakly to a biological target, or by linking two or more fragments [4]. The resulting molecules are designed to fit a model of the target’s binding pocket, forming multiple interactions and ideally providing improved binding efficiency. However, early methods often generated molecules that were chemically infeasible or had poor ‘drug-like’ physicochemical and ADME properties. These issues could be partly resolved by post-filtering the generated compounds [5].
An alternative approach has been introduced in which up to 90-95% of the newly generated compound ideas are drug-like, while covering as wide a range of different chemistries as possible. The efficiency of the method derives from applying a set of medicinal chemistry transformation rules to an initial ‘parent’ molecule to generate related ‘child’ structures [6]. Transformation rules range from simple substitutions or functional group replacements to more dramatic modifications of the molecular framework, such as ring opening or closure. In this application, the transformation rules were built from the medicinal chemistry expertise applied in recent years to explore the chemical space around lead compounds.
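To make the parent-to-child idea concrete, here is a minimal Python sketch. It is not the rule engine used in this work. It assumes a hypothetical, highly simplified representation in which a compound is a tuple of substituents at named positions on a fixed scaffold, and each rule simply swaps one substituent label for another. Real rule-based systems operate on full molecular graphs and include framework changes such as ring opening, which this toy cannot express. The names `Compound`, `RULES` and `apply_rules` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical toy representation: a compound is a tuple of
# (position, substituent) pairs on a fixed scaffold. This only illustrates
# the parent -> child idea; it does not model real chemistry.
Compound = Tuple[Tuple[str, str], ...]
Rule = Tuple[str, str]  # (substituent to replace, replacement)

# Hypothetical example rules, loosely in the spirit of medicinal chemistry moves.
RULES: List[Rule] = [
    ("phenyl", "thienyl"),  # aromatic ring swap
    ("Cl", "F"),            # halogen exchange
    ("OMe", "OH"),          # O-demethylation
]


def apply_rules(parent: Compound, rules: List[Rule]) -> List[Compound]:
    """Apply every rule at every matching position; return unique children."""
    seen = {parent}
    children: List[Compound] = []
    for i, (position, substituent) in enumerate(parent):
        for old, new in rules:
            if substituent != old:
                continue
            # Replace only the substituent at position i to form one child.
            child = parent[:i] + ((position, new),) + parent[i + 1:]
            if child not in seen:
                seen.add(child)
                children.append(child)
    return children


parent: Compound = (("R1", "phenyl"), ("R2", "Cl"), ("R3", "Me"))
for child in apply_rules(parent, RULES):
    print(child)
```

Because a rule only fires where its pattern matches, and may fire at several sites, the number of children produced from a parent need not equal the number of rules in the set.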
[caption id="attachment_25012" align="alignleft" width="300" caption="Figure 2: This graph shows the compounds generated by three generations of transformations, starting from the lead compound for the project that yielded the drug Duloxetine. Error bars show the uncertainty in the overall score for each compound arising from the uncertainties in the underlying data. Only the top 10% of generations 1 and 2 were used as the basis for subsequent generations. The compounds are coloured by generation: red is the parent, yellow generation 1, light blue generation 2 and dark blue generation 3. The drug Duloxetine was present in generation 3 and is shown by the green diamond."][/caption]
It is not necessary for the transformations to correspond to specific chemical reactions or synthetic routes. Instead, they describe the typical changes to molecules that a chemist would consider during an optimisation project. In general, the transformations are relatively feasible moves in chemical space, although some may require multiple synthetic steps or the synthesis of new building blocks.
The challenge of compound design for a drug discovery project is to create new compound ideas that achieve a specific property profile, which depends on the therapeutic objectives of the project. To facilitate and accelerate their assessment, new compound ideas must be prioritised according to their likelihood of meeting the property requirements. However, prioritising compound ideas is challenging when multiple, often conflicting, criteria must be satisfied. Various forms of data visualisation are often employed, but the efficiency and rigour of this approach are usually limited by the complexity of the property requirements and by the uncertainties in assessing the quality of each compound idea [7].
Multi-parameter optimisation offers an alternative that overcomes the shortcomings of simple data visualisation. Its effectiveness derives from integrating all of the key compound data to assess each compound against the specified property profile. In the practical example described below, the optimal property profile shown in Figure 1 was defined using the probabilistic scoring algorithm of a drug discovery software platform (StarDrop, Optibrium). This enabled many compound ideas to be prioritised rapidly by comparing their predicted properties against a weighted profile of criteria, identifying the compounds most, and least, likely to succeed. The method also estimates the uncertainty in each score, so that statistically significant differences between compounds can be highlighted.
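The following Python sketch is not StarDrop's implementation; it illustrates one plausible formulation of probabilistic scoring under stated assumptions. Each prediction is treated as normally distributed, with a standard deviation equal to the model's error. Each criterion contributes the probability p of being met plus (1 - p)(1 - w), where w is a hypothetical importance weight between 0 and 1, and the overall score is the product of the contributions. The thresholds follow Figure 1 (log Ki < 1 for the serotonin transporter, logS > 1); the property names, importance values and example predictions are illustrative assumptions, and the intestinal absorption criterion is omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Criterion:
    name: str
    threshold: float
    greater_than: bool  # True if the value should exceed the threshold
    importance: float   # hypothetical weight: 0 = ignored, 1 = essential


def prob_meets(mean: float, sd: float, c: Criterion) -> float:
    """P(criterion met), assuming the true value ~ Normal(mean, sd)."""
    p_below = 0.5 * (1.0 + math.erf((c.threshold - mean) / (sd * math.sqrt(2.0))))
    return 1.0 - p_below if c.greater_than else p_below


def contribution(p: float, importance: float) -> float:
    # Certainly met -> 1; certainly failed -> (1 - importance).
    return p + (1.0 - p) * (1.0 - importance)


def score(predictions: Dict[str, Tuple[float, float]], profile: List[Criterion]) -> float:
    """Multiply per-criterion contributions; predictions map name -> (mean, sd)."""
    total = 1.0
    for c in profile:
        mean, sd = predictions[c.name]
        total *= contribution(prob_meets(mean, sd, c), c.importance)
    return total


# Thresholds from Figure 1; names and importance values are assumptions.
PROFILE = [
    Criterion("log_Ki_SERT_nM", 1.0, greater_than=False, importance=1.0),
    Criterion("logS_uM", 1.0, greater_than=True, importance=0.8),
]

# Hypothetical predictions (mean, model error) for two compound ideas.
potent = {"log_Ki_SERT_nM": (0.0, 0.5), "logS_uM": (2.0, 0.7)}
weak = {"log_Ki_SERT_nM": (2.0, 0.5), "logS_uM": (2.0, 0.7)}
print(round(score(potent, PROFILE), 3), round(score(weak, PROFILE), 3))
```

Because contributions multiply, a near-certain failure on an essential criterion (w = 1) drives the score towards zero regardless of the other properties, while failures on less important criteria only discount it.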
[caption id="attachment_25014" align="alignright" width="300" caption="Figure 3: The initial lead that ultimately gave rise to Duloxetine, the top three compounds generated from this lead, and Duloxetine, which was also generated by the algorithm. The score for each compound is shown to the right, along with a histogram indicating the contribution of each property to the overall score. All of these compounds are predicted to have good ADME properties. However, the initial lead has a much lower score due to a significantly poorer predicted Ki for the serotonin transporter. The structure and calculated score for Litoxetine, a clinical-candidate serotonin reuptake inhibitor, are shown to the right for comparison. The predicted Ki for this compound is 10 nM, in line with the reported IC50 of 6 nM. Although this structure was not generated automatically in this example, it bears a strong similarity (Tanimoto similarity >0.9) to the second-ranked compound, which has a higher predicted affinity and hence a higher score."][/caption]
In this example, based on the published lead compound that resulted in the drug Duloxetine (see Figure 3), we illustrate how an initial lead can be evolved using a general set of transformations, guided by a profile of properties, to rapidly identify high-quality drug-like compounds. Applying a set of 206 transformations to the lead compound produced 172 ‘child’ compounds. Within this automatic framework, each subsequent generation is produced by applying every transformation to each compound in the previous generation. Applied exhaustively, this would create approximately 1.7 million compounds after three generations. To manage this exponential growth, all of the new compounds were scored against the profile of predicted properties shown in Figure 1, and only the top 10% were used as the basis for the next generation. This restricted the evolution to compounds with a good chance of success, and the final data set contained 2,208 compounds out of a potential ~1.7 million.
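The generation-and-pruning loop described above can be sketched as a beam-style search. This Python sketch illustrates the idea under assumptions rather than reproducing the platform's implementation: compounds can be any hashable object, each transformation is a hypothetical function returning zero or more children, and `score` stands in for the profile-based score. As in the example, every transformation is applied to every parent, all new compounds are kept in the data set, and only the top `keep_fraction` of each generation seeds the next.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, List, Sequence

# A transformation maps one compound to zero or more child compounds.
Transform = Callable[[Hashable], Iterable[Hashable]]


def evolve(
    parent: Hashable,
    transforms: Sequence[Transform],
    score: Callable[[Hashable], float],
    generations: int = 3,
    keep_fraction: float = 0.1,
) -> Dict[Hashable, int]:
    """Return every compound generated, mapped to the generation it first appeared in."""
    seen: Dict[Hashable, int] = {parent: 0}
    parents: List[Hashable] = [parent]
    for gen in range(1, generations + 1):
        children: List[Hashable] = []
        for p in parents:
            for transform in transforms:
                for child in transform(p):
                    if child not in seen:  # skip duplicates across all generations
                        seen[child] = gen
                        children.append(child)
        if not children:
            break
        # Prune: only the best-scoring fraction seeds the next generation.
        children.sort(key=score, reverse=True)
        n_keep = max(1, math.ceil(keep_fraction * len(children)))
        parents = children[:n_keep]
    return seen
```

As a quick check with integers as stand-in ‘compounds’, `evolve(1, [lambda x: [x + 1], lambda x: [x * 3]], score=float, generations=2, keep_fraction=0.5)` expands only 3 from the first generation and returns `{1: 0, 2: 1, 3: 1, 4: 2, 9: 2}`. Pruned compounds still appear in the returned data set, which is why the total can greatly exceed the number of compounds actually expanded.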
The scores for the 2,208 compounds were generated using predictions from QSAR models of inhibition of the serotonin transporter and of key ADME properties (Figure 2). The scores increased from one generation to the next, indicating a continuous improvement in compound quality. However, because each score combines the results of multiple uncertain predictions, the scores carry significant uncertainty, illustrated by the error bars in Figure 2. This made it difficult to distinguish between compounds, particularly in the final generation. Notably, the drug Duloxetine was present in the final generation, with a score significantly higher than that of the initial lead and not significantly below those of the highest-scoring compounds.
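One way to see why combining uncertain predictions produces wide error bars is to propagate the uncertainty by simulation. The Python sketch below uses Monte Carlo sampling, a swapped-in technique that is not necessarily how the platform computes its error bars. Each predicted property is sampled from a normal distribution (mean = prediction, standard deviation = assumed model error), each sample is scored against a hypothetical two-criterion profile with illustrative importance weights, and the mean and standard deviation of the sampled scores give a score and an error bar. The `clearly_different` check is a deliberately crude interval-overlap heuristic, not a formal significance test.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical profile: (property, threshold, must exceed threshold?, importance).
PROFILE: List[Tuple[str, float, bool, float]] = [
    ("log_Ki_SERT_nM", 1.0, False, 1.0),
    ("logS_uM", 1.0, True, 0.8),
]


def score_sample(values: Dict[str, float]) -> float:
    """Score one sampled set of property values: met -> 1, failed -> 1 - importance."""
    s = 1.0
    for name, threshold, must_exceed, importance in PROFILE:
        met = values[name] > threshold if must_exceed else values[name] < threshold
        s *= 1.0 if met else 1.0 - importance
    return s


def score_with_error(
    predictions: Dict[str, Tuple[float, float]], n_samples: int = 5000, seed: int = 0
) -> Tuple[float, float]:
    """Monte Carlo mean and standard deviation of the score; predictions map name -> (mean, sd)."""
    rng = random.Random(seed)
    samples = [
        score_sample({name: rng.gauss(mu, sd) for name, (mu, sd) in predictions.items()})
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)


def clearly_different(a: Tuple[float, float], b: Tuple[float, float], k: float = 1.0) -> bool:
    """Crude heuristic: True if the mean +/- k*sd intervals do not overlap."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) > k * (sd_a + sd_b)
```

A hypothetical compound whose potency prediction sits exactly on the threshold receives a mean score near 0.5 with an error bar of similar size, which shows how quickly uncertain inputs blur the ranking of otherwise similar compounds.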
Figure 3 shows the structures and scores of the initial lead and Duloxetine, as well as the three highest-ranking molecules generated. Although none of the top three compounds was found in a search of PubChem [8], the second-ranked compound is very similar to Litoxetine, which progressed to clinical trials and inhibits the serotonin transporter with an IC50 of 6 nM [9].
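The similarity quoted in Figure 3 for Litoxetine and the second-ranked compound is a Tanimoto coefficient. For binary fingerprints it is the number of bits set in both fingerprints divided by the number set in either. The sketch below computes it from sets of on-bit indices. The fingerprint type used in the article is not stated, so the bit sets in the example are made up purely for illustration; in practice a cheminformatics toolkit would generate the fingerprints and compute the coefficient (RDKit, for example, provides `DataStructs.TanimotoSimilarity`).

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union


# Made-up fingerprints: 95 shared bits, 5 unique to A, 1 unique to B.
fp_a = set(range(100))
fp_b = set(range(95)) | {200}
print(round(tanimoto(fp_a, fp_b), 3))
```

For these made-up fingerprints the coefficient is 95/101, or about 0.94, which would clear the >0.9 threshold quoted in the caption.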
[caption id="attachment_25016" align="alignleft" width="300" caption="Figure 4: The chemical space of compounds generated from the initial lead that gave rise to Duloxetine. Points are coloured by score, from the lowest (0.29) in red to the highest (0.69) in yellow. The initial lead is shown as a dark blue diamond and Duloxetine as a light blue diamond. The three top-scoring compounds are shown as green diamonds."][/caption]
The chemical space of the generated data set revealed multiple ‘hot spots’ containing high-scoring compounds (Figure 4). The three top-ranked molecules are structurally diverse within the range explored around the initial lead, and are distinct from both the initial lead and Duloxetine. This suggests that the computational approach identified a diverse range of chemical strategies worth considering further.
Predictive models in drug discovery typically carry a high degree of statistical uncertainty, making it difficult to decide with confidence which compounds to prioritise for synthesis and experimental testing. However, by integrating data from predictive models and scoring this information against specific project goals within a multi-parameter optimisation framework, statistically significant differences between compounds can be highlighted, creating a solid foundation and a consistent benchmark for compound analysis and selection. Combined with a platform that automatically applies chemistry rules, encoded as potential molecular transformations, this makes it possible to explore the chemistry around hit or lead compounds quickly and rigorously in the search for high-quality compounds, while maintaining an appropriate view of the balance of properties ultimately necessary for success.
References
1. Van de Waterbeemd H, Gifford E. ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov. 2003;2:192-204.
2. Ekins S, Boulanger B, Swaan P, Hupcey M. Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J Comput Aided Mol Des. 2001;16:381-401.
3. Segall M, Champness E, Obrezanova O, Leeding C. Beyond profiling: using ADMET models to guide decisions. Chem Biodivers. 2009;6:2144-2151.
4. Schneider G, Fechner U. Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov. 2005;4(8):649-663.
5. Hartenfeller M, Schneider G. Enabling future drug discovery by de novo design. Wiley Interdiscip Rev Comput Mol Sci. 2011.
6. Stewart KD, Shiroda M, James CA. Drug Guru: a computer software program for drug design using medicinal chemistry rules. Bioorg Med Chem. 2006;14(20):7011-7022.
7. Segall M, E C. The difference between guiding and supporting decisions: enhancing decisions and improving success in drug discovery. Genetic Engineering News. 2010 Sep.
8. Bolton E, Wang Y, Thiessen P, Bryant S. PubChem: integrated platform of small molecules and biological activities. In: Annual Reports in Computational Chemistry. Vol 4. Washington, DC: American Chemical Society; 2008. p. 217-241.
9. Andrews M, Brown A, Chiva J, Fradet D, Gordon D, Lansdell M, MacKenny M. Design and optimisation of selective serotonin re-uptake inhibitors with high synthetic accessibility: part 2. Bioorg Med Chem Lett. 2009;19:5893-5897.