Open-Source AI-derived drug discovery data to help combat COVID-19
28 Apr 2020
Recursion, a digital biology company industrialising drug discovery, has released its open-source RxRx19 dataset, the first human cellular morphological dataset of SARS-CoV-2, the virus that causes COVID-19. The human cellular morphological data, which cover more than 1,600 small molecules, have been released to help clinical researchers and machine learning experts around the world who are working to make advances in the fight against the COVID-19 pandemic.
Through RxRx19, researchers in the scientific community will have access to 305,520 five-channel fluorescence microscopy images and the corresponding deep learning embeddings, which they can analyse directly or apply to their own experiments. Any results and conclusions drawn from the in vitro experiments and from targeted, hypothesis-driven research will contribute to the growing body of COVID-19 scientific data.
“At Recursion we have repeatedly seen how artificial intelligence coupled with target-agnostic drug discovery can rapidly uncover insights that are obscured through traditional approaches,” said Ben Mabey, chief technology officer at Recursion. “The release of RxRx19 creates an unprecedented opportunity for the machine learning community to uncover those hidden insights that will be most valuable in the fight against a global pandemic. Beyond the immediate purpose, this open-source dataset will help researchers advance in their abilities to use high content imaging for compound efficacy screening, which will have a positive impact that lasts well beyond the resolution of the current crisis.”
The dataset was derived from experiments that Recursion led, in collaboration with Utah State University (USU), to investigate the therapeutic potential of a library of 1,672 Food and Drug Administration- and European Medicines Agency-approved or clinical-stage compounds for modulating the effect of SARS-CoV-2 in human renal cortical epithelial (HRCE) cells. The images were processed using Recursion’s proprietary deep learning neural network to generate a high-dimensional featurisation of each image, from which distinct phenotypic profiles can be identified (featurisation is the process of transforming raw data into features that better represent the underlying problem to predictive models). These featurisations are also being shared publicly. The experiments took four weeks from start to finish. They were conducted at the USU Biosafety Level 3 facility, and the results were analysed by a team of data scientists, engineers and machine learning scientists who are currently working remotely.
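As a rough illustration of how per-image featurisations might be compared, the sketch below averages embedding vectors for three groups of wells and checks which reference profile a compound-treated group most resembles. It is not Recursion’s analysis method: the four-dimensional vectors, the group names and the helper functions are all hypothetical, and the numbers are invented for the example.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real featurisations have many more dimensions.
healthy = [[1.0, 0.0, 0.2, 0.1], [0.9, 0.1, 0.1, 0.0]]
infected = [[0.1, 1.0, 0.0, 0.3], [0.0, 0.9, 0.1, 0.2]]
treated = [[0.8, 0.2, 0.1, 0.1], [0.7, 0.3, 0.2, 0.0]]

# One averaged "phenotypic profile" per condition.
healthy_profile = mean_vector(healthy)
infected_profile = mean_vector(infected)
treated_profile = mean_vector(treated)

sim_to_healthy = cosine_similarity(treated_profile, healthy_profile)
sim_to_infected = cosine_similarity(treated_profile, infected_profile)

print(f"similarity to healthy profile:  {sim_to_healthy:.3f}")
print(f"similarity to infected profile: {sim_to_infected:.3f}")
```

In this toy case the treated profile lies closer to the healthy profile than to the infected one, which is the kind of signal a researcher might look for when screening compounds. Cosine similarity is only one of several reasonable distance measures.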
Combined with Recursion’s RxRx1 dataset released last year, RxRx19 enables machine learning researchers to leverage modern deep learning techniques to bridge two related datasets that demonstrate completely different biological phenomena but share a consistent image-based approach. Both dataset releases are part of RxRx.ai, a planned series of open-source biological and chemistry data releases for the machine learning community.
To download the free RxRx19 dataset, visit https://rxrx.ai/. For more information on Recursion’s unique approach to applying artificial intelligence and machine learning to drug discovery and development, visit www.recursionpharma.com.