Leveraging the Power of Patient Data in AI Drug Discovery

The prevailing wisdom has been that certain diseases – and certain targets – are nearly impossible to go after because the patient population is too small, the target is not well understood, or there simply is not enough data. But the latest machine learning models trained on cell data can now be combined with patient data to open up a new world of understanding – connecting gene-gene relationships with gene-disease relationships to identify signals in the noise. And because these models can extrapolate from limited data, researchers can gain insights from much smaller patient pools.

“Forward and reverse genetics is essentially the bread and butter of why entering patient data is so transformative,” says Hayley Donnella, PhD, Senior Director of Computational Oncology at Recursion. “Forward genetics – using observational real-world data – is incredibly noisy. It’s incomplete and sparse. In the past, we would have to accrue more patient samples over time to see more nuanced signals in association tests. For super rare diseases, patient data hit a ceiling.”

But this patient data is still critical to understanding disease, she says. It directly represents the realities of real patients – in all their messiness and human lived experience.

Hayley Donnella, PhD, Senior Director of Computational Oncology, shares insights into how Recursion is leveraging patient data at Download Day.

Reverse genetics – changes to phenotypes in cells – offers a simpler model of patients. But with modern scientific tools like CRISPR and large-scale wet lab robotics, reverse genetics data can now be generated in a way that is complete and low-noise – encompassing all genes, many replicates, and extremely careful control of laboratory conditions. Machine learning can then combine datasets from both forward and reverse genetics approaches – maximizing the benefits of each while overcoming their respective limitations.

For instance, by integrating even limited patient data into Recursion’s Maps of Biology – which rely on massive in-house phenomics data generated in a highly autonomous and standardized way – researchers can derive powerful new insights.

“We use patient data to tell us about disease associations in the Maps of Biology, and the map can tell us a signal that would have been lost in the noise,” Donnella says. “We can get over the law of scale because we have a functional understanding of how genes relate to each other.”
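The idea of letting gene-gene relationships rescue a weak patient signal can be illustrated with a toy example. The sketch below is not Recursion's method. It is a minimal illustration that assumes each gene has an embedding from perturbation experiments and a noisy association score from patient data, and it blends each gene's own score with the scores of functionally similar genes. The gene names, embeddings, scores, and the `alpha` blending weight are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_scores(embeddings, scores, alpha=0.5):
    """Blend each gene's own association score with the similarity-weighted
    average score of the other genes. alpha is a hypothetical blending weight."""
    smoothed = {}
    for gene, vec in embeddings.items():
        # Only positive similarity counts as "functionally related" here.
        weights = {
            other: max(cosine(vec, other_vec), 0.0)
            for other, other_vec in embeddings.items()
            if other != gene
        }
        total = sum(weights.values())
        neighbor_avg = (
            sum(w * scores.get(other, 0.0) for other, w in weights.items()) / total
            if total > 0
            else 0.0
        )
        smoothed[gene] = (1 - alpha) * scores.get(gene, 0.0) + alpha * neighbor_avg
    return smoothed


# Hypothetical toy data: GENE_A and GENE_B behave alike in cell experiments,
# GENE_C is unrelated. Only GENE_A shows a clear patient association.
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.9, 0.1],
    "GENE_C": [0.0, 1.0],
}
scores = {"GENE_A": 2.0, "GENE_B": 0.2, "GENE_C": 0.1}

smoothed = smooth_scores(embeddings, scores)
for gene in sorted(smoothed, key=smoothed.get, reverse=True):
    print(gene, round(smoothed[gene], 3))
```

In this toy setup, GENE_B's weak patient signal is lifted because it sits close to the strongly associated GENE_A, while the unrelated GENE_C stays low. Real approaches would work with far larger maps and proper statistical models.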

Unlocking Deeper Signals with Patient Data

Recursion has partnered with two companies – Helix and Tempus – to gain access to extensive multimodal, de-identified patient data – approximately 20 petabytes from Tempus alone – including whole exome and whole genome sequencing. These partnerships encompass hundreds of thousands of patient insights across a wide range of diseases and oncology indications.

Donnella says Recursion takes a unique, holistic approach to this patient data. “The vast majority of companies focus on a single gene or drug program and do analytics around biomarkers and population enrichment,” she says. “We are starting from data on its own and marrying it with our perturbation data in order to initiate the best possible programs found by our models. It’s an entirely different direction.”

By combining forward and reverse genetics, she says, Recursion is unlocking “a bigger, deeper signal” and leveraging the full potential of the data.

Lina Nilsson, PhD, Senior Vice President, Head of Platform, offered an early proof of concept of the transformative potential of Helix’s data at a Helix company event in September. “Thanks to the Maps of Biology, running off a small subset of patient data from Helix, we could see signal that others didn’t see until years down the road,” Nilsson said. “We showed that we had overcome data size limitations.”

With the Tempus oncology data, Recursion has already developed numerous causal models and generated hundreds of insights for lung, renal cell, and other cancers. Two hits have already been identified, and researchers have been able to expand one of these indications to a different patient population. “We’re integrating the whole genome CRISPR map with Tempus to create causal features to submit into validation workflows,” Donnella says. “And now we’re seeing tangible benefits, and it’s already fueling our pipeline.”

Donnella says that patient data can lead to greater insights all across the drug discovery chain – from identifying novel targets, to finding mouse models that match specific profiles, to improving the odds of success in the preclinical stage, to developing biomarker strategies that identify patient populations for clinical trial design.

“These first insights are going into the Recursion OS and are building the foundation for the next phase of programs that are all rooted upfront in patient signal,” she says.

Author: Brita Belli, Senior Communications Manager at Recursion.