New Data’s Not Enough. How Recursion Is Integrating Data Layers to Advance End-to-End Drug Discovery


There are few industries as data-rich as pharma and biotech. But in AI drug discovery, what matters is the quality of the data and how well it is connected. Outdated, inaccessible, and undocumented data stores make it challenging to extract the insights needed to find viable new disease targets, harder to precision-design drugs for those targets while minimizing toxic side effects, and harder still to match those drugs to the patients most likely to benefit.

We need connected, high-quality biological data layers, from proteins all the way to patients, to derive actionable insights that can lead to new medicines for rare diseases, aggressive cancers, and neurodegenerative diseases.

Inside Recursion's automated wet lab.

At Recursion, we pair our proprietary data with purpose-built computational models and scaled compute infrastructure. Our wet laboratories generate large volumes of standardized, high-quality biological and chemical data, while our dry-lab capabilities apply machine learning, physics-based modeling, and statistical inference to extract actionable insights from those data. Today, we systematically generate, integrate, and analyze data across the full R&D value chain, from patient data and disease biology to molecular design and clinical execution.

Connecting Data Layers with Foundation Models

“Our massive phenomics data layer is a huge advantage,” says Peter McLean, Director of Data Science and Applied ML at Recursion, “but it’s not enough.” To make sense of what’s happening inside cells and tissues, we need to draw insights from multiple data layers: transcriptomics, which reports on levels of gene expression; chemical and molecular data that show how a drug will be absorbed, tolerated, and excreted; and patient data that connect genomic and health information.

“A lot of people are focused on solving a single problem,” says McLean. “Generating a dataset to train one model that is a better predictor of your favorite molecular property is great – but it’s a drop in the bucket for drug discovery.”

Recursion's labs capture multiple data layers that are integrated through machine learning foundation models.

Recursion, he says, has a full-stack platform with a high degree of integration between data layers, thanks to its investments in large foundation models. These include the Phenom family of cell image models; Boltz-2, developed with MIT for better prediction of protein binding affinity; and, most recently, a best-in-class transcriptomics foundation model that delivers state-of-the-art representations of transcriptional profiles for empirically grounded target discovery, bridging in vitro assays with patient data.

“Our big foundation models are trained to give us really good, relatable representations of high-dimensional assay data – to put raw biological and chemical data into a navigable, quantitative space,” says McLean. Now, he says, Recursion is focused on going one step further – using machine learning and agentic tools to integrate those data layers.
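To make the idea of a "navigable, quantitative space" concrete, here is a minimal sketch of how similarity between embeddings might be measured. It is an illustration only, not Recursion's implementation: the three-dimensional vectors, the perturbation names, and the `nearest_neighbors` helper are all hypothetical, and real foundation-model embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=2):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings for three perturbations (made-up values).
embeddings = {
    "gene_A_knockout": [0.9, 0.1, 0.0],
    "gene_B_knockout": [0.8, 0.2, 0.1],
    "compound_X": [0.0, 0.1, 0.95],
}

# A hypothetical query profile to look up in the shared space.
query = [1.0, 0.0, 0.0]
neighbors = nearest_neighbors(query, embeddings, k=2)
for name, score in neighbors:
    print(f"{name}: {score:.3f}")
```

In this toy space, perturbations with similar effects sit close together, so a lookup by similarity surfaces related biology regardless of which assay produced the original measurement.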

Using Integrated Data and Models to Advance a Novel Rare Disease Drug

Because Recursion has a wholly owned pipeline of clinical-stage drugs, along with multiple pharma partnerships delivering on milestones, we can point to exactly how our data collection, data integration, and foundation models have led to breakthroughs in developing differentiated medicines for patients and to a better fundamental understanding of challenging diseases.

In December 2025, we announced the first clinical validation for our AI-enabled Recursion OS platform in our REC-4881 program, a potential first-in-disease treatment for familial adenomatous polyposis (FAP), a rare, progressive, genetic disease. Running phenotypic images of diseased cells against thousands of compounds, the platform made a novel discovery: REC-4881, a MEK1/2 inhibitor, was capable of reverting the cells from a diseased to a healthy state. We in-licensed the drug and redirected it toward FAP patients, of whom there are approximately 50,000 in the U.S. and E.U. These patients suffer from the relentless growth of polyps and tumors in the colon, which can currently be treated only by surgery and which have a 100% chance of progressing to colon cancer if left untreated.
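One common way to think about this kind of phenotypic screen is to score each compound by how far it moves treated diseased cells back toward the healthy state in an embedding space. The sketch below illustrates that idea under stated assumptions: the two-dimensional embeddings, the compound names, and the `reversal_score` function are hypothetical, and this is not a description of how the Recursion OS actually scores compounds.

```python
def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def reversal_score(healthy, diseased, treated):
    """Fraction of the diseased-to-healthy shift recovered by a treatment.

    0 means no movement along the disease axis, 1 means the treated cells
    sit at the healthy state along that axis, and negative values mean the
    treatment pushed cells further from healthy.
    """
    axis = subtract(healthy, diseased)   # direction from diseased to healthy
    shift = subtract(treated, diseased)  # movement caused by the treatment
    return dot(shift, axis) / dot(axis, axis)

# Hypothetical embeddings (made-up values for illustration).
healthy = [0.0, 0.0]
diseased = [2.0, 0.0]
treated_by_compound = {
    "compound_1": [0.5, 0.1],  # moves most of the way back toward healthy
    "compound_2": [1.9, 0.0],  # barely changes the diseased phenotype
    "compound_3": [2.5, 0.0],  # moves further away from healthy
}

ranked = sorted(
    ((name, reversal_score(healthy, diseased, vec))
     for name, vec in treated_by_compound.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Projecting onto the diseased-to-healthy axis is a deliberately simple choice; it ignores off-axis effects, which a real screen would also need to examine since they may indicate unrelated or toxic activity.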

Additional data layers beyond phenomics have allowed us to both advance and expand the ongoing TUPELO trial for REC-4881. Using our custom LLM-based pipeline, we analyzed real-world data from 1,000 U.S. FAP patients (including 250,000 physician notes) and from a world-leading FAP registry, in collaboration with the Amsterdam University Medical Center, to characterize the natural history of FAP. The analysis reinforced the disease’s progressive nature, highlighted the absence of spontaneous polyp regression, and demonstrated the substantial burden of repeated polyp-removal procedures and major surgeries experienced by patients. ClinTech insights also allowed us to refine the design of the trial, expanding age eligibility from ≥55 to ≥18.

“This is a clear story of our full-stack platform delivering,” says McLean. “Insights from our platform led to a novel treatment for a rare disease and allowed us to better contextualize the natural history of the disease to accelerate and expand our trial.”

© 2025 Recursion. All rights reserved.