Recursion OS: The Heart of Our Operations

Interested in our public datasets and models? Visit RxRx.ai.

The Journey of an Experiment

Generating one of the largest relatable data sets in pharma is only possible with Recursion’s automated high-throughput labs. We’ve generated ~36 PB of proprietary data across phenomics, transcriptomics, proteomics, ADME, and InVivomics. What follows traces a single experiment from the start of the process to the end.

Our labs can process up to 2.2 million samples per week.

Tissue Culture

First, cells are cultured in large batches in our Tissue Culture labs. With a focus on scalability, we’ve developed innovative methods for growing, freezing, thawing, and experimenting with these cells in large quantities. Our first maps are in HUVEC cells and NGN2 neurons; we’ve produced hundreds of billions of each.

We’re probably one of the largest, if not the largest, producers of HUVEC cells in the world. We can create over 100 billion cells per year for our high-throughput experiments.

CRISPR at Scale

To run experiments, we need to intervene in the cell (i.e., perturb it) to mimic diseases or to test treatments. The primary way we model diseases on our platform is by knocking out a gene’s function with CRISPR-Cas9 editing. We systematically combine the cells with the programmed CRISPR guide set for each gene, one gene at a time. We also introduce slight variations in the guides for each gene and run many instances, or replicates, of each combination to create a more robust experimental signal for each gene.
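
To make the design concrete, here is a minimal sketch of how a guide-by-replicate layout could be enumerated. The gene symbols, guide counts, and replicate counts are illustrative assumptions, not Recursion’s actual parameters.

```python
# Enumerate (gene, guide, replicate) conditions for a knockout screen.
# All names and counts below are illustrative.
from itertools import product

genes = ["TP53", "VEGFA", "BRCA1"]   # one knockout target per condition
guides_per_gene = 4                  # slight guide variations per gene
replicates = 6                       # repeated instances of each combination

conditions = [
    {"gene": gene, "guide": f"{gene}_g{g + 1}", "replicate": r + 1}
    for gene, g, r in product(genes, range(guides_per_gene), range(replicates))
]

print(len(conditions))   # 3 genes x 4 guides x 6 replicates = 72 conditions
print(conditions[0])     # {'gene': 'TP53', 'guide': 'TP53_g1', 'replicate': 1}
```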

Cell Seeding

Cells are seeded into experiment plates: essentially grids of miniature test tubes, with each plate containing 1536 such tubes, called ‘wells’. Each well will ultimately hold a unique experimental condition, with some combination of cells and a reagent or condition that we are testing in that well.
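
A 1536-well plate is conventionally laid out as 32 rows by 48 columns. The sketch below models a plate as a grid of addressable wells; the condition field and naming are illustrative assumptions.

```python
# Model a 1536-well plate as a grid of addressable wells.
from dataclasses import dataclass, field

ROWS, COLS = 32, 48  # 32 * 48 = 1536 wells

def row_label(i: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', 31 -> 'AF'."""
    return ("" if i < 26 else chr(ord("A") + i // 26 - 1)) + chr(ord("A") + i % 26)

@dataclass
class Well:
    row: int
    col: int
    condition: dict = field(default_factory=dict)  # e.g. cells + perturbation

    @property
    def name(self) -> str:
        return f"{row_label(self.row)}{self.col + 1:02d}"

plate = [Well(r, c) for r in range(ROWS) for c in range(COLS)]
assert len(plate) == 1536
print(plate[0].name, plate[-1].name)  # A01 AF48
```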

Automated Compound Management

This automated process is repeated for thousands of plates. In different runs, our platform screens whole-genome CRISPR knockouts, or profiles millions of compounds in the same manner, to compare phenotypes and transcriptomes.

All of the reagents for each experiment are stored in an automated storage system that directly integrates with the liquid transfer and plate automation work cells.

Incubation

The plates move through the stages of the assay as the cells take on the effects of the perturbation: they are washed, fresh cell food or ‘media’ is added, and the plates are incubated. This all takes a few days.

Cell Imaging

Then high-content microscopes take pictures of each well. This can be done at different points along the assay, giving us a longitudinal signal for how the cells change over time while the reagent is present in the cell environment. We use brightfield imaging, which gives us the most information about the experiment over time.

Our phenomics images come off of our platform at a rate of millions per week.
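
As a sketch of what that longitudinal signal looks like on the data side, the snippet below indexes image captures by plate and well so each well maps to its ordered time series. The record fields and paths are illustrative assumptions.

```python
# Index longitudinal well images: one record per (plate, well, timepoint).
from collections import defaultdict

captures = [
    {"plate": "P0001", "well": "A01", "hour": 24, "path": "img/P0001_A01_t24.tiff"},
    {"plate": "P0001", "well": "A01", "hour": 48, "path": "img/P0001_A01_t48.tiff"},
    {"plate": "P0001", "well": "A02", "hour": 24, "path": "img/P0001_A02_t24.tiff"},
]

timeseries = defaultdict(list)
for c in captures:
    timeseries[(c["plate"], c["well"])].append((c["hour"], c["path"]))

# Each well now maps to its ordered image series: the longitudinal signal.
for well, series in timeseries.items():
    print(well, sorted(series))
```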

Trekseq

After imaging, plates are transferred to our next high-throughput platform: Transcriptomics, which gives us an understanding of gene expression. The plates are processed again with barcodes that bind to each mRNA, giving it a unique identifier. The transcriptomics barcodes can then be ‘read’ by an instrument called a sequencer, resulting in a large data file that is representative of the well’s unique transcriptome.

We are one of the largest transcriptomics data producers in the world.
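
As a toy illustration of the barcoding idea, the sketch below demultiplexes sequencing reads into per-well gene counts using a fixed-length well barcode at the start of each read. The barcodes, read format, and helper names are assumptions; production pipelines also handle UMI collapsing and barcode error correction.

```python
# Count reads per (well barcode, gene): a toy demultiplexing step.
from collections import Counter

BARCODE_LEN = 8
well_barcodes = {"AACCGGTT": "A01", "TTGGCCAA": "A02"}  # illustrative

def demultiplex(reads: list[tuple[str, str]]) -> Counter:
    """reads: (sequence, aligned_gene) pairs; returns (well, gene) counts."""
    counts = Counter()
    for seq, gene in reads:
        well = well_barcodes.get(seq[:BARCODE_LEN])
        if well is not None:            # discard reads with unknown barcodes
            counts[(well, gene)] += 1
    return counts

reads = [("AACCGGTTACGT", "VEGFA"), ("AACCGGTTTGCA", "VEGFA"),
         ("TTGGCCAAGGGG", "TP53")]
print(demultiplex(reads))  # Counter({('A01', 'VEGFA'): 2, ('A02', 'TP53'): 1})
```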

Deep Learning & AI

All data from transcriptomics and phenomics are analyzed through a series of mapping transformations in which our AI models embed them into a mathematical space, allowing us to calculate metrics for each perturbation and between perturbations.
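
A minimal sketch of the kind of between-perturbation metric this enables: cosine similarity between embedding vectors. The vectors and dimension here are random placeholders; in practice the embeddings come from trained deep learning models.

```python
# Compare perturbation embeddings with cosine similarity.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
embeddings = {name: rng.normal(size=128) for name in ["gene_KO_A", "compound_X"]}

# A high similarity between a gene knockout and a compound suggests the
# compound may phenocopy (or modulate) that gene's function.
print(cosine_similarity(embeddings["gene_KO_A"], embeddings["compound_X"]))
```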

Industrialized Workflows

Together, these data trigger actions in our industrialized workflows. They build our Maps of Biology, and in those maps we discover relationships that we can further test in our labs.

Design

From the hits identified, synthesis-aware generative AI workflows design optimized drug candidates that meet target candidate profiles and remove likely liabilities. Models score and rank potential molecules, and active learning selects panels of the most informative compounds to make.
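
One common way to pick an informative panel is an upper-confidence-bound acquisition: rank candidates by predicted quality plus model uncertainty. The sketch below uses random placeholder scores; real workflows would use trained model ensembles, and this particular heuristic is an assumption, not necessarily Recursion’s method.

```python
# Select a panel balancing predicted quality with model uncertainty (UCB).
import numpy as np

rng = np.random.default_rng(1)
n_candidates, panel_size, kappa = 1000, 8, 1.0

predicted_score = rng.uniform(0, 1, n_candidates)   # e.g. predicted potency
uncertainty = rng.uniform(0, 0.5, n_candidates)     # e.g. ensemble std-dev

acquisition = predicted_score + kappa * uncertainty
panel = np.argsort(acquisition)[-panel_size:]       # most informative picks
print(panel)  # indices of the compounds to synthesize and test next
```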

Make

After hits are identified by our industrialized workflows, we select only the very best insights to pursue as potential programs. The molecule is optimized, tested for safety, and developed further into a potential medicine on our translation platform.

Test

Next, compounds are tested using our automated biology assays. Molecules automatically pass through panels of increasingly complex assays to confirm activity against their target and in a range of in vitro and cell-based assays. Multiple other parameters covering ADME, PK, and toxicity are also assessed.

Automating these processes reduces manual handling of experiments, decreasing the time and cost of the majority of biological assays by >75%.
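
The cascade of increasingly complex assays can be pictured as a sequence of filters, where only molecules that pass one tier advance to the next. The assay names and thresholds below are illustrative assumptions.

```python
# A tiered assay cascade: each tier is a (name, pass-test) pair.
assays = [
    ("biochemical_activity", lambda m: m["ic50_nM"] < 1000),
    ("cell_based_activity",  lambda m: m["cell_ec50_nM"] < 5000),
    ("adme_solubility",      lambda m: m["solubility_uM"] > 10),
]

def run_cascade(molecules: list[dict]) -> list[dict]:
    survivors = molecules
    for name, passes in assays:
        survivors = [m for m in survivors if passes(m)]
        print(f"{name}: {len(survivors)} molecules advance")
    return survivors

mols = [{"id": i, "ic50_nM": 500 * i, "cell_ec50_nM": 2000 * i,
         "solubility_uM": 30 - 5 * i} for i in range(1, 6)]
leads = run_cascade(mols)
```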

Learn

Experimental values obtained from the testing of molecules are automatically fed back into our models to improve them. Iterative design cycles evolve molecules towards our goal, as we learn our way through the project. A final molecule is optimized and designed further into a potential medicine on our translation platform.
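
A minimal sketch of that design-make-test-learn loop follows. The “lab” is simulated by a hidden function and the batch is chosen greedily for brevity; in reality each round’s measurements come from the automated assays above, and selection would also weigh informativeness, as in the Design step. All parameters are illustrative.

```python
# Design-make-test-learn: retrain on each round's new measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X_pool = rng.uniform(-1, 1, (500, 5))      # candidate molecules (features)

def true_activity(X: np.ndarray) -> np.ndarray:
    """Stand-in for the lab: ground truth we only observe by testing."""
    return -np.sum(X**2, axis=1)

tested_X, tested_y = X_pool[:10], true_activity(X_pool[:10])
model = RandomForestRegressor(n_estimators=50, random_state=0)

for round_ in range(3):
    model.fit(tested_X, tested_y)          # learn from all data so far
    preds = model.predict(X_pool)
    batch = np.argsort(preds)[-8:]         # design/make: pick top predictions
    new_y = true_activity(X_pool[batch])   # test: "measure" in the lab
    tested_X = np.vstack([tested_X, X_pool[batch]])
    tested_y = np.concatenate([tested_y, new_y])
    print(f"round {round_}: best measured activity = {tested_y.max():.3f}")
```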

Industry-leading data stack

Along with producing leading datasets in phenomics and transcriptomics, we’ve added proteomics, ADME, InVivomics, genomics and patient data – creating a true end-to-end data stack across biology and chemistry that powers our state-of-the-art active learning models.
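
The value of an end-to-end stack is that every modality joins on the same perturbation identifier, so layers can be compared directly. The record below is an illustrative sketch of that idea; the field names and values are assumptions.

```python
# A multi-layer record keyed by perturbation, joining the data layers above.
record = {
    "perturbation_id": "GENE_KO_VEGFA",
    "phenomics": {"embedding": [0.12, -0.55, 0.98]},   # image-derived vector
    "transcriptomics": {"VEGFA": 0, "HIF1A": 142},     # gene expression counts
    "proteomics": {"VEGFA_protein": 0.03},             # protein abundance
    "adme": {"solubility_uM": None},                   # n/a for gene knockouts
}

def modalities(rec: dict) -> list[str]:
    return [k for k, v in rec.items() if k != "perturbation_id" and v]

print(modalities(record))  # data layers available for this perturbation
```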

Precision chemistry design

Using generative AI, predictive models, and experimentation, we design, synthesize, and test novel molecules that are optimized for efficacy, selectivity, safety, and bioavailability.

Turning drug discovery into a search problem

Central to our mission is the Recursion Operating System (OS), a platform powered by one of the world’s largest proprietary biological and chemical datasets. Instead of looking narrowly at a handful of diseases with existing therapeutic hypotheses, we build Maps of Biology and Chemistry that broaden our search and allow us to explore unknown areas of disease biology.

Virtuous cycles of atoms and bits

A core philosophy behind our maps is our ability to create virtuous cycles of atoms and bits, where we profile the real (atoms) to create digital representations (bits) in an iterative loop of experimentation and prediction.

We systematically profile different biological and chemical perturbations, each separately, then use our deep learning models to infer relationships between all possible combinations. We call this inference-based discovery, a significantly faster and more effective method than brute-force discovery. To date, we have generated nearly 4 trillion searchable relationships across all our maps.
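
The arithmetic behind that scale: profiling each perturbation once supports quadratically many pairwise comparisons, as the short sketch below shows. The profile counts are illustrative, not Recursion’s actual figures.

```python
# n profiles support n*(n-1)/2 pairwise relationships.
def pairwise_relationships(n: int) -> int:
    return n * (n - 1) // 2

for n in [100_000, 1_000_000, 3_000_000]:
    print(f"{n:>9,} profiles -> {pairwise_relationships(n):>22,} relationships")
# Around 2.8 million profiles already imply on the order of 4 trillion pairs.
```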

In our drug discovery efforts, we continue the virtuous cycle by testing these predictions in our highly automated wet labs, which in turn generates more data on which improved predictions can be made.

Better data = better predictions

In machine learning research, the quality of the dataset on which models are trained is critical to the accuracy of the model’s predictions. We control data generation in-house through our highly automated wet laboratories, where we conduct millions of experiments across every human gene and our library of chemical compounds to generate our multi-layered dataset for mapping. This has resulted in more than 50 petabytes of high-quality data – one of the world’s largest proprietary biological and chemical datasets.

Our data generation strategy follows these 3 principles

Scalability

No static dataset will ever be sufficient to decode the vast space of biology. Our dataset is designed to expand over time as we test and validate predictions experimentally.

Reliability

Reliable and accurate data is essential to reproducibility. We use highly controlled and standardized protocols while correcting for any variability in the technical execution of experiments to generate quality data.

Relatability

We build connected datasets, enabling comparisons across time and experimental methods. That way, the data we generate tomorrow can be related to data generated five years ago.

Explore our datasets & models

Imaging is our bread and butter, but we are so much more

We are pioneers of phenomics, the analysis of high-dimensional data from microscopy images of human cells. Images are rich with data, yet relatively cheap to capture and analyze at scale. AI turns unstructured images into computable data, creating biologically meaningful representations of cells that can be compared and contrasted to understand relationships across genes, compounds, and other perturbations. These relationships form the basis of our Maps of Biology and Chemistry.

Over the years, we’ve expanded our data generation to incorporate additional modalities that, when combined, allow us to gather a holistic picture of causal biological relationships. Recently, we unveiled LOWE, our LLM-based software capable of performing complex drug discovery tasks by orchestrating both the wet-lab and dry-lab components of Recursion OS.

Partner with us