Central to our mission is the Recursion Operating System (OS), a platform powered by one of the world’s largest proprietary biological and chemical datasets. Instead of looking narrowly at a handful of diseases with existing therapeutic hypotheses, we build Maps of Biology and Chemistry that broaden our search and allow us to explore unknown areas of disease biology.
A core philosophy behind our maps is creating virtuous cycles of atoms and bits: we profile the physical world (atoms) to create digital representations (bits) in an iterative loop of experimentation and prediction.
We systematically profile biological and chemical perturbations one at a time, then use our deep learning models to infer relationships between all possible combinations. We call this inference-based discovery. It is significantly faster and more effective than brute-force discovery, which would require testing every combination experimentally. To date, we have generated nearly 4 trillion searchable relationships across all our maps.
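To illustrate why profiling perturbations individually scales better than testing every combination, here is a minimal sketch. It assumes each perturbation has already been reduced to a numeric embedding, and it uses plain cosine similarity as a simple stand-in for the deep learning models described above. The perturbation names, vectors, and the `infer_relationships` helper are all hypothetical and are not part of Recursion OS.

```python
import math
from itertools import combinations

# Hypothetical embeddings: one vector per perturbation, each profiled on its own.
profiles = {
    "gene_A_knockout": [1.0, 0.0, 0.0],
    "gene_B_knockout": [0.9, 0.1, 0.0],
    "compound_X": [-1.0, 0.0, 0.0],
    "compound_Y": [0.0, 1.0, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def infer_relationships(profiles):
    """Score every unordered pair of perturbations from their individual profiles."""
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


relationships = infer_relationships(profiles)
n = len(profiles)
print(f"{n} profiling experiments -> {len(relationships)} inferred relationships")
for pair, score in sorted(relationships.items(), key=lambda kv: kv[1]):
    print(pair, round(score, 3))
```

With n individually profiled perturbations, the number of pairwise relationships grows as n(n-1)/2, so the count of inferred relationships quickly outpaces the count of experiments. In this toy example, a score near 1 suggests two perturbations have similar effects and a score near -1 suggests opposing effects.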
In our drug discovery efforts, we continue the virtuous cycle by testing these predictions in our highly automated wet labs, which in turn generates more data on which improved predictions can be made.
In machine learning research, the quality of the dataset on which models are trained is critical to the accuracy of the model’s predictions. We control data generation in-house in our highly automated wet laboratories, where we conduct millions of experiments across every human gene and our library of chemical compounds to generate our multi-layered dataset for mapping. This has resulted in more than 50 petabytes of high-quality data, one of the world’s largest proprietary biological and chemical datasets.
Scalability
No static dataset will ever be sufficient to decode the vast space of biology. Our dataset is designed to expand over time as we test and validate predictions experimentally.
Reliability
Reliable and accurate data is essential to reproducibility. To generate quality data, we use highly controlled and standardized protocols and correct for variability in the technical execution of experiments.
Relatability
We build connected datasets, enabling comparisons across time and experimental methods. That way, the data we generate tomorrow can be related to data generated five years ago.
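As an illustration of how data from different experimental runs can be made comparable, the sketch below normalizes each batch against its own control wells. This is one common, generic approach to correcting technical variability. The source does not say which correction methods are actually used, so the technique, the data layout, and the `normalize_to_controls` helper are assumptions.

```python
from statistics import mean, stdev

# Hypothetical measurements of one feature, grouped by experimental batch.
# Each batch holds untreated control wells plus perturbed sample wells.
batches = {
    "batch_1": {"controls": [10.0, 11.0, 9.0], "samples": {"compound_X": 13.0}},
    "batch_2": {"controls": [20.0, 22.0, 18.0], "samples": {"compound_X": 26.0}},
}


def normalize_to_controls(batch):
    """Express each sample as standard deviations from the batch's control mean."""
    mu = mean(batch["controls"])
    sigma = stdev(batch["controls"])
    return {name: (value - mu) / sigma for name, value in batch["samples"].items()}


corrected = {name: normalize_to_controls(batch) for name, batch in batches.items()}
for name, samples in corrected.items():
    print(name, samples)
```

The raw readings for `compound_X` differ between the two batches (13.0 versus 26.0), but once each is expressed relative to its own controls, both report the same effect size, so results from separate runs can be compared directly.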
We are pioneers of phenomics, the analysis of high-dimensional data from microscopy images of human cells. Images are rich with data, yet relatively cheap to capture and analyze at scale. AI turns unstructured images into computable data, creating biologically meaningful representations of cells that can be compared and contrasted to understand relationships across genes, compounds, and other perturbations. These relationships form the basis of our Maps of Biology and Chemistry.
Over the years, we’ve expanded our data generation to incorporate additional modalities that, when combined, give us a holistic picture of causal biological relationships. Recently, we unveiled LOWE, our LLM-based software capable of performing complex drug discovery tasks by orchestrating both the wet-lab and dry-lab components of Recursion OS.