Using LLMs for Faster Answers to Deeper Biological Questions for Drug Discovery Research

To better understand the connections between biology and disease, researchers at Recursion have developed LLM-based tools that turn both unstructured data, such as scientific publications, and structured data, such as the phenomics data produced in our automated labs, into actionable insights.

These include an enhanced query tool, which lets researchers easily make discoveries from published scientific papers, and a knowledge graph tool, which integrates data from clinical trials, publications, genes, phenotypes, drug interactions, cell lines, and many other categories, scores them via machine learning, and performs target prediction and gene-disease relevance assessment.

Literature Review in Minutes

Scientists who wanted a custom literature review used to face a multi-day search process. With our tools, Recursionauts can query, in a ChatGPT-like format, all papers published across PubMed (35+ million titles and abstracts, indexed on gene mention and disease mention), PubMed Central (6 million full-text articles), Citeline (a database of drugs and clinical trials), Google, and other sources. They receive almost immediate insights, based on the existing literature, into questions such as which diseases are relevant to a particular target. Recent publications are prioritized and all information is verified by our teams.
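To make the idea of indexing abstracts "on gene mention and disease mention" concrete, here is a minimal Python sketch of a toy inverted index that answers "which diseases are mentioned alongside this gene?" and puts the most recent evidence first. Everything in it is an assumption made for illustration: the `Abstract` and `MentionIndex` names, the fields, and the recency rule are hypothetical and do not describe Recursion's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Abstract:
    """Hypothetical record for one indexed abstract."""
    pmid: str
    year: int
    genes: frozenset
    diseases: frozenset


class MentionIndex:
    """Toy inverted index keyed on gene mentions and disease mentions."""

    def __init__(self):
        self._by_gene = defaultdict(set)
        self._by_disease = defaultdict(set)

    def add(self, doc):
        # Register the abstract under every gene and disease it mentions.
        for gene in doc.genes:
            self._by_gene[gene].add(doc)
        for disease in doc.diseases:
            self._by_disease[disease].add(doc)

    def diseases_for_gene(self, gene):
        """Diseases co-mentioned with `gene`, most recent evidence first."""
        latest = {}
        for doc in self._by_gene.get(gene, ()):
            for disease in doc.diseases:
                latest[disease] = max(latest.get(disease, 0), doc.year)
        # Sort by newest supporting year, then alphabetically for stability.
        return sorted(latest, key=lambda d: (-latest[d], d))

    def papers(self, gene, disease):
        """Abstracts mentioning both the gene and the disease, newest first."""
        docs = self._by_gene.get(gene, set()) & self._by_disease.get(disease, set())
        return sorted(docs, key=lambda d: (-d.year, d.pmid))
```

In a sketch like this, a question such as "which diseases are relevant to target X?" becomes a lookup on the gene key followed by a sort on publication year, which is one simple way to prioritize recent publications.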

The tool helps Recursion’s biologists, medicinal chemists, and toxicologists determine the viability of particular drug discovery programs, and it is also invaluable for engineers, says Dave Brett, Associate Director of Large Language Models.

“This is useful across the whole drug discovery pipeline,” Brett says. “Engineers use it when they are non-experts and trying to understand a certain domain when, for example, a scientist tells them ‘I need this feature.’ It helps them to bridge the gap between people with knowledge in different domains.” 

Since its launch over a year ago, the enhanced query tool has gained more than 150 users across the company, generated over 1,000 reviews, and processed some 600,000 documents.

Another of the tool's distinctive features is that it not only produces an answer but also evaluates it in the context of additional literature, essentially performing its own review process. “It takes the key statements in the original answer, and then creates questions to try to verify those statements,” Brett says. “The verified statements become the final answer.”

This approach provides a broader response that incorporates more literature. 
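The verify-then-answer loop Brett describes can be sketched in a few lines of Python. This is only an outline under stated assumptions: the three callables (`extract_statements`, `make_question`, `is_supported`) are hypothetical placeholders for whatever LLM and retrieval steps the real tool uses, and the stand-ins below are deliberately trivial so the example runs on its own.

```python
from typing import Callable, List


def verify_answer(
    draft: str,
    extract_statements: Callable[[str], List[str]],
    make_question: Callable[[str], str],
    is_supported: Callable[[str, str], bool],
) -> str:
    """Keep only the statements from `draft` that pass verification."""
    verified = []
    for statement in extract_statements(draft):
        # Turn each key statement into a question to check against literature.
        question = make_question(statement)
        if is_supported(statement, question):
            verified.append(statement)
    # The verified statements become the final answer.
    return " ".join(verified)


# Trivial stand-ins (assumptions for illustration, not real components).
def split_sentences(text: str) -> List[str]:
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def to_question(statement: str) -> str:
    return f"Does the literature support: {statement}"


# Placeholder "corpus" of statements treated as supported.
SUPPORTED = {"Gene A is linked to disease X."}


def in_corpus(statement: str, question: str) -> bool:
    # A real check would retrieve additional papers for `question`.
    return statement in SUPPORTED


final = verify_answer(
    "Gene A is linked to disease X. Gene A cures everything.",
    split_sentences,
    to_question,
    in_corpus,
)
```

Passing the steps in as functions keeps the loop itself independent of any particular model or search backend, so each step can be swapped without touching the control flow.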

Across the TechBio Universe

Recursion’s AI-enabled Maps of Biology, built from automated labs that capture cell phenomics, yield many promising signals: hundreds of potential targets. Recursion’s knowledge graph tool evaluates these possibilities against a range of criteria of interest in biology and drug discovery, including global trend scores, protein pockets and structure, competitive landscape, and clinical trials.

The knowledge graph allows researchers to perform “target deconvolution” (identifying and validating the molecular targets behind a small molecule's phenotypic responses) in order to narrow those hundreds of possibilities down to the best target opportunity.
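As a rough illustration of narrowing many candidate targets to a short list, the Python sketch below ranks candidates by a weighted sum over per-criterion scores. The weighted sum is a simple stand-in chosen for clarity, not the machine-learning scoring the knowledge graph tool actually uses, and the function name, feature names, and weights are all hypothetical.

```python
from typing import Dict, List


def rank_targets(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Order candidate targets from highest to lowest combined score."""

    def score(features: Dict[str, float]) -> float:
        # A missing criterion contributes nothing to the combined score.
        return sum(w * features.get(name, 0.0) for name, w in weights.items())

    # Sort by descending score, breaking ties by target name.
    return sorted(candidates, key=lambda t: (-score(candidates[t]), t))


# Hypothetical criteria loosely named after those mentioned in the text.
example_weights = {"trend": 0.5, "pocket": 0.5}
example_candidates = {
    "TARGET_1": {"trend": 1.0, "pocket": 0.0},
    "TARGET_2": {"trend": 1.0, "pocket": 1.0},
    "TARGET_3": {},
}
ranking = rank_targets(example_candidates, example_weights)
```

The first entry of `ranking` would then be the candidate to examine first, with the rest kept as fallbacks.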

“For the precision design process, you have to know the exact target that your compound is binding to,” Brett says. Once that ideal target has been identified, “we can then precision design a compound to fit,” he says.

Author: Brita Belli, Senior Communications Manager at Recursion.