How a drug discovery startup used AI to read 10,000 papers and zero in on its ideal targets

51
Target candidates discovered

10,000
Papers analyzed in a single day

91.3%
Accuracy

Challenge

How can we identify our ideal drug targets when they are hidden in mountains of literature?

— Existing databases are not tailored to the company’s proprietary discovery platform

— Drug hunters' insights are not captured, because target selection is nuanced and relies on complex evidence, not simple keyword combinations

— Decision-making is challenging given the sheer number of hits and the inability to prioritize targets across multiple dimensions

Schedule a call

Solution

Revolutionary ChatGPT technology, based on neural networks trained on billions of documents, has reached a human-level text understanding capable of:

— Drawing conclusions from descriptions of complex experimental setups

— Understanding the consequences of biochemical processes

— Reasoning across different pieces of scientific evidence

Schedule a call to learn more about large language model technology.

Next-generation text mining leaves no stone unturned when finding the right target

Example

Drawing conclusions from descriptions of complex experimental setups

That N-glycosylation of TIM4 decreases its degradation is not stated directly in the article.

The AI infers this from the article's statement that TIM4 is more susceptible to degradation when it is not N-glycosylated.

Schedule a call to learn more about how large language models can understand scientific literature.

Example

Understanding the consequences of biochemical processes

That GPR26 is degraded by an E3 ubiquitin ligase is not stated in the abstract.

From the statement alone that GPR26 has a ubiquitination site, the AI infers that GPR26 is degraded.

Example

Reasoning across different pieces of scientific evidence

The AI combines several pieces of evidence spread across different sentences to reach the correct conclusion that HDAC3 decreases NICD1 stability through acetylation.

Result

Map of evidence

The highly condensed heatmap enables rapid decision making by showing all target candidates scored against complex queries in a digestible visual format.

Drilling down into any cell reveals the source evidence behind it, including a concise summary of each supporting publication.
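The evidence map can be pictured as a target-by-query matrix in which every cell keeps its supporting evidence. The Python sketch below is a minimal illustration under stated assumptions. The schema and the names `EvidenceItem`, `build_evidence_map`, `score_matrix` and `drill_down` are hypothetical and are not the engine's actual API. The scoring rule, which counts distinct supporting publications per cell, is also an assumption. The identifiers are placeholders, not real references.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    """One extracted piece of evidence (hypothetical schema)."""
    target: str   # candidate protein, e.g. "TIM4"
    query: str    # the criterion the evidence speaks to
    pmid: str     # identifier of the source abstract (placeholders below)
    summary: str  # one-sentence summary of the evidence


def build_evidence_map(items):
    """Group evidence items into heatmap cells keyed by (target, query)."""
    cells = defaultdict(list)
    for item in items:
        cells[(item.target, item.query)].append(item)
    return cells


def score_matrix(cells, targets, queries):
    """Score each cell by its number of distinct supporting publications (assumed metric)."""
    return [
        [len({item.pmid for item in cells.get((t, q), [])}) for q in queries]
        for t in targets
    ]


def drill_down(cells, target, query):
    """Return the (pmid, summary) pairs behind one heatmap cell."""
    return [(item.pmid, item.summary) for item in cells.get((target, query), [])]


# Toy usage: queries are illustrative and PMIDs are placeholders, not real references.
queries = ["stability regulated by a post-translational modification",
           "hypothetical second criterion"]
targets = ["TIM4", "GPR26", "NICD1"]
items = [
    EvidenceItem("TIM4", queries[0], "placeholder-1",
                 "N-glycosylation decreases TIM4 degradation."),
    EvidenceItem("GPR26", queries[0], "placeholder-2",
                 "GPR26 has a ubiquitination site, indicating it is degraded."),
    EvidenceItem("NICD1", queries[0], "placeholder-3",
                 "HDAC3 decreases NICD1 stability through acetylation."),
]
cells = build_evidence_map(items)
matrix = score_matrix(cells, targets, queries)
for target, row in zip(targets, matrix):
    print(target.ljust(6), " ".join(str(score) for score in row))
print(drill_down(cells, "NICD1", queries[0]))
```

Each cell stores its evidence list as well as its score. That is what makes drill-down possible: the heatmap view and the source view both read from the same structure.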

Highlights

51
Target candidates discovered

From the combined evidence of more than 10,000 publications, the next-generation text mining engine identified 51 proteins as target candidates. More than half of them had not previously been considered by the R&D team.

91.3%
Accuracy

After in-depth review by the science team, more than 90% of the discovered target candidates were confirmed to have exactly the sought-after characteristics, each with an accurate one-sentence summary of the supporting evidence.

10,000
Papers analyzed in a single day

Supported by a high-performance data infrastructure, the text mining engine processed more than 10,000 papers in a single day at negligible compute cost.

Get in touch

Let's assess your case

If you are looking for an elusive drug target or any other vital piece of information, this approach may apply to your situation, too.

Let's talk. 30 min. Fully confidential (NDA), with no strings attached.

Schedule a call

Frequently asked questions

We supported our client throughout the entire process. Working closely with the science team, we defined the exact criteria and items of information required to identify and assess target candidates. After a rapid feasibility check, we developed the custom text mining engine, based on our unified data platform and BioQuery library. To ensure a high level of accuracy, we iteratively fine-tuned the algorithm using feedback from human experts.
Using the custom text mining engine, we delivered a proprietary data set of target candidates with supporting evidence in the form of mini summaries, structured information and links to external databases and sources. Both the data set and the custom text mining engine are protected by confidentiality. The data set is the client's intellectual property.
Six weeks from the kick-off workshop to the delivery of the final result.
Close, frequent collaboration was a crucial success factor. Throughout the project, we interacted with the science team almost daily, using a joint Slack channel, Miro boards and shared documents. In weekly update calls, we reviewed progress with key stakeholders.
As our literature source, we used PubMed abstracts.
We pre-filtered the literature based on metadata, for example by restricting the analysis to selected peer-reviewed journals.
No. ChatGPT does suffer from so-called hallucinations: it can produce false information in response to a user's questions, even citing scientific references that do not exist. In our evidence-grounded approach, we constrain the Large Language Model so that hallucinations are ruled out.
Yes. Depending on the requirements, the analysis can be set up as a one-off, on-demand or regular (e.g. monthly) process.
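The answers above on metadata pre-filtering and evidence grounding can be illustrated with a small Python sketch. The case study does not describe how the engine constrains the model, so this shows one common technique, chosen here as an assumption. The model returns the sentence it relies on for each claim, and a claim is accepted only if that quote appears verbatim in a source abstract that passed the journal filter. All names (`Abstract`, `Claim`, `prefilter`, `grounded_claims`), journals and identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Abstract:
    pmid: str      # placeholder identifier, not a real PubMed ID
    journal: str   # metadata used for pre-filtering
    text: str


@dataclass(frozen=True)
class Claim:
    pmid: str       # abstract the model says it used
    statement: str  # the model's conclusion
    quote: str      # sentence the model cites as evidence


def prefilter(abstracts, allowed_journals):
    """Metadata filter: keep only abstracts from an allow-list of journals."""
    allowed = {j.casefold() for j in allowed_journals}
    return [a for a in abstracts if a.journal.casefold() in allowed]


def _normalize(text):
    """Collapse whitespace and case so formatting differences don't cause rejections."""
    return " ".join(text.split()).casefold()


def grounded_claims(claims, abstracts):
    """Accept a claim only if its quote occurs verbatim in the abstract it cites."""
    by_pmid = {a.pmid: _normalize(a.text) for a in abstracts}
    accepted = []
    for claim in claims:
        source = by_pmid.get(claim.pmid)
        quote = _normalize(claim.quote)
        if source is not None and quote and quote in source:
            accepted.append(claim)
    return accepted


# Toy usage with hypothetical journals and placeholder identifiers.
abstracts = [
    Abstract("placeholder-1", "Journal A",
             "Non-glycosylated TIM4 was more susceptible to degradation."),
    Abstract("placeholder-2", "Journal B",
             "GPR26 was found to have a ubiquitination site."),
]
claims = [
    Claim("placeholder-1", "N-glycosylation decreases TIM4 degradation",
          "non-glycosylated TIM4 was more susceptible to degradation."),
    Claim("placeholder-1", "TIM4 is degraded by the proteasome",
          "TIM4 is degraded by the proteasome."),  # quote not in source: rejected
    Claim("placeholder-2", "GPR26 is degraded",
          "GPR26 was found to have a ubiquitination site."),
]
kept = prefilter(abstracts, ["Journal A"])
accepted = grounded_claims(claims, kept)
print([c.statement for c in accepted])
```

A quote check only confirms that the cited evidence exists in the source. Whether the model's conclusion actually follows from that evidence, as in the TIM4 and GPR26 examples, still depends on the model and on expert review.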

Learn more

Insights

Under the hood: 5 practical lessons from developing Large Language Model Applications for Drug Discovery

The biotech industry has been quick to explore the potential of Large Language Models, yet practical insights remain scarce. In a field that is both art and science, experience is key. Here we share our lessons learned from building LLM applications in drug discovery.

From the Depths of Literature: How Large Language Models Excavate Crucial Information to Scale Drug Discovery

In drug discovery, excavating the right information about potential drug targets and molecules from the depths of the scientific literature is key to success in biotech. Large Language Models will change the nature of this game. Here is how, in three concrete examples.

No-nonsense: How ChatGPT-Technology helps Biotechs find better Drug Targets faster

Large Language Models are poised to become an indispensable tool for biotechs looking to find their ideal drug targets. Evidence-grounded LLMs can sift through millions of publications, finding highly specific pieces of evidence in seconds, unlocking overlooked drug target opportunities.