How a drug discovery startup used AI to read 10,000 papers and zero in on its ideal targets
51 target candidates discovered
10,000 papers analyzed in a single day
91.3% accuracy
Challenge
How can we identify our ideal drug targets when they are hidden in mountains of literature?
— Existing databases are not tailored to the company’s proprietary discovery platform
— Keyword searches miss the insights of drug hunters, because target selection is nuanced and relies on complex evidence rather than simple word combinations
— Decision-making is difficult given the sheer number of hits and the inability to prioritize targets across multiple dimensions
Solution
ChatGPT and similar large language models, neural networks trained on billions of documents, have reached human-level text understanding and are capable of:
— Drawing conclusions from descriptions of complex experimental setups
— Understanding the consequences of biochemical processes
— Reasoning across different pieces of scientific evidence
Schedule a call to learn more about large language model technology.
Next-generation text mining leaves no stone unturned when finding the right target
Example
Drawing conclusions from descriptions of complex experimental setups
The article does not directly state that N-glycosylation of TIM4 decreases its degradation.
The AI infers this from the article's statement that TIM4 is more susceptible to degradation when it is not N-glycosylated.
Schedule a call to learn more about how large language models can understand scientific literature.
Example
Understanding the consequences of biochemical processes
The abstract does not state that GPR26 is degraded via an E3 ubiquitin ligase.
The AI infers that GPR26 is degraded solely from the statement that GPR26 was found to have a ubiquitination site.
Example
Reasoning across different pieces of scientific evidence
The AI combines several pieces of evidence spread across different sentences to reach the correct conclusion that HDAC3 decreases NICD1 stability through acetylation.
Result
Map of evidence
The condensed heatmap supports rapid decision-making by showing every target candidate scored against each complex query in one digestible view.
Drilling down into any cell reveals the source evidence behind it, including a concise summary of each supporting publication.
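The evidence map described above is essentially a matrix of target candidates × queries, where each cell keeps a pointer to the publications behind it. The Python sketch below illustrates that structure under stated assumptions: the `Evidence` record, the query labels and the placeholder `pmid-*` identifiers are hypothetical, and scoring a cell by the number of distinct supporting publications is an illustrative choice, not the scoring used in the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record linking one target candidate to one query via one publication."""
    target: str   # candidate protein, e.g. "TIM4"
    query: str    # hypothetical criterion label, e.g. "stability"
    pmid: str     # identifier of the supporting publication (placeholders below)
    summary: str  # one-sentence summary of the supporting finding


def build_evidence_map(records):
    """Return row labels, column labels, a score matrix and the per-cell evidence.

    A cell's score is the number of distinct supporting publications
    (an illustrative assumption).
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec.target, rec.query)].append(rec)
    targets = sorted({t for t, _ in cells})
    queries = sorted({q for _, q in cells})
    matrix = [
        [len({r.pmid for r in cells.get((t, q), [])}) for q in queries]
        for t in targets
    ]
    return targets, queries, matrix, cells


def drill_down(cells, target, query):
    """List the (pmid, summary) pairs behind a single heatmap cell."""
    return [(r.pmid, r.summary) for r in cells.get((target, query), [])]


# Placeholder data loosely based on the examples above.
records = [
    Evidence("TIM4", "stability", "pmid-A",
             "Non-glycosylated TIM4 is more susceptible to degradation."),
    Evidence("NICD1", "stability", "pmid-B",
             "HDAC3 decreases NICD1 stability through acetylation."),
    Evidence("GPR26", "ubiquitination", "pmid-C",
             "GPR26 has a ubiquitination site."),
]

targets, queries, matrix, cells = build_evidence_map(records)
for target, row in zip(targets, matrix):
    print(target, row)
print(drill_down(cells, "TIM4", "stability"))
```

Keeping the raw evidence per cell, rather than only the score, is what makes the drill-down possible: the heatmap is derived from the same records a reviewer later inspects.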
Highlights
51 target candidates discovered
From the combined evidence of more than 10,000 publications, the next-generation text mining engine identified 51 proteins as target candidates. More than half of them had not previously been considered by the R&D team.
91.3% accuracy
In-depth review by the science team confirmed that more than 90% of the discovered target candidates had exactly the sought-after characteristics and came with an accurate one-sentence summary of the supporting evidence.
10,000 papers analyzed in a single day
Supported by a high-performance data infrastructure, the text mining engine processed more than 10,000 papers in a single day at negligible compute cost.
Get in touch
Let's assess your case
If you are looking for an elusive drug target or any other vital piece of information, the outlined approach may be applicable in your situation, too.
Let's talk. 30 min. Fully confidential (NDA), with no strings attached.
Frequently asked questions
-
We supported our client along the entire process. Working closely with the science team, we defined the exact criteria and items of information required to identify and assess target candidates. After a rapid feasibility check, we developed the custom text mining engine, based on our unified data platform and BioQuery library. To ensure a high level of accuracy, we iteratively fine-tuned the algorithm using human expert feedback.
-
Using the custom text mining engine, we delivered a proprietary data set of target candidates with supporting evidence in the form of mini summaries, structured information and links to external databases and sources. Both the data set and the custom text mining engine are protected by confidentiality. The data set is the client's intellectual property.
-
Six weeks from the kick-off workshop to the delivery of the final result.
-
Close, frequent collaboration was a crucial success factor. Throughout the project, we interacted with the science team almost daily via a joint Slack channel, Miro boards and shared documents. In weekly update calls, we reviewed progress with key stakeholders.
-
We used PubMed abstracts as the literature source.
-
We pre-filtered the literature based on metadata, for example by restricting it to certain peer-reviewed journals.
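A metadata pre-filter of this kind can be as simple as an allowlist check applied before any text mining. The Python sketch below is illustrative only: the record keys (`pmid`, `journal`, `year`, `abstract`), the placeholder journal names and the optional `min_year` cutoff are hypothetical, since the actual criteria used in the project are not disclosed.

```python
# Hypothetical allowlist; the journals actually used in the project are not disclosed.
ALLOWED_JOURNALS = {"Journal A", "Journal B"}


def prefilter(records, allowed_journals, min_year=None):
    """Keep records from allowlisted journals that have a non-empty abstract.

    `records` are dicts with hypothetical keys "pmid", "journal", "year" and
    "abstract"; `min_year` is an optional extra criterion added for illustration.
    """
    kept = []
    for rec in records:
        if rec.get("journal") not in allowed_journals:
            continue  # not one of the selected peer-reviewed journals
        if min_year is not None and rec.get("year", 0) < min_year:
            continue  # older than the optional cutoff
        if not rec.get("abstract"):
            continue  # nothing for the text mining step to read
        kept.append(rec)
    return kept


records = [
    {"pmid": "1", "journal": "Journal A", "year": 2020, "abstract": "Placeholder abstract text."},
    {"pmid": "2", "journal": "Journal C", "year": 2021, "abstract": "Placeholder abstract text."},
    {"pmid": "3", "journal": "Journal B", "year": 2022, "abstract": ""},
    {"pmid": "4", "journal": "Journal B", "year": 2023, "abstract": "Placeholder abstract text."},
]

print([r["pmid"] for r in prefilter(records, ALLOWED_JOURNALS)])
print([r["pmid"] for r in prefilter(records, ALLOWED_JOURNALS, min_year=2021)])
```

Filtering on cheap metadata first shrinks the corpus before the comparatively expensive language-model step runs.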
-
No. ChatGPT does suffer from so-called hallucinations: it can produce false information in response to a user’s questions, even inventing non-existent scientific references. In our evidence-grounded approach, we constrain the large language model so that hallucinations are ruled out.
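One common way to ground model output, shown here as an assumption rather than as the method used in this project, is to require every extracted claim to quote its supporting text and to discard any claim whose quote cannot be found in the source abstract. In the Python sketch below, the claim format (`statement`, `evidence` keys) and the example abstract are hypothetical, and the language-model call that would produce the claims is omitted.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks or spacing don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def grounded_claims(abstract, claims):
    """Split model-extracted claims into those whose quoted evidence appears
    verbatim (after normalization) in the abstract and those whose quote does not."""
    source = normalize(abstract)
    kept, rejected = [], []
    for claim in claims:
        quote = normalize(claim.get("evidence", ""))
        if quote and quote in source:
            kept.append(claim)
        else:
            rejected.append(claim)  # missing or unmatched quote: treat as ungrounded
    return kept, rejected


# Hypothetical abstract and model output, for illustration only.
abstract = "Without N-glycosylation, TIM4 becomes more\nsusceptible to degradation."
claims = [
    {"statement": "N-glycosylation decreases TIM4 degradation.",
     "evidence": "TIM4 becomes more susceptible to degradation"},
    {"statement": "TIM4 is an E3 ubiquitin ligase.",
     "evidence": "TIM4 acts as an E3 ubiquitin ligase."},
]

kept, rejected = grounded_claims(abstract, claims)
print([c["statement"] for c in kept])
print([c["statement"] for c in rejected])
```

This check only confirms that the quoted evidence exists in the source; whether a statement actually follows from its quote still has to be judged, for example through the expert review described above.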
-
Yes. Depending on the requirements, the analysis can be set up as a one-off, on-demand or regular (e.g. monthly) process.