BANDIT: using diverse data to measure drug similarity and predict drug targets

Similar small molecules tend to be similarly effective against disease, often sharing the same target. But what is similarity? And could an accurate measure, derived from diverse sources, help researchers zero in on promising drug candidates?

Luis Dreisbach was reading: Madhukar, N.S., Khade, P.K., Huang, L. et al. A Bayesian machine learning approach for drug target identification using diverse data types. Nat Commun 10, 5221 (2019).

The challenge

Drug development is an expensive and time-consuming process: an estimated 15 years and $2.6bn per small-molecule drug. One of the key factors in this labour- and time-intensive process is often the identification of targets for a given drug candidate. “For natural products and phenotypic screen derived small molecules, one of the greatest bottlenecks is identifying the targets of any candidate molecules,” the authors of this paper note. Any approach that helps identify which protein a given small molecule might target could save both time and money.


The authors have built a flexible and extendable model to compute the similarity between two drugs. Called BANDIT (Bayesian ANalysis to determine Drug Interaction Targets), this model has two key applications in the drug development process:

1. Identifying targets for a small molecule
BANDIT can be used to predict targets for new compounds. This is useful when a small molecule has shown an effect in earlier experiments, such as phenotypic screening, but its mechanism of action is still unknown. As the authors show, drugs that BANDIT identifies as similar often share targets, so the targets of a compound's closest matches can be prioritized in subsequent experiments.

Figure 1: Using BANDIT to identify targets for a given small molecule
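
The idea of inferring targets from similar drugs can be sketched as a simple weighted vote. This is a minimal illustration, not the authors' implementation: the function name, the top_k cut-off, the similarity values and the drug and target names are all hypothetical.

```python
from collections import defaultdict


def rank_targets(similarities, known_targets, top_k=3):
    """Rank candidate targets for a query compound (illustrative sketch).

    similarities:  dict mapping drug name -> similarity to the query compound
    known_targets: dict mapping drug name -> set of that drug's known targets
    """
    # Keep the top_k most similar drugs that have annotated targets.
    neighbours = sorted(
        (drug for drug in similarities if drug in known_targets),
        key=lambda drug: similarities[drug],
        reverse=True,
    )[:top_k]

    # Each neighbour votes for its targets, weighted by its similarity.
    votes = defaultdict(float)
    for drug in neighbours:
        for target in known_targets[drug]:
            votes[target] += similarities[drug]

    # Highest-scoring targets first.
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarity scores between a query compound and four known drugs.
similarities = {"drug_A": 0.9, "drug_B": 0.8, "drug_C": 0.3, "drug_D": 0.7}
known_targets = {
    "drug_A": {"T1"},
    "drug_B": {"T1", "T2"},
    "drug_C": {"T3"},
    "drug_D": {"T2"},
}

ranking = rank_targets(similarities, known_targets)
for target, score in ranking:
    print(target, round(score, 2))
```

Here the three most similar drugs vote, so T1 ranks ahead of T2 and T3 receives no votes. In practice the ranked list would only suggest which targets to test first in the lab.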

An example for Scenario 1

The authors used the model to predict targets for a small molecule, ONC201, that was discovered through a phenotypic screen and is currently in phase II clinical trials for different types of cancer, including aggressive forms of brain cancer for which few if any treatment options exist. The target of the drug was, however, still unclear. Using BANDIT, they predicted the dopamine receptor DRD2 as a likely target; other target prediction models (such as SEA and SuperPred) did not identify DRD2.

Subsequent experiments, guided by this result, verified BANDIT’s prediction and fostered a deeper understanding of how the drug works.

This prediction has also had a bearing on how subsequent clinical trials were designed.

2. Finding novel ways to target a protein

In a situation where a protein has already been targeted by a small molecule, additional small molecules likely to target that protein can be identified by examining a compound library for similar small molecules. This can lead to the development of new drugs that are at least as effective and may also:

  • have fewer side effects
  • be less prone to drug resistance

Figure 2: Finding novel ways to target a specific protein
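
Screening a compound library against a known binder amounts to ranking the library by similarity and keeping the closest matches. The sketch below uses a Jaccard overlap of made-up feature sets as a stand-in for BANDIT's combined similarity score; the compound names, features and threshold are assumptions for illustration only.

```python
# Hypothetical feature sets; a real screen would use BANDIT's similarity score.
FEATURES = {
    "known_binder": {"f1", "f2", "f3", "f4"},
    "cand_1": {"f1", "f2", "f3"},
    "cand_2": {"f1", "f5"},
    "cand_3": {"f2", "f3", "f4", "f6"},
}


def jaccard(a, b):
    """Stand-in similarity: shared features divided by all features."""
    features_a, features_b = FEATURES[a], FEATURES[b]
    return len(features_a & features_b) / len(features_a | features_b)


def screen_library(reference, library, similarity, threshold=0.5):
    """Return (compound, score) pairs at or above threshold, best first."""
    scored = [(c, similarity(reference, c)) for c in library if c != reference]
    hits = [(c, s) for c, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


hits = screen_library("known_binder", list(FEATURES), jaccard)
for compound, score in hits:
    print(compound, score)
```

With these toy values, cand_1 and cand_3 pass the threshold and cand_2 is dropped. The shortlisted compounds would then go forward to experimental testing.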

An example for Scenario 2

BANDIT was also used to discover antimicrotubule agents, a class of drugs widely used as cancer chemotherapeutics. From a list of 24 promising compounds, 14 showed a significant effect in human breast cancer cells. The key point is that only nine of these candidates would have been identified by looking at structural similarity alone; the remainder were identified only because of the extra insight BANDIT gleaned from its additional data sources. This shows the value of the different data types incorporated into the BANDIT model.

On top of this, three of the 14 drugs showed activity in cancer cells that had developed resistance to existing antimicrotubule drugs. 

Computing the similarity of drugs can lead to further interesting applications, such as:

  • Predicting side effects by comparing the effects of closely related drugs
  • Predicting the mode of action by comparing the drug to closely related drugs

How it works

Using several data sources, the authors computed a separate similarity score between drugs for each type of data, then combined these scores, using a Bayesian approach, into a single similarity score.
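
One common way to combine per-data-type similarity scores in a Bayesian fashion is to turn each score into a likelihood ratio (how much more likely that score is for drug pairs that share a target than for pairs that do not) and multiply the ratios, which assumes the data types are independent. The sketch below illustrates that idea under those assumptions; the data type names, training scores, binning and smoothing are hypothetical and are not taken from the paper.

```python
import math


def likelihood_ratio(score, shared_scores, unshared_scores, bins=5):
    """Estimate P(score | shared target) / P(score | no shared target).

    Scores are assumed to lie in [0, 1] and are compared via equal-width bins.
    """
    def bin_index(value):
        return min(int(value * bins), bins - 1)

    def bin_fraction(samples):
        count = sum(1 for s in samples if bin_index(s) == bin_index(score))
        # Add-one smoothing so that empty bins never give a zero probability.
        return (count + 1) / (len(samples) + bins)

    return bin_fraction(shared_scores) / bin_fraction(unshared_scores)


def combined_log_ratio(pair_scores, training):
    """Sum log likelihood ratios over data types (naive independence assumption)."""
    total = 0.0
    for data_type, score in pair_scores.items():
        shared, unshared = training[data_type]
        total += math.log(likelihood_ratio(score, shared, unshared))
    return total


# Hypothetical training scores per data type:
# (pairs known to share a target, pairs known not to share one).
training = {
    "structure": ([0.91, 0.83, 0.72, 0.86], [0.11, 0.23, 0.31, 0.14]),
    "side_effects": ([0.63, 0.71, 0.92, 0.78], [0.22, 0.12, 0.41, 0.33]),
}

similar_pair = {"structure": 0.82, "side_effects": 0.75}
dissimilar_pair = {"structure": 0.12, "side_effects": 0.25}

print(combined_log_ratio(similar_pair, training))     # positive: evidence for a shared target
print(combined_log_ratio(dissimilar_pair, training))  # negative: evidence against
```

A design of this kind makes it straightforward to add a new data type: it contributes one more term to the sum, and a drug pair with no data for a given type simply omits that term.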

Five types of data were used, grouped into three categories:

However, one of the advantages of this model is that it can be easily modified and improved. Any new data types that become available can easily be assimilated. Similarly, any improvements in data processing or data quality could easily be transferred into improvements in BANDIT’s performance.

There are other models that aim to measure similarity between compounds, but only BANDIT makes use of such a broad variety of data types – and the authors have shown that the scores are indeed measuring different aspects of “similarity”.

Could this work in practice?

BANDIT is already in use, and has been shown to generate a more reliable measure of similarity than other currently used tools; it has already identified the target for one drug that has been moved into clinical development.

Open questions

BANDIT has proved its ability to predict similarity in a limited number of cases, but it is only over time that its ability to augment drug discovery could truly be measured. It could, for example, be used to create a network of drug similarity that might enable potential synergistic drug combinations to be identified.  

It is also important to note that the model has only been trained on publicly available data; its similarity calculations could become still more accurate if the quantity and quality of its data were increased.



Luis Dreisbach

Associate (Data Science)
