‘PROTACtable proteins’: how a scoring system identified a thousand new potential drug targets

Taking their cue from the druggable genome, a group of researchers have scoured publicly available biological data to pinpoint proteins that may be susceptible to PROTAC modulation

Benjamin Häusler comments on: Schneider, M., Radoux, C.J., Hercules, A. et al. The PROTACtable genome. Nat Rev Drug Discov 20, 789–797 (2021).

The challenge

When it comes to the development of new drugs, one of the crucial tasks is finding a suitable target. That’s no easy task when you consider that the human genome contains about 20,000 protein-coding genes. A structured evaluation framework is clearly needed.

You’ll probably have heard of the Illuminating the Druggable Genome (IDG) initiative and its framework for systematically evaluating whether targets can be modulated by small molecules.

However, small molecules are not the only modality out there. With other agents, such as Proteolysis Targeting Chimeras (PROTACs), on the rise, there is a growing need for additional evaluation frameworks analogous to the druggable genome. Schneider et al. tackle this requirement by providing a structured framework to assess a target’s PROTAC tractability, which the authors term “PROTACtability”, thereby helping researchers hunt down a suitable PROTAC target.

Summary

The authors have developed a systematic, well-structured method of combining public data sources to identify potential drug targets that may be susceptible to modulation using a PROTAC.

Applying this approach to the human proteome, they have identified 1,067 proteins not yet described in published papers as PROTAC targets, the vast majority of which have strong evidence linking them to at least one disease.

How it works

The authors begin by defining eight “buckets”, each of which represents one core criterion of their framework. These criteria include the presence of a target in clinical trials, mentions in research papers and various biological properties, such as the presence of ubiquitylation sites or the protein’s half-life. In addition to the buckets, the authors define a score based on the location of the target within the cell, using seven ordinal categories. Each target is evaluated against these criteria and placed into the corresponding buckets. (Keep in mind that the same target can be placed into multiple buckets.) Figure 1 gives an overview of the criteria and the publicly available data sources used to evaluate a target against them.

Figure 1: Overview of the core criteria and data sources of the PROTACtable genome framework.
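
As a rough illustration of the bucketing step, the Python sketch below places a target into every bucket whose criterion it meets. It is not the authors' implementation: the bucket labels, the evidence field names, the thresholds and the 1-to-7 encoding of the location score are all assumptions made for illustration, and only four of the eight buckets are modelled, namely those whose criteria are named above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical bucket labels covering only the criteria named in the text;
# the published framework defines eight buckets in total.
CLINICAL = "clinical_trial"
LITERATURE = "literature_mention"
UBIQUITYLATION = "ubiquitylation_site"
HALF_LIFE = "half_life_known"


@dataclass
class Target:
    name: str
    buckets: Set[str] = field(default_factory=set)
    # One of seven ordinal location categories; encoding them as the
    # integers 1..7 is an assumption of this sketch.
    location_score: Optional[int] = None


def assign_buckets(name, evidence):
    """Place a target into every bucket whose criterion it meets.

    `evidence` is a plain dict with hypothetical keys; a target can end
    up in several buckets at once, or in none.
    """
    target = Target(name=name, location_score=evidence.get("location_score"))
    if evidence.get("in_clinical_trial"):
        target.buckets.add(CLINICAL)
    if evidence.get("literature_mentions", 0) > 0:
        target.buckets.add(LITERATURE)
    if evidence.get("ubiquitylation_sites", 0) > 0:
        target.buckets.add(UBIQUITYLATION)
    if evidence.get("half_life_hours") is not None:
        target.buckets.add(HALF_LIFE)
    return target


# Made-up evidence for a made-up protein.
example = assign_buckets(
    "EXAMPLE_PROTEIN",
    {"literature_mentions": 3, "ubiquitylation_sites": 2, "location_score": 5},
)
print(sorted(example.buckets))
```

Storing the buckets as a set reflects the point that bucket membership is not exclusive: one target can satisfy several criteria at the same time.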

To make a high-level overview possible, Schneider et al. have taken this fine-grained information and sorted it into four PROTACtability categories. Figure 2 shows the detailed decision schema of the four PROTACtability categories: clinical precedence, literature precedence, discovery opportunity and incomplete evidence. The authors define a target as PROTACtable – meaning it is amenable to modulation by a PROTAC – if it falls into any of the first three categories.

Figure 2: Decision schema for the four PROTAC tractability categories. The numbers in parentheses depict the absolute frequency of the corresponding category in the human proteome.
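
The decision schema can be pictured as a cascade of checks. The Python sketch below is one hypothetical reading of it, not the schema from the paper: the bucket labels and the rule that a discovery opportunity requires both ubiquitylation and half-life evidence are assumptions, and the location score is left out for brevity. Only the four category names and the definition of PROTACtable come from the text.

```python
# The four category names come from the article; everything else in this
# sketch (bucket labels, rule order, discovery rule) is an assumption.
CLINICAL_PRECEDENCE = "clinical precedence"
LITERATURE_PRECEDENCE = "literature precedence"
DISCOVERY_OPPORTUNITY = "discovery opportunity"
INCOMPLETE_EVIDENCE = "incomplete evidence"

# Hypothetical set of buckets that must all be present for a target to
# count as a discovery opportunity.
DISCOVERY_BUCKETS = frozenset({"ubiquitylation_site", "half_life_known"})


def categorise(buckets):
    """Map a set of bucket labels to one of the four categories.

    The checks run in priority order, so a target with clinical evidence
    is reported as clinical precedence even if other buckets also apply.
    """
    buckets = set(buckets)
    if "clinical_trial" in buckets:
        return CLINICAL_PRECEDENCE
    if "literature_mention" in buckets:
        return LITERATURE_PRECEDENCE
    if DISCOVERY_BUCKETS <= buckets:
        return DISCOVERY_OPPORTUNITY
    return INCOMPLETE_EVIDENCE


def is_protactable(category):
    """A target is PROTACtable if it falls into any of the first three categories."""
    return category != INCOMPLETE_EVIDENCE


for labels in (
    {"clinical_trial", "half_life_known"},
    {"ubiquitylation_site", "half_life_known"},
    {"ubiquitylation_site"},
):
    category = categorise(labels)
    print(sorted(labels), "->", category, "| PROTACtable:", is_protactable(category))
```

Under these assumed rules, a target lacking only half-life data lands in “incomplete evidence” and would move to “discovery opportunity” as soon as that measurement became available.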

Could this work in practice?

To some extent, this approach has already shown its practical worth. Applied to the complete human proteome, the PROTACtable genome approach flagged 1,067 proteins not yet described as PROTAC targets in published papers. However, it remains to be seen how many of these targets succeed in early drug development stages or even in clinical trials. Nonetheless, a structured approach to analysing publicly available data sources clearly has value as a way of narrowing down potential targets.

Open questions

Could the genome-scale analysis using the PROTACtable genome approach be fully automated and updated on a regular basis? The importance of this question is hinted at by the wording of that last PROTACtability category: incomplete evidence. Some targets may fall into this category simply because some of their basic biological properties, such as half-life, are yet to be explored. Experimental and technical limitations do, of course, exist, but data availability (or the lack thereof) also plays a major role here.

Repeating the analysis once more data is available will likely result in some of those “incomplete evidence” proteins becoming PROTACtable targets in the future – and automation using AI will be necessary to handle the ever-growing data pool.
