Large Language Models based on the transformer-encoder architecture are very good at embedding text and finding relationships
between the letters/words in the text.
The models here use a transformer-encoder LLM to classify a molecule's potency (IC50 value) based only
on its SMILES string.
They can achieve very high accuracy with fairly little fine-tuning.
These models are fine-tuned versions of
BERT, first domain-adapted to chemistry using 40,000 molecule structures and then trained using experimental
IC50 values from ChEMBL.
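A minimal sketch of what the second training stage described above might look like: a chemistry-adapted BERT encoder fine-tuned to classify potency from SMILES strings with IC50-derived labels. The base checkpoint name, the binary label scheme, and the tiny in-memory dataset are illustrative assumptions, not the actual training setup.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Assumption: in practice this would be the chemistry domain-adapted checkpoint.
base = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Toy examples: SMILES strings with binary potency labels (1 = potent, 0 = not),
# standing in for the experimental IC50 data drawn from ChEMBL.
data = Dataset.from_dict({
    "text": ["CC(=O)OC1=CC=CC=C1C(=O)O", "CCO"],
    "label": [1, 0],
})

def tokenize(batch):
    # Tokenize the SMILES strings just like ordinary text.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="ic50-classifier",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=data).train()
```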
Use these models with the Hugging Face pipeline. See the model cards linked below!
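For example, loading one of the models through the text-classification pipeline might look like the following. The model ID and the output labels shown here are placeholders; substitute the actual ID from the model cards linked below.

```python
from transformers import pipeline

# Placeholder model ID -- use the actual ID from the linked model cards.
clf = pipeline("text-classification", model="<model-id-from-model-card>")

# Classify a molecule's potency from its SMILES string (aspirin shown here).
result = clf("CC(=O)OC1=CC=CC=C1C(=O)O")
print(result)  # e.g. [{'label': '...', 'score': 0.97}] -- labels depend on the model
```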
This is the work of Dr. Mauricio Cafiero and may be used widely, though attribution is appreciated.