Large Language Models based on the transformer-encoder architecture are very good at embedding text and finding relationships
between the letters/words in the text.
The models here use a transformer-encoder LLM to classify a molecule's potency (IC50 value) based only
on its SMILES string.
They can achieve very high accuracy with fairly little fine-tuning.
These models are fine-tuned versions of
BERT, first domain-adapted to chemistry using 40,000 molecule structures and then trained using experimental
IC50 values from ChEMBL.
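A minimal sketch of what the second training stage described above might look like: a chemistry-adapted BERT encoder fine-tuned to classify potency from SMILES strings with IC50-derived labels. The base checkpoint name, the binary label scheme, and the tiny in-memory dataset are illustrative assumptions, not the actual training setup.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Assumption: in practice this would be the chemistry domain-adapted checkpoint.
base = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Toy examples: SMILES strings with binary potency labels (1 = potent, 0 = not),
# standing in for the experimental IC50 data drawn from ChEMBL.
data = Dataset.from_dict({
    "text": ["CC(=O)OC1=CC=CC=C1C(=O)O", "CCO"],
    "label": [1, 0],
})

def tokenize(batch):
    # Tokenize the SMILES strings just like ordinary text.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="ic50-classifier",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=data).train()
```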
Use these models with the Hugging Face pipeline. See the model cards linked below!
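For example, loading one of the models through the text-classification pipeline might look like the following. The model ID and the output labels shown here are placeholders; substitute the actual ID from the model cards linked below.

```python
from transformers import pipeline

# Placeholder model ID -- use the actual ID from the linked model cards.
clf = pipeline("text-classification", model="<model-id-from-model-card>")

# Classify a molecule's potency from its SMILES string (aspirin shown here).
result = clf("CC(=O)OC1=CC=CC=C1C(=O)O")
print(result)  # e.g. [{'label': '...', 'score': 0.97}] -- labels depend on the model
```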
This is the work of Dr. Mauricio Cafiero and may be used widely, though attribution is appreciated.