LigandMPNN
Overview
LigandMPNN is a protein sequence design tool that can design sequences for protein backbones,
including binding interfaces to ligands or partners. This tutorial shows how to run the
LigandMPNN runner in ProtFlow and collect results back into your Poses
object.
For tool details, see:
Installation
Follow the LigandMPNN installation instructions and then register paths in your ProtFlow config file:
# LigandMPNN runner
LIGANDMPNN_SCRIPT_PATH = "" # e.g. "/path/to/LigandMPNN/run.py"
LIGANDMPNN_PYTHON_PATH = "" # e.g. "/path/to/conda/envs/ligandmpnn/bin/python3"
LIGANDMPNN_PRE_CMD = "" # optional, e.g. "module load cuda/11.8"
Note
To check which config file ProtFlow is using, run:
protflow-check-config
Quickstart: run LigandMPNN
This example loads input PDBs, runs LigandMPNN locally, and inspects the scores. Replace the file paths with your own.
from protflow.poses import Poses
from protflow.jobstarters import LocalJobStarter
from protflow.tools.ligandmpnn import LigandMPNN
# Use a local jobstarter for small tests
local_jobstarter = LocalJobStarter(max_cores=1)
# Load input structures as poses
poses = Poses(
poses="path/to/input_pdbs/",
glob_suffix="*.pdb",
work_dir="ligandmpnn_example",
storage_format="json",
jobstarter=local_jobstarter,
)
# Initialize LigandMPNN runner
ligandmpnn = LigandMPNN()
# Run LigandMPNN
poses = ligandmpnn.run(
poses=poses,
prefix="ligmpnn",
nseq=2,
model_type="protein_mpnn",
)
# Inspect results (columns are prefixed with "ligmpnn_")
print(poses.df[["poses_description", "ligmpnn_sequence", "ligmpnn_overall_confidence"]])
Common options
LigandMPNN supports many command line options. Pass them via options exactly
as you would on the command line (excluding input/output paths, which ProtFlow manages).
ligandmpnn_opts = "--temperature 0.1 --seed 1"
poses = ligandmpnn.run(
poses=poses,
prefix="ligmpnn_opts",
nseq=2,
model_type="protein_mpnn",
options=ligandmpnn_opts,
)
Pose-specific options
Sometimes each pose needs different settings. Use pose_options to pass an
option string per pose (one entry per row in Poses.df).
poses = Poses(
poses="path/to/input_pdbs/",
glob_suffix="*.pdb",
work_dir="ligandmpnn_pose_opts",
jobstarter=local_jobstarter,
)
# One options string per pose (None means "no extra options")
poses.df["ligandmpnn_pose_opts"] = [
"--fixed_residues 'A34 A173'",
"--fixed_residues 'A36 A134'",
None,
]
poses = ligandmpnn.run(
poses=poses,
prefix="ligmpnn_pose_opts",
nseq=1,
model_type="protein_mpnn",
pose_options="ligandmpnn_pose_opts",
)
Fixed or redesigned residues from columns
If you already store residue selections in columns, you can map them to
LigandMPNN options with fixed_res_col and design_res_col. Each value
should be a whitespace-separated list like "A34 A173".
poses = Poses(
poses="path/to/input_pdbs/",
glob_suffix="*.pdb",
work_dir="ligandmpnn_fixed_design",
jobstarter=local_jobstarter,
)
poses.df["fixed_residues"] = ["A34 A173", "A36 A134", ""]
poses.df["design_residues"] = ["A1 A2", "A5 A6", "A7 A8"]
poses = ligandmpnn.run(
poses=poses,
prefix="ligmpnn_fixed_design",
nseq=1,
model_type="protein_mpnn",
fixed_res_col="fixed_residues",
design_res_col="design_residues",
)
Outputs
LigandMPNN writes its output into the run directory inside your work_dir:
backbones/: input backbonesseqs/: designed sequences (FASTA)packed/: optionally packed structures when using sidechain packingligandmpnn_scores.json: collected scores (default ProtFlow format)
The Poses dataframe is updated with new columns
prefixed by the run name (e.g., ligmpnn_sequence, ligmpnn_overall_confidence,
ligmpnn_ligand_confidence, ligmpnn_location).