SigmaDock
Overview
SigmaDock is a deep-learning molecular docking tool based on a diffusion generative model. It predicts the binding pose of a ligand to a protein by learning the joint distribution of protein–ligand complexes. For full details see:
With ProtFlow’s SigmaDock runner you can integrate molecular docking into automated protein-design pipelines. The runner supports:
Redocking — re-dock the native ligand extracted from an input complex to validate binding-pose prediction.
Crossdocking — dock one or more external query ligands into the protein extracted from an input complex.
Pre-extracted inputs — supply protein PDBs and ligand SDFs directly via dataframe columns, bypassing the automatic complex-splitting step.
Like any other ProtFlow runner, SigmaDock collects docking scores (affinity,
RMSD, PoseBusters checks) and integrates them back into your
Poses instance.
Note
A ligand must always be present in the input complex even for crossdocking —
the bound ligand is used as a binding-site reference from which SigmaDock infers
the pocket location. Pass the residue name of this ligand via ligand_name.
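If the residue name of the bound ligand is not known in advance, it can usually be read from the HETATM records of a PDB file. The helper below is a minimal sketch and not part of ProtFlow or SigmaDock: the name list_hetero_residues and the solvent list are assumptions made for illustration, and it handles PDB-format text only (not CIF).

```python
from collections import Counter

# Residue names treated as solvent rather than ligands (illustrative choice).
SOLVENT = {"HOH", "WAT", "DOD"}


def list_hetero_residues(pdb_text):
    """Count HETATM atoms per residue name in PDB-format text (hypothetical helper)."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if resname and resname not in SOLVENT:
                counts[resname] += 1
    return counts


example = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  C1  LIG A 201      12.000   7.000  -5.000  1.00  0.00           C",
    "HETATM    3  O1  LIG A 201      13.000   7.500  -5.500  1.00  0.00           O",
    "HETATM    4  O   HOH A 301      20.000  20.000  20.000  1.00  0.00           O",
])
hetero_counts = list_hetero_residues(example)
```

A residue name found this way (LIG in the toy input) is what would be passed as ligand_name. Ions and crystallisation additives also appear as HETATM records, so the result still needs a manual check.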
Installation
Follow the installation instructions in the official SigmaDock repository: https://github.com/alvaroprat97/sigmadock.
Once installed, add your SigmaDock environment paths to the ProtFlow configuration file:
# path to SigmaDock's sample.py inference script
SIGMADOCK_SCRIPT_PATH = "" # e.g. "/your/path/to/sigmadock/sample.py"
# path to the Python interpreter inside the SigmaDock environment
SIGMADOCK_PYTHON_PATH = "" # e.g. "/your/path/to/conda/envs/sigmadock/bin/python"
# path to the SigmaDock model checkpoint directory
SIGMADOCK_CKPT_PATH = "" # e.g. "/your/path/to/sigmadock/checkpoints"
# optional shell prefix to activate the SigmaDock conda environment
# required if SigmaDock's subprocess cannot find its dependencies otherwise
SIGMADOCK_PRE_CMD = "" # e.g. "conda run -n sigmadock"
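Before starting a run, it can help to confirm that the three configured paths point at something usable. The following is a minimal sketch, not a ProtFlow feature: check_sigmadock_config is a hypothetical helper, and it only checks that the paths exist, not that the SigmaDock environment itself works.

```python
import os
from typing import List


def check_sigmadock_config(script_path: str, python_path: str, ckpt_path: str) -> List[str]:
    """Return a list of problems; an empty list means all three paths look usable."""
    problems = []
    if not os.path.isfile(script_path):
        problems.append(f"SIGMADOCK_SCRIPT_PATH is not a file: {script_path!r}")
    if not (os.path.isfile(python_path) and os.access(python_path, os.X_OK)):
        problems.append(f"SIGMADOCK_PYTHON_PATH is not an executable file: {python_path!r}")
    if not os.path.isdir(ckpt_path):
        problems.append(f"SIGMADOCK_CKPT_PATH is not a directory: {ckpt_path!r}")
    return problems


# With the empty defaults from the config template, every path is reported.
for problem in check_sigmadock_config("", "", ""):
    print(problem)
```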
Note
If you are having trouble finding your ProtFlow config file, run:
protflow-check-config
Redocking
Redocking extracts the native ligand from each input complex and docks it back into the same protein. This is useful for validating predicted structures or benchmarking pose-prediction accuracy.
from protflow.poses import Poses
from protflow.tools.sigmadock import SigmaDock
from protflow.jobstarters import LocalJobStarter
jst = LocalJobStarter(max_cores=1)
# load protein–ligand complexes (PDB or CIF)
my_poses = Poses(
    poses="/path/to/complexes/",
    glob_suffix="*.cif",
    work_dir="/path/to/output_dir/"
)
runner = SigmaDock(jobstarter=jst)
my_poses = runner.run(
    poses=my_poses,
    prefix="redock",
    ligand_name="LIG",  # residue name of the ligand in the input complex
    num_seeds=1,  # number of independent samples per pose (see SigmaDock docs for details)
    overwrite=False,
)
# scores are available as prefixed columns
display(my_poses.df[["redock_affinity", "redock_rmsd", "redock_pb_pass_rate"]])
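For benchmarking, redocking RMSDs are often summarised as a success rate. The sketch below assumes that the redock_rmsd values are in Å and uses the 2 Å cutoff that is a common convention in docking benchmarks; both are assumptions rather than something the runner prescribes, and redock_success_rate is a hypothetical helper.

```python
import math


def redock_success_rate(rmsds, threshold=2.0):
    """Fraction of poses with RMSD at or below the threshold; missing values are skipped."""
    valid = [r for r in rmsds if r is not None and not math.isnan(r)]
    if not valid:
        return float("nan")  # no usable RMSD values
    return sum(r <= threshold for r in valid) / len(valid)


# Toy values; in practice something like my_poses.df["redock_rmsd"].tolist()
rate = redock_success_rate([0.8, 1.9, 3.4, float("nan")])
print(f"success rate: {rate:.2f}")
```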
Crossdocking
Crossdocking docks one or more external query ligands into the protein extracted from
each input complex. Pass a list of absolute SDF paths via query_ligands.
my_poses = runner.run(
    poses=my_poses,
    prefix="crossdock",
    ligand_name="LIG",
    query_ligands=[
        "/data/ligands/compound_1.sdf",
        "/data/ligands/compound_2.sdf",
    ],
    overwrite=True,
)
display(my_poses.df[["crossdock_affinity", "crossdock_pb_pass_rate"]])
Note
Each query ligand is docked into every protein in the poses collection.
Store query ligand SDF files outside the runner’s work_dir — the overwrite
cleanup step removes the inputs/ and outputs/ sub-directories.
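The two constraints on query ligand paths (absolute, and outside work_dir) can be checked before calling the runner. This is a minimal sketch with a hypothetical helper name, check_query_ligands. It compares path strings only, does not touch the file system, and assumes POSIX-style paths.

```python
import os


def check_query_ligands(query_ligands, work_dir):
    """Return the paths that are relative or located inside work_dir."""
    work_dir = os.path.abspath(work_dir)
    flagged = []
    for path in query_ligands:
        if not os.path.isabs(path):
            flagged.append(path)  # relative path
            continue
        norm = os.path.normpath(path)
        if os.path.commonpath([norm, work_dir]) == work_dir:
            flagged.append(path)  # would sit inside work_dir
    return flagged


flagged = check_query_ligands(
    [
        "/data/ligands/compound_1.sdf",
        "/path/to/output_dir/inputs/compound_2.sdf",
        "compound_3.sdf",
    ],
    work_dir="/path/to/output_dir/",
)
```

In this toy call the second and third paths are flagged: one lies inside work_dir, the other is relative.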
Pre-extracted inputs
If you already have split protein PDBs and ligand SDFs (e.g. from a previous ProtFlow step), pass them via dataframe columns to skip the automatic complex-splitting step.
Redocking with pre-extracted files
my_poses.df["ligand_sdf"] = ["/data/extracted/compound_A.sdf", ...]
my_poses = runner.run(
    poses=my_poses,
    prefix="redock_preextracted",
    ligand_col="ligand_sdf",  # column holding the ligand SDF paths
    overwrite=True,
)
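Because the ligand column is assigned as a plain list, its entries must be in the same order as the rows of Poses.df. The sketch below shows one way to build such a list. It assumes that each ligand SDF shares its file stem with the corresponding pose file, a naming convention chosen here for illustration only, and match_ligands_to_poses is a hypothetical helper.

```python
from pathlib import Path


def match_ligands_to_poses(pose_paths, ligand_paths):
    """Return ligand paths reordered to follow pose_paths, matching on file stem."""
    by_stem = {Path(p).stem: p for p in ligand_paths}
    missing = [p for p in pose_paths if Path(p).stem not in by_stem]
    if missing:
        raise ValueError(f"no ligand SDF found for: {missing}")
    return [by_stem[Path(p).stem] for p in pose_paths]


ordered = match_ligands_to_poses(
    ["/data/extracted/complex_B.pdb", "/data/extracted/complex_A.pdb"],
    ["/data/extracted/complex_A.sdf", "/data/extracted/complex_B.sdf"],
)
```

The result could then be assigned with my_poses.df["ligand_sdf"] = ordered, with the pose paths taken from my_poses.df["poses"].tolist().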
Crossdocking with pre-extracted files
my_poses.df["query_ligands"] = [
    ["/data/ligands/compound_1.sdf", "/data/ligands/compound_2.sdf"],
    ...
]
my_poses.df["ref_ligand"] = ["/data/extracted/ref_A.sdf", ...]
my_poses = runner.run(
    poses=my_poses,
    prefix="crossdock_preextracted",
    ligand_col="query_ligands",  # column holding lists of query SDF paths
    ref_ligand_col="ref_ligand",  # column holding the pocket-anchor SDF paths
    overwrite=True,
)
Note
When ligand_col is set, the poses column (Poses.df["poses"]) is used as
the protein PDB. Pass receptor_col to override this with a different column.
Output scores
After each run the following columns are added to Poses.df under the given
prefix:
| Column | Description |
|---|---|
| | Absolute path to the docked protein–ligand complex PDB. |
| {prefix}_affinity | Predicted binding affinity from SigmaDock’s rescoring model (lower is better). |
| | Intramolecular strain energy of the docked ligand pose. |
| {prefix}_rmsd | Ligand RMSD to the reference pose (available in redocking when a reference SDF exists). |
| {prefix}_pb_pass_rate | Fraction of PoseBusters geometry checks passed. |
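One way to use these scores is to keep only poses that pass the PoseBusters checks and rank them by affinity, lowest first. The sketch below works on plain dictionaries, so it runs without ProtFlow. rank_docked_poses is a hypothetical helper, the default cutoff of 1.0 is an arbitrary choice, and the column names follow the prefixed pattern used in the examples above (redock_affinity, redock_pb_pass_rate).

```python
def rank_docked_poses(records, prefix, min_pb_pass_rate=1.0):
    """Filter by PoseBusters pass rate, then sort by affinity (lower is better)."""
    affinity_key = f"{prefix}_affinity"
    pb_key = f"{prefix}_pb_pass_rate"
    kept = [r for r in records if r[pb_key] >= min_pb_pass_rate]
    return sorted(kept, key=lambda r: r[affinity_key])


# Toy records with made-up numbers, for illustration only.
records = [
    {"poses_description": "a", "redock_affinity": -7.1, "redock_pb_pass_rate": 1.0},
    {"poses_description": "b", "redock_affinity": -9.3, "redock_pb_pass_rate": 0.8},
    {"poses_description": "c", "redock_affinity": -8.4, "redock_pb_pass_rate": 1.0},
]
ranked = rank_docked_poses(records, prefix="redock")
```

With a real run, the records could be obtained from the dataframe with my_poses.df.to_dict("records"). The poses_description key in the toy data is only a label for the example.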