PottsMPNN
Overview
PottsMPNN is a protein sequence-design and mutation-energy prediction tool that combines ProteinMPNN-style structure conditioning with learned Potts energies. For tool details, see the upstream PottsMPNN repository linked under Installation.
With ProtFlow’s PottsMPNN runner you can run the upstream YAML-based command-line
workflows from a Poses object:
- sample_seqs.py designs sequences for input PDB backbones.
- energy_prediction.py scores mutations or deep mutational scans.
The runner writes PottsMPNN YAML configs, starts jobs through a ProtFlow jobstarter, collects FASTA or CSV outputs, and merges the results back into the poses dataframe.
Installation
Follow the installation instructions in the official PottsMPNN repository: https://github.com/KeatingLab/PottsMPNN.
Once installed, add the PottsMPNN paths to your ProtFlow config file:
# path to the PottsMPNN repository checkout
POTTSMPNN_DIR = "" # e.g. "/path/to/PottsMPNN"
# path to the Python interpreter inside the PottsMPNN environment
POTTSMPNN_PYTHON = "" # e.g. "/path/to/conda/envs/PottsMPNN/bin/python"
# optional shell prefix to activate modules or environments
POTTSMPNN_PRE_CMD = "" # e.g. "conda run -n PottsMPNN"
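Before starting jobs, it can help to confirm that the two path settings point at something real. The following is a minimal standalone sketch, not part of ProtFlow: the function name check_pottsmpnn_config is hypothetical, and it assumes the config values are passed in as plain strings.

```python
import os


def check_pottsmpnn_config(potts_dir: str, potts_python: str) -> list[str]:
    """Return a list of problems; an empty list means both paths look usable.

    Hypothetical helper for illustration only (not a ProtFlow function).
    """
    problems = []
    # POTTSMPNN_DIR should be an existing directory (the repository checkout).
    if not os.path.isdir(potts_dir):
        problems.append(f"POTTSMPNN_DIR is not a directory: {potts_dir!r}")
    # POTTSMPNN_PYTHON should be an existing file that the current user can execute.
    if not (os.path.isfile(potts_python) and os.access(potts_python, os.X_OK)):
        problems.append(f"POTTSMPNN_PYTHON is not an executable file: {potts_python!r}")
    return problems
```

POTTSMPNN_PRE_CMD is deliberately not checked here, since a prefix such as a module or environment activation command may only be resolvable inside the job's shell.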
Note
To check which config file ProtFlow is using, run:
protflow-check-config
Sequence Design
Use SampleSequencePottsMPNNParams for
sample_seqs.py. The params object mirrors the upstream YAML structure and
exposes nested model and inference attributes for IDE autocomplete.
from protflow.poses import Poses
from protflow.jobstarters import LocalJobStarter
from protflow.tools import PottsMPNN, SampleSequencePottsMPNNParams
jobstarter = LocalJobStarter(max_cores=2)
poses = Poses(
    poses="/path/to/input_pdbs/",
    glob_suffix="*.pdb",
    work_dir="/path/to/output_dir/",
    jobstarter=jobstarter,
)
params = SampleSequencePottsMPNNParams()
params.inference.num_samples = 4
params.inference.temperature = 0.1
params.model.check_path = "vanilla_model_weights/pottsmpnn_msa_20.pt"
runner = PottsMPNN()
poses = runner.run(
    poses=poses,
    prefix="potts_design",
    params=params,
)
print(poses.df[["poses_description", "potts_design_sequence", "potts_design_location"]])
Each entry in the {prefix}_location column (here potts_design_location) points to a per-sequence FASTA file written by ProtFlow.
If PottsMPNN also writes optimized sequences, those are collected in
{prefix}_optimized_potts_sequence columns.
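Because the location column holds paths to FASTA files, a designed sequence can be read back with a few lines of plain Python. This is a minimal sketch: read_fasta is a hypothetical helper rather than a ProtFlow function, and it assumes the standard FASTA layout of a ">" header line followed by one or more sequence lines.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples.

    Hypothetical helper for illustration; assumes standard FASTA layout.
    """
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Under these assumptions, read_fasta(poses.df["potts_design_location"].iloc[0]) would return the records stored for the first pose.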
Pose-Specific Settings
Use PoseCol when a PottsMPNN parameter
should be filled from a Poses.df column. The *_custom fields are
ProtFlow helpers: they are converted into temporary JSON files and passed to the
matching upstream *_json config key.
from protflow.tools import PoseCol, SampleSequencePottsMPNNParams
poses.df["fixed_positions"] = [
    {"A": [10, 11, 12]},
    {"A": [25, 26]},
]
params = SampleSequencePottsMPNNParams()
params.inference.fixed_positions_custom = PoseCol("fixed_positions")
poses = runner.run(
    poses=poses,
    prefix="potts_fixed",
    params=params,
)
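The conversion of *_custom fields into *_json config keys can be pictured roughly as follows. This is an illustrative sketch only, not ProtFlow's implementation: materialize_custom_fields is a hypothetical name, a flat dict stands in for the nested params object, and the exact content ProtFlow writes into each JSON file is not shown here.

```python
import json
import os
import tempfile


def materialize_custom_fields(config: dict, work_dir: str) -> dict:
    """Replace each '<name>_custom' entry with a '<name>_json' path to a JSON file.

    Hypothetical helper; the flat dict is a simplification of the real params.
    """
    resolved = {}
    for key, value in config.items():
        if key.endswith("_custom"):
            if value is None:
                continue  # nothing to write for unset helper fields
            stem = key[: -len("_custom")]
            # Write the Python value to a temporary JSON file in work_dir.
            fd, json_path = tempfile.mkstemp(prefix=f"{stem}_", suffix=".json", dir=work_dir)
            with os.fdopen(fd, "w") as handle:
                json.dump(value, handle)
            resolved[f"{stem}_json"] = json_path
        else:
            resolved[key] = value  # ordinary keys pass through unchanged
    return resolved
```

The point of the design is that users work with ordinary Python objects in the dataframe, while the upstream script still receives the file paths it expects.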
When every pose-specific value is stored in a batch-compatible *_custom field,
the runner can batch multiple poses into one config. A pose-specific scalar field,
such as params.inference.temperature = PoseCol("temperature"), instead causes the
runner to write one config per pose.
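The batching rule can be sketched as a small decision function. This is a simplified illustration under stated assumptions: PoseColStub is a stand-in for ProtFlow's PoseCol, can_batch is a hypothetical name, the params are flattened into a dict, and every *_custom field is treated as batch-compatible, which the real runner may not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseColStub:
    """Stand-in for ProtFlow's PoseCol: only remembers a dataframe column name."""
    column: str


def can_batch(params: dict) -> bool:
    """Return True if every pose-specific value sits in a '*_custom' field.

    Hypothetical helper; assumes all '*_custom' fields are batch-compatible.
    """
    return all(
        key.endswith("_custom")
        for key, value in params.items()
        if isinstance(value, PoseColStub)  # only pose-specific values matter
    )
```

With this sketch, a params dict whose only PoseColStub sits under fixed_positions_custom can be batched, while one that sets temperature from a column cannot.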
Chain Design JSON
PottsMPNN accepts chain-design information either through input_list entries
or chain_dict_json. ProtFlow manages input_list automatically. To pass
pose-specific chain dictionaries, use chain_dict_custom:
poses.df["chain_design"] = [
    [["A"], ["B"]],
    [["A", "B"], []],
]
params = SampleSequencePottsMPNNParams()
params.chain_dict_custom = PoseCol("chain_design")
poses = runner.run(
    poses=poses,
    prefix="potts_chain_design",
    params=params,
)
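The two-list entries used above can be built programmatically. The sketch below assumes that the first list names the chains to design and the second the chains held fixed, following the ProteinMPNN chain-assignment convention; this ordering is an assumption and should be checked against the upstream PottsMPNN documentation. The helper name chain_design_entry is hypothetical.

```python
def chain_design_entry(all_chains, designed_chains):
    """Build a [designed, fixed] pair for one pose.

    ASSUMPTION: first list = designed chains, second list = fixed chains
    (ProteinMPNN-style ordering). Hypothetical helper, not a ProtFlow function.
    """
    unknown = set(designed_chains) - set(all_chains)
    if unknown:
        raise ValueError(f"designed chains not present in pose: {sorted(unknown)}")
    # Preserve the pose's chain order in both lists.
    designed = [chain for chain in all_chains if chain in designed_chains]
    fixed = [chain for chain in all_chains if chain not in designed_chains]
    return [designed, fixed]
```

Under that assumption, chain_design_entry(["A", "B"], ["A"]) yields the first entry in the example and chain_design_entry(["A", "B"], ["A", "B"]) yields the second.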
Energy Prediction
Use EnergyPredictionPottsMPNNParams and pass
script="energy_prediction" to runner.run(). You can provide either mutant_csv or
mutant_fasta. If both are None, upstream PottsMPNN performs a deep
mutational scan.
from protflow.tools import EnergyPredictionPottsMPNNParams
params = EnergyPredictionPottsMPNNParams()
params.mutant_csv = "/path/to/mutations.csv"
params.inference.ddG = True
params.inference.mean_norm = False
poses = runner.run(
    poses=poses,
    prefix="potts_energy",
    script="energy_prediction",
    params=params,
)
print(poses.df[[
    "poses_description",
    "potts_energy_energy_prediction_scorefile",
    "potts_energy_energy_prediction_n_mutations",
]])
The returned scorefile column points to one JSON sidecar per input pose. Each
sidecar contains the full table read from PottsMPNN’s *_scores.csv output.
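A sidecar can be loaded back into rows with the standard library. This is a hedged sketch: load_scores is a hypothetical helper, and the JSON layout is an assumption, since the exact serialization is not described here. The function accepts either a list of row dicts or a pandas-style column-oriented mapping of the form {column: {row_index: value}}.

```python
import json


def load_scores(sidecar_path):
    """Load a JSON sidecar and return its rows as a list of dicts.

    ASSUMPTION: the file is either a list of row dicts or a column-oriented
    mapping {column: {row_index: value}}. Hypothetical helper.
    """
    with open(sidecar_path) as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        # Column-oriented layout: pivot columns into one dict per row index.
        columns = list(data)
        index = list(data[columns[0]]) if columns else []
        return [{col: data[col][i] for col in columns} for i in index]
    return data  # already a list of row dicts
```

Under these assumptions, load_scores(poses.df["potts_energy_energy_prediction_scorefile"].iloc[0]) would return the score table for the first pose.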
API Reference
See protflow.tools.pottsmpnn for the full autodoc API, including all
parameter dataclasses and score-collection helpers.