PottsMPNN

Overview

PottsMPNN is a protein sequence-design and mutation-energy prediction tool based on ProteinMPNN-style structure conditioning with learned Potts energies. For tool details, see:

With ProtFlow’s PottsMPNN runner you can run the upstream YAML-based command-line workflows from a Poses object:

sample_seqs.py designs sequences for input PDB backbones.
energy_prediction.py scores mutations or deep mutational scans.

The runner writes PottsMPNN YAML configs, starts jobs through a ProtFlow jobstarter, collects FASTA or CSV outputs, and merges the results back into the poses dataframe.

Installation

Follow the installation instructions in the official PottsMPNN repository: https://github.com/KeatingLab/PottsMPNN.

Once installed, add the PottsMPNN paths to your ProtFlow config file:

# path to the PottsMPNN repository checkout
POTTSMPNN_DIR = ""  # e.g. "/path/to/PottsMPNN"

# path to the Python interpreter inside the PottsMPNN environment
POTTSMPNN_PYTHON = ""  # e.g. "/path/to/conda/envs/PottsMPNN/bin/python"

# optional shell prefix to activate modules or environments
POTTSMPNN_PRE_CMD = ""  # e.g. "conda run -n PottsMPNN"

Note

To check which config file ProtFlow is using, run:

protflow-check-config

Sequence Design

Use SampleSequencePottsMPNNParams for sample_seqs.py. The params object mirrors the upstream YAML structure and exposes nested model and inference attributes for IDE autocomplete.

from protflow.poses import Poses
from protflow.jobstarters import LocalJobStarter
from protflow.tools import PottsMPNN, SampleSequencePottsMPNNParams

jobstarter = LocalJobStarter(max_cores=2)

poses = Poses(
    poses="/path/to/input_pdbs/",
    glob_suffix="*.pdb",
    work_dir="/path/to/output_dir/",
    jobstarter=jobstarter,
)

params = SampleSequencePottsMPNNParams()
params.inference.num_samples = 4
params.inference.temperature = 0.1
params.model.check_path = "vanilla_model_weights/pottsmpnn_msa_20.pt"

runner = PottsMPNN()
poses = runner.run(
    poses=poses,
    prefix="potts_design",
    params=params,
)

print(poses.df[["poses_description", "potts_design_sequence", "potts_design_location"]])

The location column points to per-sequence FASTA files written by ProtFlow. If PottsMPNN also writes optimized sequences, those are collected in {prefix}_optimized_potts_sequence columns.

Pose-Specific Settings

Use PoseCol when a PottsMPNN parameter should be filled from a Poses.df column. The *_custom fields are ProtFlow helpers: they are converted into temporary JSON files and passed to the matching upstream *_json config key.

from protflow.tools import PoseCol, SampleSequencePottsMPNNParams

poses.df["fixed_positions"] = [
    {"A": [10, 11, 12]},
    {"A": [25, 26]},
]

params = SampleSequencePottsMPNNParams()
params.inference.fixed_positions_custom = PoseCol("fixed_positions")

poses = runner.run(
    poses=poses,
    prefix="potts_fixed",
    params=params,
)

When all pose-specific values are stored in batch-compatible *_custom fields, the runner can batch multiple poses per config. Pose-specific scalar fields, such as params.inference.temperature = PoseCol("temperature"), are written as one config per pose.

Chain Design JSON

PottsMPNN accepts chain-design information either through input_list entries or chain_dict_json. ProtFlow manages input_list automatically. To pass pose-specific chain dictionaries, use chain_dict_custom:

poses.df["chain_design"] = [
    [["A"], ["B"]],
    [["A", "B"], []],
]

params = SampleSequencePottsMPNNParams()
params.chain_dict_custom = PoseCol("chain_design")

poses = runner.run(
    poses=poses,
    prefix="potts_chain_design",
    params=params,
)

Energy Prediction

Use EnergyPredictionPottsMPNNParams with script="energy_prediction". You can provide either mutant_csv or mutant_fasta. If both are None, upstream PottsMPNN performs a deep mutational scan.

from protflow.tools import EnergyPredictionPottsMPNNParams

params = EnergyPredictionPottsMPNNParams()
params.mutant_csv = "/path/to/mutations.csv"
params.inference.ddG = True
params.inference.mean_norm = False

poses = runner.run(
    poses=poses,
    prefix="potts_energy",
    script="energy_prediction",
    params=params,
)

print(poses.df[[
    "poses_description",
    "potts_energy_energy_prediction_scorefile",
    "potts_energy_energy_prediction_n_mutations",
]])

The returned scorefile column points to one JSON sidecar per input pose. Each sidecar contains the full table read from PottsMPNN’s *_scores.csv output.

API Reference

See protflow.tools.pottsmpnn for the full autodoc API, including all parameter dataclasses and score-collection helpers.