Boltz

Overview

Boltz is a biomolecular structure prediction tool. For detailed information about its capabilities and performance, see the official Boltz repository: https://github.com/jwohlwend/boltz.

With ProtFlow’s Boltz runner, you can integrate Boltz into automated protein-design pipelines. The runner can predict structures from pre-written YAML files, and it can also generate YAML files with pose-specific options pulled from your poses dataframe (Poses.df). For example, if you designed small-molecule binders with diverse pockets, you can automatically write design-specific pocket constraints into the Boltz input files from columns in Poses.df. Like any other ProtFlow runner, the Boltz runner collects the output scores and the locations of the predicted structures and writes them back into your Poses instance.

Note

If you want to add CCD codes for custom ligands to Boltz, see this repository: https://github.com/jacktday/boltztools/tree/main

Installation

Follow the installation instructions in the official Boltz repository: https://github.com/jwohlwend/boltz.

Once installed, add your Boltz environment paths to the ProtFlow configuration file:

# path to the Boltz binary
BOLTZ_PATH = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/boltz"

# path to the python interpreter inside the Boltz environment
BOLTZ_PYTHON = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/python3"
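A typo in either path usually only surfaces once a job fails, so a quick check up front can save time. The sketch below uses only the Python standard library. `check_boltz_paths` is a hypothetical helper written for this illustration and is not part of ProtFlow; it assumes both settings should point to existing, executable files.

```python
import os

def check_boltz_paths(boltz_path: str, boltz_python: str) -> list:
    """Return a list of problems; an empty list means both paths look usable.

    Hypothetical helper for illustration only (not a ProtFlow function).
    """
    problems = []
    for name, path in (("BOLTZ_PATH", boltz_path), ("BOLTZ_PYTHON", boltz_python)):
        if not path:
            # The config template ships with empty strings.
            problems.append(f"{name} is empty")
        elif not os.path.isfile(path):
            problems.append(f"{name} does not point to a file: {path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"{name} is not executable: {path}")
    return problems
```

Call it with the two values from your configuration file and print the returned list; nothing is reported when both paths are fine.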

You are now ready to use the Boltz runner in your ProtFlow pipelines!

Note

If you have trouble finding your ProtFlow config file, run this command in your ProtFlow environment:

protflow-check-config

Predicting structures

Note

Before running Boltz via ProtFlow, review Boltz’s own prediction instructions so you understand its expected inputs: prediction.md.

Predicting yaml files

ProtFlow supports directly predicting pre-written .yaml files. Load them into a Poses and run the Boltz runner with your desired options.

# imports
from protflow.poses import Poses
from protflow.tools import Boltz
from protflow.jobstarters import LocalJobStarter

# set your jobstarter
## here we use a LocalJobStarter, which runs jobs using the subprocess module.
## you can use any jobstarter you need, e.g. SbatchArrayJobstarter if you work on a SLURM cluster.
jst = LocalJobStarter(max_cores=1)

# load poses from directory
## change '/path/to/input_yamls/' to the path of the directory that contains your yaml files.
## Check out the tutorial in the :ref:`tutorials.load_poses` section if you want to learn more about loading poses.
my_poses = Poses(
    poses = '/path/to/input_yamls/',
    glob_suffix = '*.yaml',
    work_dir = "/path/to/your_output_dir/"
)

# initialize Boltz runner
boltz_runner = Boltz(jobstarter=jst)

# run Boltz with pre-written yaml files
## like with any runner, in the 'options' parameter, we specify commandline options that shall
## be passed to Boltz for inference. You can set any options, except for input and output files.
## For more info on how to use Runner.run() calls, see here: :ref:`tutorials.run_applications`
my_poses = boltz_runner.run(
    poses = my_poses,
    prefix = 'boltz', # the prefix is the name of your run. Each prefix can be used only once.
    options = "--diffusion_samples 5 --no_kernels"
)

# you might want to display some results.
## Your predicted poses can always be found in the column: {prefix}_location
## display() is available in Jupyter notebooks; use print() in a plain Python script.
display(my_poses.df[["poses_description", "plddt", "ptm"]])
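If you assemble the `options` string programmatically, for example when scanning several settings, a small helper keeps the formatting consistent. This is a minimal sketch: `build_options` is a hypothetical function, not part of ProtFlow, and the `reserved` tuple is an assumption standing in for the input and output options that the runner sets itself.

```python
import shlex

def build_options(flags: dict, reserved=("out_dir",)) -> str:
    """Turn a dict of flag names and values into a command-line options string.

    Hypothetical helper for illustration. `reserved` is an assumed list of
    option names that should be left to the runner (input/output handling).
    """
    parts = []
    for name, value in flags.items():
        if name in reserved:
            raise ValueError(f"'{name}' is set by the runner and must not be passed")
        if value is True:
            parts.append(f"--{name}")          # boolean switch without a value
        elif value is False or value is None:
            continue                           # switch turned off: leave it out
        else:
            parts.append(f"--{name} {shlex.quote(str(value))}")
    return " ".join(parts)

options = build_options({"diffusion_samples": 5, "no_kernels": True})
print(options)  # --diffusion_samples 5 --no_kernels
```

The resulting string can be passed as the `options` argument of `run()` in the same way as the hand-written string above.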

Creating custom yaml files with BoltzParams

In most cases, your proteins start as .pdb or .fasta files. ProtFlow converts these into Boltz-compatible YAML automatically. If you want to customize predictions (e.g., add custom ligands, proteins, constraints, DNA, etc.), use protflow.tools.boltz.BoltzParams.

BoltzParams is a helper class to add modifications to the Boltz .yaml files generated by ProtFlow.

# imports
from protflow.tools.boltz import BoltzParams, Boltz

# How to set up BoltzParams
params = BoltzParams()

# add a second protein chain
params.add_protein(
    id = "B", # <--- chain A is already taken by our protein!
    sequence = "MYSEQVENCE",
    msa = "server"
)

# add ligand
params.add_ligand(
    id = "Z",
    ligand = "CC1=CC=CC=C1", # SMILES string
    ligand_type = "smiles" # can be either 'smiles' or 'ccd' if ccd code was passed in `ligand`
)

## Now, when running Boltz, simply add the params object to the run() call:
boltz_runner = Boltz(jobstarter=jst)
proteins = boltz_runner.run(
    poses = my_poses,
    prefix = 'boltz_with_ligand',
    options = "--diffusion_samples 5 --no_kernels --use_msa_server",
    params = params # <--- here we pass the params object
)

Note

ProtFlow uses the function convert_poses_to_boltz_yaml to convert poses into Boltz-compatible .yaml files. You can use this function directly if you just want to convert .pdb or .fasta files into .yaml format.
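To get a feel for what such a conversion produces, the sketch below turns a FASTA record into a minimal Boltz-style YAML document using only the standard library. It is not ProtFlow's `convert_poses_to_boltz_yaml`; `read_fasta_sequence` and `boltz_yaml` are hypothetical names, and the layout (`version`, `sequences`, `protein`, `ligand`, `smiles`) follows the input schema described in Boltz's prediction instructions as I understand it. MSA settings are left out for brevity.

```python
def read_fasta_sequence(fasta_text: str) -> str:
    """Return the sequence of the first record in a FASTA string."""
    seq_lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_lines:
                break      # a second header starts the next record: stop here
            continue       # skip the header of the first record
        seq_lines.append(line)
    return "".join(seq_lines)

def boltz_yaml(sequence, chain_id="A", ligand_smiles=None, ligand_id="Z") -> str:
    """Build a minimal Boltz-style YAML string (illustrative, assumed schema)."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {chain_id}",
        f"      sequence: {sequence}",
    ]
    if ligand_smiles is not None:
        lines += [
            "  - ligand:",
            f"      id: {ligand_id}",
            # SMILES may contain YAML-special characters, so quote the value.
            f"      smiles: '{ligand_smiles}'",
        ]
    return "\n".join(lines) + "\n"

fasta = ">design_1\nMYSEQ\nVENCE\n"
print(boltz_yaml(read_fasta_sequence(fasta), ligand_smiles="CC1=CC=CC=C1"))
```

Comparing this output with a file generated by ProtFlow is a quick way to see which entries BoltzParams added on top of the bare protein entry.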

Generating pose-specific yaml files with poses_cols

Boltz supports covalent bonds, pocket constraints, templates, and other features. Often these modifications differ per protein in your Poses object.

To handle this, the BoltzParams helper methods accept a poses_cols argument. Pass it a list with the names of the keyword arguments whose values should be interpreted as column names in Poses.df. For each pose, the actual value is then pulled from that column.

# setup as above
import os

from protflow.poses import Poses
from protflow.tools.boltz import Boltz, BoltzParams
from protflow.jobstarters import LocalJobStarter

# jobstarter
jst = LocalJobStarter(max_cores=1)

# load poses
## change poses_path to the directory that contains your input structures.
poses_path = "/home/markus/ProtFlow/examples/data/input_pdbs/boltz"
structs = ["struct1.pdb", "struct2.pdb", "struct3.pdb"]
poses_fpl = [os.path.join(poses_path, struct) for struct in structs]
my_poses = Poses(
    poses = poses_fpl,
    work_dir = "/path/to/your_output_dir"
)

# add pose-specific info to the poses dataframe
my_poses.df["ligand_smiles"] = [
    "CC1=CC=CC=C1", # toluene <--- will go to struct1
    "C1=CC=C(C=C1)C=O", # benzaldehyde <--- will go to struct2
    "C1=CC=C(C=C1)O" # phenol <--- will go to struct3
]

# create BoltzParams object with pose-specific ligand info
params = BoltzParams()
params.add_ligand(
    id = "Z",
    ligand = "ligand_smiles", # <--- here we pass the name of the poses_col
    ligand_type = "smiles", # can be either 'smiles' or 'ccd' if ccd code was passed in `ligand`
    poses_cols = ["ligand"] # <--- here we specify that the argument passed in 'ligand' shall be taken from the poses_df
)

# run Boltz with pose-specific ligand info
boltz = Boltz(jobstarter=jst)
my_poses = boltz.run(
    poses=my_poses,
    prefix="boltz_pose_specific",
    options="--diffusion_samples 5 --no_kernels",
    params=params
)
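The poses_cols lookup can be pictured as a per-pose substitution: every keyword argument listed in poses_cols holds a column name, and for each pose that name is replaced by the value in the pose's row. The sketch below shows the idea with plain dictionaries standing in for rows of Poses.df. `resolve_pose_kwargs` is a hypothetical function for illustration and does not reflect how ProtFlow implements this internally.

```python
def resolve_pose_kwargs(kwargs: dict, poses_cols: list, pose_row: dict) -> dict:
    """Replace column names with the per-pose values for all keys in poses_cols.

    Hypothetical illustration; `pose_row` stands in for one row of Poses.df.
    """
    resolved = dict(kwargs)  # copy, so the shared template stays untouched
    for key in poses_cols:
        if key not in kwargs:
            raise KeyError(f"poses_cols entry '{key}' is not a keyword argument")
        column = kwargs[key]
        if column not in pose_row:
            raise KeyError(f"column '{column}' not found for this pose")
        resolved[key] = pose_row[column]
    return resolved

# Two example rows, mirroring the 'ligand_smiles' column used above.
rows = [
    {"poses_description": "struct1", "ligand_smiles": "CC1=CC=CC=C1"},
    {"poses_description": "struct2", "ligand_smiles": "C1=CC=C(C=C1)C=O"},
]
ligand_kwargs = {"id": "Z", "ligand": "ligand_smiles", "ligand_type": "smiles"}

per_pose = [resolve_pose_kwargs(ligand_kwargs, ["ligand"], row) for row in rows]
for row, resolved in zip(rows, per_pose):
    print(row["poses_description"], resolved)
```

Arguments that are not listed in poses_cols, such as `id` and `ligand_type` here, are passed through unchanged and therefore apply to every pose.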