.. _boltz:

Boltz
=====

Overview
--------


Boltz is a biomolecular structure prediction tool. For detailed information about its capabilities and performance, see:

- `GitHub <https://github.com/jwohlwend/boltz>`_
- `Boltz1 preprint <https://www.biorxiv.org/content/10.1101/2024.11.19.624167v4>`_
- `Boltz2 preprint <https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1>`_

With ProtFlow’s Boltz runner, you can integrate Boltz into automated protein-design pipelines. The runner supports predictions from pre-written YAML files, and it can also generate YAML with pose-specific options pulled from your poses dataframe (``Poses.df``).
For example, if you designed small-molecule binders with diverse pockets, you can automatically build design-specific pocket constraints into the Boltz input files from columns in ``Poses.df``.
Like any other ProtFlow runner, the Boltz runner collects the output scores and the locations of the predicted structures and integrates them back into your :py:class:`~protflow.poses.Poses` instance.

.. note::

    If you want to add CCD codes for custom ligands to Boltz, see the
    `boltztools repository <https://github.com/jacktday/boltztools/tree/main>`_.

Installation
------------

Follow the installation instructions in the official Boltz repository:
`https://github.com/jwohlwend/boltz <https://github.com/jwohlwend/boltz>`_.

Once installed, add your Boltz environment paths to the ProtFlow configuration file:

.. code-block:: python
    :name: config-excerpt-boltz
    
    # path to the Boltz binary
    BOLTZ_PATH = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/boltz"

    # path to the python interpreter inside the Boltz environment
    BOLTZ_PYTHON = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/python3"

You are now ready to use the Boltz runner in your ProtFlow pipelines!

.. note::
    If you have trouble finding your ProtFlow config file, run this command in your ProtFlow environment:
    
    .. code-block:: bash

        protflow-check-config

Predicting structures
---------------------

.. note::

    Before running Boltz via ProtFlow, review Boltz’s own prediction instructions so you understand its expected inputs:
    `prediction.md <https://github.com/jwohlwend/boltz/blob/main/docs/prediction.md>`_.
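
To give a feel for those inputs, the following is a minimal sketch that writes a small Boltz input file by hand.
The field names (``version``, ``sequences``, ``constraints``, ``pocket``, ``binder``, ``contacts``) follow the Boltz prediction instructions linked above as we understand them, so check them against the version you have installed.
The helper ``write_boltz_yaml``, the sequence, the file name, and the residue indices are hypothetical and only serve as illustration.

.. code-block:: python

    import tempfile
    from pathlib import Path

    def write_boltz_yaml(path, protein_seq, ligand_smiles, pocket_residues):
        """Write a minimal Boltz input (hypothetical helper, not part of ProtFlow).

        The file holds one protein (chain A), one ligand (chain B) and a pocket
        constraint that ties the ligand to the given residues of chain A.
        No MSA entry is written; see the Boltz docs for the MSA options.
        """
        contacts = ", ".join(f"[A, {idx}]" for idx in pocket_residues)
        text = (
            "version: 1\n"
            "sequences:\n"
            "  - protein:\n"
            "      id: A\n"
            f"      sequence: {protein_seq}\n"
            "  - ligand:\n"
            "      id: B\n"
            f"      smiles: '{ligand_smiles}'\n"
            "constraints:\n"
            "  - pocket:\n"
            "      binder: B\n"
            f"      contacts: [{contacts}]\n"
        )
        Path(path).write_text(text)
        return text

    # write an example file into a temporary directory and show its content
    out_dir = Path(tempfile.mkdtemp())
    yaml_text = write_boltz_yaml(
        out_dir / "design_0001.yaml",
        protein_seq="MKTAYIAKQR",        # placeholder sequence
        ligand_smiles="C1=CC=C(C=C1)O",  # phenol
        pocket_residues=[3, 7],          # placeholder residue indices
    )
    print(yaml_text)

Files written this way can be loaded as poses exactly like any other pre-written ``.yaml`` file.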

Predicting yaml files
---------------------

ProtFlow supports directly predicting pre-written ``.yaml`` files. 
Load them into a :py:class:`~protflow.poses.Poses` and run the Boltz runner with your desired options.

.. code-block:: python

    # imports
    from protflow.poses import Poses
    from protflow.tools import Boltz
    from protflow.jobstarters import LocalJobStarter

    # set your jobstarter
    ## here we use a localjobstarter which runs jobs using the subprocess module.
    ## you can use any jobstarter you need, e.g. SbatchArrayJobstarter if you work on a SLURM cluster.
    jst = LocalJobStarter(max_cores=1)

    # load poses from directory
    ## change '/path/to/input_yamls/' to the directory that contains your yaml files.
    ## To learn more about loading poses, see the tutorial on loading poses (docs label: tutorials.load_poses).
    my_poses = Poses(
        poses = '/path/to/input_yamls/', 
        glob_suffix = '*.yaml',
        work_dir = "/path/to/your_output_dir/"
    )

    # initialize Boltz runner
    boltz_runner = Boltz(jobstarter=jst)

    # run Boltz with pre-written yaml files
    ## like with any runner, in the 'options' parameter, we specify commandline options that shall
    ## be passed to Boltz for inference. You can set any options, except for input and output files.
    ## For more info on Runner.run() calls, see the tutorial on running applications (docs label: tutorials.run_applications).
    my_poses = boltz_runner.run(
        poses = my_poses, 
        prefix = 'boltz', # the prefix is the name of your run; each prefix can be used only once per Poses instance.
        options = "--diffusion_samples 5 --no_kernels"
    )

    # you might want to display some results.
    ## Your predicted poses can always be found in the column: {prefix}_location
    ## display() is available in Jupyter/IPython; use print() in a plain script.
    display(my_poses.df[["poses_description", "plddt", "ptm"]])


Creating custom yaml files with BoltzParams
-------------------------------------------

In most cases, your proteins start as ``.pdb`` or ``.fasta`` files. 
ProtFlow converts these into Boltz-compatible YAML automatically. 
If you want to customize predictions (e.g., add custom ligands, proteins, constraints, DNA, etc.), 
use :py:class:`protflow.tools.boltz.BoltzParams`.

``BoltzParams`` is a helper class to add modifications to the Boltz .yaml files
generated by ProtFlow.

.. code-block:: python

    # imports
    from protflow.tools.boltz import BoltzParams, Boltz

    # How to set up BoltzParams
    params = BoltzParams()

    # add a second protein chain
    params.add_protein(
        id = "B", # <--- chain A is already taken by our protein!
        sequence = "MYSEQVENCE",
        msa = "server"
    )

    # add ligand
    params.add_ligand(
        id = "Z",
        ligand = "CC1=CC=CC=C1", # SMILES string
        ligand_type = "smiles" # either 'smiles', or 'ccd' if a CCD code was passed in `ligand`
    )

    # When running Boltz, pass the params object to the run() call.
    # `jst` and `my_poses` are defined as in the previous example.
    boltz_runner = Boltz(jobstarter=jst)
    proteins = boltz_runner.run(
        poses = my_poses, 
        prefix = 'boltz_with_ligand', 
        options = "--diffusion_samples 5 --no_kernels --use_msa_server",
        params = params # <--- here we pass the params object
    )

.. note::

    ProtFlow uses the function ``convert_poses_to_boltz_yaml`` to convert poses
    into Boltz-compatible .yaml files. You can use this function directly if you
    just want to convert .pdb or .fasta files into .yaml format.

Generating pose-specific yaml files with poses_cols
---------------------------------------------------

Boltz supports covalent bonds, pocket constraints, templates, and other features.
Often these modifications differ per protein in your :py:class:`~protflow.poses.Poses` object.

To handle this, :py:class:`~protflow.tools.boltz.BoltzParams` helper methods accept a ``poses_cols`` argument. 
Provide the names of the keyword arguments that should be interpreted as column names in ``Poses.df``. 
For each pose, the value will be pulled from that column.

.. code-block:: python

    # imports
    import os

    from protflow.poses import Poses
    from protflow.tools.boltz import Boltz, BoltzParams
    from protflow.jobstarters import LocalJobStarter

    # jobstarter
    jst = LocalJobStarter(max_cores=1)

    # load poses
    ## change this to the directory that contains your input structures
    poses_path = "/path/to/ProtFlow/examples/data/input_pdbs/boltz"
    structs = ["struct1.pdb", "struct2.pdb", "struct3.pdb"]
    poses_fpl = [os.path.join(poses_path, struct) for struct in structs]
    my_poses = Poses(
        poses = poses_fpl,
        work_dir = "/path/to/your_output_dir"
    )

    # add pose-specific info to the poses dataframe
    my_poses.df["ligand_smiles"] = [
        "CC1=CC=CC=C1", # toluene <--- will go to struct1
        "C1=CC=C(C=C1)C=O", # benzaldehyde <--- will go to struct2
        "C1=CC=C(C=C1)O" # phenol <--- will go to struct3
    ]

    # create BoltzParams object with pose-specific ligand info
    params = BoltzParams()
    params.add_ligand(
        id = "Z",
        ligand = "ligand_smiles", # <--- here we pass the name of the poses_col
        ligand_type = "smiles", # either 'smiles', or 'ccd' if a CCD code was passed in `ligand`
        poses_cols = ["ligand"] # <--- the value passed in 'ligand' is read as a column name in Poses.df
    )

    # run Boltz with pose-specific ligand info
    boltz = Boltz(jobstarter=jst)
    my_poses = boltz.run(
        poses=my_poses,
        prefix="boltz_pose_specific",
        options="--diffusion_samples 5 --no_kernels",
        params=params
    )