.. _boltz:

Boltz
=====

Overview
--------

Boltz is a biomolecular structure prediction tool. For detailed information about its capabilities and performance, see:

- `GitHub <https://github.com/jwohlwend/boltz>`_
- `Boltz1 preprint `_
- `Boltz2 preprint `_

With ProtFlow’s Boltz runner, you can integrate Boltz into automated protein-design pipelines. The runner can predict structures from pre-written YAML files, and it can also generate YAML files with pose-specific options pulled from your poses dataframe (``Poses.df``). For example, if you designed small-molecule binders with diverse pockets, you can automatically build design-specific pocket constraints into the Boltz input files from columns in ``Poses.df``.

Like any other ProtFlow runner, the Boltz runner collects the output scores and the file locations of the predicted structures and integrates them back into your :py:class:`~protflow.poses.Poses` instance.

.. note::

   If you want to add CCD codes for custom ligands to Boltz, check out this repository:

   - GitHub: https://github.com/jacktday/boltztools/tree/main

Installation
------------

Follow the installation instructions in the official Boltz repository: `https://github.com/jwohlwend/boltz <https://github.com/jwohlwend/boltz>`_.

Once installed, add your Boltz environment paths to the ProtFlow configuration file:

.. code-block:: python
   :name: config-excerpt-boltz

   # path to the Boltz binary
   BOLTZ_PATH = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/boltz"

   # path to the Python interpreter inside the Boltz environment
   BOLTZ_PYTHON = "" # e.g. "/your/path/to/probably_conda/envs/boltz/bin/python3"

You are now ready to use the Boltz runner in your ProtFlow pipelines!

.. note::

   If you have trouble finding your ProtFlow config file, run this command in your ProtFlow environment:

   .. code-block:: bash

      protflow-check-config

Predicting structures
---------------------

.. note::

   Before running Boltz via ProtFlow, review Boltz’s own prediction instructions so you understand its expected inputs: `prediction.md `_.
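Boltz reads its inputs from YAML files. As a rough orientation, the sketch below writes a minimal input with one protein chain and one SMILES ligand, following the layout described in Boltz’s prediction instructions: a ``version`` key and a ``sequences`` list of ``protein`` and ``ligand`` entries with ``id``, ``sequence`` and ``smiles`` keys. The helpers ``boltz_yaml_text`` and ``write_boltz_yaml`` are hypothetical illustrations, not part of ProtFlow or Boltz, so treat ``prediction.md`` as the authoritative schema. If a protein has no ``msa`` entry, Boltz expects the ``--use_msa_server`` option, while ``msa: empty`` requests single-sequence mode.

.. code-block:: python

   from pathlib import Path
   from typing import Optional


   def boltz_yaml_text(
       protein_id: str,
       sequence: str,
       ligand_id: str,
       smiles: str,
       msa: Optional[str] = None,
   ) -> str:
       """Return the text of a minimal Boltz input YAML (hypothetical helper)."""
       lines = [
           "version: 1",
           "sequences:",
           "  - protein:",
           f"      id: {protein_id}",
           f"      sequence: {sequence}",
       ]
       if msa is not None:
           # e.g. a path to an .a3m file, or 'empty' for single-sequence mode
           lines.append(f"      msa: {msa}")
       lines += [
           "  - ligand:",
           f"      id: {ligand_id}",
           # quote the SMILES so YAML always reads it as a plain string
           f"      smiles: '{smiles}'",
       ]
       return "\n".join(lines) + "\n"


   def write_boltz_yaml(path: Path, **fields: str) -> Path:
       """Write one input file, creating parent directories as needed."""
       path.parent.mkdir(parents=True, exist_ok=True)
       path.write_text(boltz_yaml_text(**fields))
       return path


   if __name__ == "__main__":
       # Print the YAML for one design; write_boltz_yaml() would save it to disk instead.
       print(boltz_yaml_text("A", "MYSEQVENCE", "Z", "CC1=CC=CC=C1", msa="empty"))

A directory holding one such file per design can then be loaded into :py:class:`~protflow.poses.Poses` with ``glob_suffix = '*.yaml'``.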
Predicting yaml files
---------------------

ProtFlow supports predicting pre-written ``.yaml`` files directly. Load them into a :py:class:`~protflow.poses.Poses` instance and run the Boltz runner with your desired options.

.. code-block:: python

   # imports
   from protflow.poses import Poses
   from protflow.tools import Boltz
   from protflow.jobstarters import LocalJobStarter

   # set your jobstarter
   ## here we use a LocalJobStarter, which runs jobs using the subprocess module.
   ## you can use any jobstarter you need, e.g. SbatchArrayJobstarter if you work on a SLURM cluster.
   jst = LocalJobStarter(max_cores=1)

   # load poses from directory
   ## change '/path/to/input_yamls/' to the directory that contains your yaml files.
   ## Check out the tutorial in the :ref:`tutorials.load_poses` section if you want to learn more about loading poses.
   my_poses = Poses(
       poses = '/path/to/input_yamls/',
       glob_suffix = '*.yaml',
       work_dir = "/path/to/your_output_dir/"
   )

   # initialize Boltz runner
   boltz_runner = Boltz(jobstarter=jst)

   # run Boltz with pre-written yaml files
   ## like with any runner, the 'options' parameter holds commandline options that are
   ## passed to Boltz for inference. You can set any option except input and output files.
   ## For more info on how to use Runner.run() calls, see here: :ref:`tutorials.run_applications`
   my_poses = boltz_runner.run(
       poses = my_poses,
       prefix = 'boltz', # the prefix is the name of your run. Each prefix can only be used once.
       options = "--diffusion_samples 5 --no_kernels"
   )

   # display some results (display() is available in Jupyter/IPython).
   ## Your predicted poses can always be found in the column {prefix}_location, here 'boltz_location'.
   ## Score columns carry the same prefix; inspect my_poses.df.columns for their exact names.
   display(my_poses.df[["poses_description", "boltz_location"]])

Creating custom yaml files with BoltzParams
-------------------------------------------

In most cases, your proteins start as ``.pdb`` or ``.fasta`` files. ProtFlow converts these into Boltz-compatible YAML automatically.
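Conceptually, this conversion turns each chain sequence into a ``protein`` entry of the YAML ``sequences`` list. The sketch below illustrates that idea for FASTA input, assuming each FASTA record is one chain and assigning chain IDs A, B, C, ... in order. It is a simplified, hypothetical stand-in (``parse_fasta`` and ``fasta_to_boltz_yaml`` are illustrative names), not ProtFlow’s ``convert_poses_to_boltz_yaml``, and it ignores MSAs, ligands, and constraints.

.. code-block:: python

   import string
   from typing import List, Optional, Tuple


   def parse_fasta(text: str) -> List[Tuple[str, str]]:
       """Parse FASTA text into (header, sequence) tuples; sequences may span several lines."""
       records: List[Tuple[str, str]] = []
       header: Optional[str] = None
       chunks: List[str] = []
       for raw in text.splitlines():
           line = raw.strip()
           if not line:
               continue
           if line.startswith(">"):
               # a new header closes the previous record
               if header is not None:
                   records.append((header, "".join(chunks)))
               header, chunks = line[1:].strip(), []
           else:
               chunks.append(line)
       if header is not None:
           records.append((header, "".join(chunks)))
       return records


   def fasta_to_boltz_yaml(text: str) -> str:
       """Turn each FASTA record into one protein chain (IDs A, B, C, ...) of a Boltz YAML."""
       records = parse_fasta(text)
       if len(records) > len(string.ascii_uppercase):
           raise ValueError("this sketch only assigns single-letter chain IDs A-Z")
       lines = ["version: 1", "sequences:"]
       for chain_id, (_header, seq) in zip(string.ascii_uppercase, records):
           lines += ["  - protein:", f"      id: {chain_id}", f"      sequence: {seq}"]
       return "\n".join(lines) + "\n"


   if __name__ == "__main__":
       example = ">design_1 chain A\nMYSEQ\nVENCE\n>design_1 chain B\nGSGSGS\n"
       print(fasta_to_boltz_yaml(example))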
If you want to customize predictions (e.g., add custom ligands, proteins, constraints, DNA, etc.), use :py:class:`protflow.tools.boltz.BoltzParams`. ``BoltzParams`` is a helper class that adds modifications to the Boltz ``.yaml`` files generated by ProtFlow.

.. code-block:: python

   # imports
   from protflow.tools.boltz import BoltzParams, Boltz

   # set up BoltzParams
   params = BoltzParams()

   # add a second protein chain
   params.add_protein(
       id = "B", # <--- chain A is already taken by our protein!
       sequence = "MYSEQVENCE",
       msa = "server"
   )

   # add a ligand
   params.add_ligand(
       id = "Z",
       ligand = "CC1=CC=CC=C1", # SMILES string (toluene)
       ligand_type = "smiles" # either 'smiles', or 'ccd' if a CCD code was passed in `ligand`
   )

   ## Now, when running Boltz, simply add the params object to the run() call.
   ## (jst and my_poses are set up as in the previous example.)
   boltz_runner = Boltz(jobstarter=jst)
   my_poses = boltz_runner.run(
       poses = my_poses,
       prefix = 'boltz_with_ligand',
       options = "--diffusion_samples 5 --no_kernels --use_msa_server",
       params = params # <--- here we pass the params object
   )

.. note::

   ProtFlow uses the function ``convert_poses_to_boltz_yaml`` to convert poses into Boltz-compatible ``.yaml`` files. You can use this function directly if you just want to convert ``.pdb`` or ``.fasta`` files into ``.yaml`` format.

Generating pose-specific yaml files with poses_cols
---------------------------------------------------

Boltz supports covalent bonds, pocket constraints, templates, and other features. Often these modifications differ per protein in your :py:class:`~protflow.poses.Poses` object. To handle this, the helper methods of :py:class:`~protflow.tools.boltz.BoltzParams` accept a ``poses_cols`` argument. Provide the names of the keyword arguments whose values should be interpreted as column names in ``Poses.df``. For each pose, the value is then pulled from that column.

.. code-block:: python

   # setup as above
   import os

   from protflow.poses import Poses
   from protflow.tools.boltz import Boltz, BoltzParams
   from protflow.jobstarters import LocalJobStarter

   # jobstarter
   jst = LocalJobStarter(max_cores=1)

   # load poses
   poses_path = "/home/markus/ProtFlow/examples/data/input_pdbs/boltz"
   structs = ["struct1.pdb", "struct2.pdb", "struct3.pdb"]
   poses_fpl = [os.path.join(poses_path, struct) for struct in structs]
   my_poses = Poses(
       poses = poses_fpl,
       work_dir = "/path/to/your_output_dir"
   )

   # add pose-specific info to the poses dataframe
   my_poses.df["ligand_smiles"] = [
       "CC1=CC=CC=C1",     # toluene <--- will go to struct1
       "C1=CC=C(C=C1)C=O", # benzaldehyde <--- will go to struct2
       "C1=CC=C(C=C1)O"    # phenol <--- will go to struct3
   ]

   # create BoltzParams object with pose-specific ligand info
   params = BoltzParams()
   params.add_ligand(
       id = "Z",
       ligand = "ligand_smiles", # <--- here we pass the name of the poses_df column
       ligand_type = "smiles",   # either 'smiles', or 'ccd' if the column holds CCD codes
       poses_cols = ["ligand"]   # <--- the value of 'ligand' is read from that column in poses.df
   )

   # run Boltz with pose-specific ligand info
   boltz = Boltz(jobstarter=jst)
   my_poses = boltz.run(
       poses = my_poses,
       prefix = "boltz_pose_specific",
       options = "--diffusion_samples 5 --no_kernels",
       params = params
   )
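To make the ``poses_cols`` mechanism concrete, here is a minimal sketch of the per-pose lookup described above: keyword arguments named in ``poses_cols`` are replaced by the value from that pose’s row, and all other arguments are used verbatim. The function ``resolve_pose_kwargs`` and the plain-dict rows (standing in for rows of ``Poses.df``) are hypothetical simplifications; ProtFlow’s actual implementation inside ``BoltzParams`` may differ.

.. code-block:: python

   from typing import Any, Dict, Iterable


   def resolve_pose_kwargs(
       kwargs: Dict[str, Any], poses_cols: Iterable[str], row: Dict[str, Any]
   ) -> Dict[str, Any]:
       """Return a copy of kwargs in which every argument named in poses_cols is read from row.

       For those arguments, the value in kwargs is treated as a column name of the
       poses dataframe (here: a key of the row dict); other arguments pass through unchanged.
       """
       cols = set(poses_cols)
       unknown = cols - set(kwargs)
       if unknown:
           raise KeyError(f"poses_cols refers to missing arguments: {sorted(unknown)}")
       return {
           name: row[value] if name in cols else value
           for name, value in kwargs.items()
       }


   if __name__ == "__main__":
       rows = [
           {"poses_description": "struct1", "ligand_smiles": "CC1=CC=CC=C1"},
           {"poses_description": "struct2", "ligand_smiles": "C1=CC=C(C=C1)C=O"},
       ]
       ligand_kwargs = {"id": "Z", "ligand": "ligand_smiles", "ligand_type": "smiles"}
       for row in rows:
           # each pose gets its own ligand SMILES, while 'id' and 'ligand_type' stay fixed
           print(row["poses_description"], resolve_pose_kwargs(ligand_kwargs, ["ligand"], row))

Returning a new dictionary instead of mutating ``kwargs`` keeps the column names intact, so the same argument set can be resolved again for every pose.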