Write a Generic Metric
Use protflow.metrics.generic_metric_runner.GenericMetric when you need
one function call per pose and the output can be represented as a single
JSON-serializable value per pose.
Typical use-cases:
quick structural sanity checks that are not worth a dedicated runner
custom metrics used only in one project
wrapping an existing helper function so it can run through a ProtFlow
JobStarter
Do not use GenericMetric when the metric:
needs multiple poses at once
produces non-JSON output
should return several columns instead of one
<prefix>_datacolumnneeds different options for different poses in one run
What the function must look like
The worker process imports the target with importlib.import_module(module)
and then calls function(pose_path, **options) for each pose. That means:
the first argument must be the pose path as a string
additional parameters must be regular keyword arguments
the return value must be JSON-serializable
the function must live in an importable Python module
Notebook-only functions, lambdas, and functions defined only in an interactive session will not work, because worker jobs import the function again in a new Python process.
Make the module importable
The module=... argument is a Python module path, not a shell PATH
lookup.
Your metric function must be importable by the Python interpreter used by
GenericMetric:
the interpreter given through
python_path, orthe default ProtFlow environment interpreter from
PROTFLOW_ENV
Valid ways to make the module importable:
put the function into
protflowitselfput it into another package that is installed in the same environment
install your own package into that environment
export
PYTHONPATHbefore launching ProtFlow so the module directory is on Python’s import path
If you launch jobs through SLURM, the compute nodes also need that same Python
environment or PYTHONPATH setup.
Example file layout
If you store your function in:
my_project_metrics/
|-- __init__.py
`-- custom_metrics.py
then the corresponding module string is:
module="my_project_metrics.custom_metrics"
Example metric with options
The example below counts C-alpha atoms on a selected chain whose B-factor is at
least min_bfactor. For AlphaFold-style structures, this can be used as a
quick proxy for “confident residues on chain A” if confidence values are stored
in the B-factor column.
# file: my_project_metrics/custom_metrics.py
from Bio.PDB import PDBParser
def count_confident_ca_atoms(pose: str, chain: str, min_bfactor: float = 0.0) -> int:
"""Return the number of CA atoms on one chain above a B-factor threshold."""
structure = PDBParser(QUIET=True).get_structure("pose", pose)
model = next(structure.get_models())
if chain not in model:
raise KeyError(f"Chain {chain!r} not found in {pose}")
return sum(
1
for atom in model[chain].get_atoms()
if atom.id == "CA" and atom.bfactor >= min_bfactor
)
This function satisfies the GenericMetric contract:
poseis the first argumentchainandmin_bfactorare regular keyword argumentsthe return value is an integer, which is JSON-serializable
Run the metric in ProtFlow
from protflow.poses import Poses
from protflow.jobstarters import LocalJobStarter
from protflow.metrics.generic_metric_runner import GenericMetric
jobstarter = LocalJobStarter(max_cores=8)
poses = Poses(
poses="/data/input_pdbs",
glob_suffix="*.pdb",
work_dir="generic_metric_example",
jobstarter=jobstarter,
)
confident_len = GenericMetric(
module="my_project_metrics.custom_metrics",
function="count_confident_ca_atoms",
options={"chain": "A", "min_bfactor": 70.0},
jobstarter=jobstarter,
)
poses = confident_len.run(poses=poses, prefix="chain_a_confident_len")
print(poses.df[["poses_description", "chain_a_confident_len_data"]])
What happens during run():
ProtFlow creates
<work_dir>/<prefix>.It splits the pose list into chunks.
It starts one worker command per chunk through the selected
JobStarter.Each worker imports
my_project_metrics.custom_metricsand callscount_confident_ca_atoms(pose, chain="A", min_bfactor=70.0)for every pose in its chunk.The results are merged back into
poses.df.
Results and files
After the run:
the metric value is in
chain_a_confident_len_dataProtFlow also stores
chain_a_confident_len_descriptionandchain_a_confident_len_locationthe cached runner scorefile is written inside the run directory
If you run the same prefix again and do not set overwrite=True, ProtFlow
reuses the cached scorefile instead of recomputing the metric.
Important limits
GenericMetric applies one shared options dictionary to all poses in a
single run. It does not support pose-specific options columns.
If you need per-pose options, use one of these approaches:
split the poses into subsets and run
GenericMetricmultiple timeswrite a dedicated runner for that metric
Returning a scalar is usually the cleanest option. Returning a list or dict is
allowed as long as it is JSON-serializable, but it will still be stored inside a
single <prefix>_data column.