Workflow Step Parsing
YAML -> API kwargs translation for workflow steps. Each step type has a
parser that turns the raw YAML dict into the kwargs of its matching
interface API function. Parsers do YAML-shape validation (required
fields, unsupported args keys, command/type conflicts) inline.
Also exposes workflow-level entry points: parse_step_from_yaml() and
load_workflow_config(). ParsedStep -> YAML dict serialization lives in
vivarium.cluster_tools.dagger.config.serialization.
- vivarium.cluster_tools.dagger.config.parsing.resolve_step_type(step_dict)[source]
Pick the step-type key for step_dict.
Dispatch rules:
A top-level command field always resolves to "bash"; parse_bash_step_from_yaml() enforces the rest of the bash-step schema (including any conflicting type).
Otherwise, an explicit type is used.
A step with neither command nor type is rejected.
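The dispatch rules above can be sketched in a few lines of Python. This is an illustrative stand-in, not the library's implementation: the function name resolve_step_type_sketch is hypothetical, and raising ValueError for the rejected case is an assumption (the source only says such a step "is rejected").

```python
from typing import Any, Mapping


def resolve_step_type_sketch(step_dict: Mapping[str, Any]) -> str:
    """Hypothetical stand-in that mirrors the documented dispatch rules."""
    # Rule 1: a top-level `command` always means a bash step, even when a
    # conflicting `type` is present (the bash parser reports that conflict).
    if "command" in step_dict:
        return "bash"
    # Rule 2: otherwise an explicit `type` is used as given.
    if "type" in step_dict:
        return str(step_dict["type"])
    # Rule 3: neither key is present. ValueError is an assumption here.
    raise ValueError("step must define either 'command' or 'type'")
```

Note that under these rules a dict such as {"command": "echo hi", "type": "pytest"} still resolves to "bash"; the type conflict is left for the bash-step parser to report.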
- vivarium.cluster_tools.dagger.config.parsing.parse_bash_step_from_yaml(data, output_directory, *, project, queue)[source]
Parse a raw bash-step YAML dict into API kwargs.
The YAML form for a bash step requires a top-level command field. The optional type field, when present, must be "bash". No args: block is accepted.
- Return type:
- Parameters:
Examples
YAML configuration:
steps:
  - name: post_analysis
    command: python scripts/analyze.py --input /results
    environment: analysis_env
    resources:
      memory_gb: 20
      runtime: "02:00:00"
      cores: 2
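The three shape rules for a bash step (command required, type limited to "bash", no args: block) could be checked roughly as follows. The helper name check_bash_step_shape and the use of ValueError are assumptions for illustration; the real parser may report these problems differently.

```python
from typing import Any, Mapping


def check_bash_step_shape(data: Mapping[str, Any]) -> None:
    """Hypothetical validator for the documented bash-step YAML shape."""
    # A bash step must carry a top-level `command`.
    if "command" not in data:
        raise ValueError("bash step requires a top-level 'command'")
    # `type` is optional, but when present it must agree with `command`.
    if "type" in data and data["type"] != "bash":
        raise ValueError(f"'command' conflicts with type={data['type']!r}")
    # Bash steps take no `args:` block.
    if "args" in data:
        raise ValueError("bash steps do not accept an 'args' block")
```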
- vivarium.cluster_tools.dagger.config.parsing.parse_simulation_step_from_yaml(data, output_directory, *, project, queue)[source]
Parse a raw simulation-step YAML dict into API kwargs.
- Return type:
- Parameters:
Examples
YAML configuration:
steps:
  - name: model_sims
    type: simulation
    resources:
      memory_gb: 3
      runtime: "24:00:00"
    args:
      model_specification: /path/to/model.yaml
      branch_configuration: /path/to/branches.yaml
      artifact_path: /path/to/artifact.hdf
      backup_freq: 1800
      sim_verbosity: 1
- vivarium.cluster_tools.dagger.config.parsing.parse_pytest_step_from_yaml(data, output_directory, *, project, queue)[source]
Parse a raw pytest-step YAML dict into API kwargs.
Optional args keys: path, k, runslow. At least one of path or k must be provided. path may be a single string or a list of strings.
- Return type:
- Parameters:
Examples
YAML configuration:
steps:
  - name: unit_tests
    type: pytest
    resources:
      memory_gb: 8
      runtime: "01:00:00"
      cores: 4
    args:
      path: tests/
      k: "test_foo"
      runslow: true
Multiple paths:
steps:
  - name: unit_and_integration
    type: pytest
    resources:
      memory_gb: 8
      runtime: "01:00:00"
    args:
      path:
        - tests/unit
        - tests/integration
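As a rough sketch of the pytest args rules, the helper below rejects unknown keys, requires at least one of path or k, and accepts path as either a string or a list. The name normalize_pytest_args is hypothetical, ValueError is an assumed error type, and wrapping a single string path in a list is just one convenient normalisation, not necessarily what the parser does.

```python
from typing import Any, Mapping

# Keys the documentation lists for a pytest step's `args` block.
ALLOWED_PYTEST_ARGS = {"path", "k", "runslow"}


def normalize_pytest_args(args: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical validation and normalisation of pytest-step args."""
    unknown = set(args) - ALLOWED_PYTEST_ARGS
    if unknown:
        raise ValueError(f"unsupported args keys: {sorted(unknown)}")
    if "path" not in args and "k" not in args:
        raise ValueError("at least one of 'path' or 'k' is required")
    out = dict(args)
    # Assumption: represent a single path as a one-element list so that
    # downstream code handles only one shape.
    if isinstance(out.get("path"), str):
        out["path"] = [out["path"]]
    return out
```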
- vivarium.cluster_tools.dagger.config.parsing.parse_python_step_from_yaml(data, output_directory, *, project, queue)[source]
Parse a raw python-step YAML dict into API kwargs.
Required args key: path (a .py script). Optional args keys: positional_args (list of scalars) and keyword_args (dict of identifier-keyed scalars).
- Return type:
- Parameters:
Examples
YAML configuration:
steps:
  - name: postprocess
    type: python
    resources:
      memory_gb: 8
      runtime: "00:30:00"
    args:
      path: scripts/postprocess.py
      positional_args:
        - "foo"
        - "bar"
      keyword_args:
        input_dir: /mnt/results/model_29
        verbose: true
        num_workers: 4
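"Identifier-keyed scalars" can be read as: every key is a valid Python identifier and every value is a plain scalar. The sketch below checks exactly that. The helper name check_keyword_args, the choice of scalar types (str, int, float, bool), and the ValueError are all assumptions; the source does not spell out which types count as scalars.

```python
from typing import Any, Mapping

# Assumed set of scalar types; the real parser may accept more or fewer.
SCALAR_TYPES = (str, int, float, bool)


def check_keyword_args(keyword_args: Mapping[Any, Any]) -> None:
    """Hypothetical check that keyword_args is a dict of identifier-keyed scalars."""
    for key, value in keyword_args.items():
        # str.isidentifier() rejects keys such as "input-dir" or "2fast".
        if not (isinstance(key, str) and key.isidentifier()):
            raise ValueError(f"keyword_args key {key!r} is not a valid identifier")
        if not isinstance(value, SCALAR_TYPES):
            raise ValueError(
                f"keyword_args[{key!r}] must be a scalar, got {type(value).__name__}"
            )
```

With this reading, the keyword_args block in the example above passes, while a key like input-dir or a list-valued entry would be rejected.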
- vivarium.cluster_tools.dagger.config.parsing.parse_notebook_step_from_yaml(data, output_directory, *, project, queue)[source]
Parse a raw notebook-step YAML dict into API kwargs.
Required args keys: path (input .ipynb) and output_path (executed .ipynb). Optional args keys: parameters (dict of identifier-keyed scalars injected into the notebook) and cwd (working directory for execution; defaults to the parent of path).
- Return type:
- Parameters:
Examples
YAML configuration:
steps:
  - name: post_notebook_neonatal
    type: notebook
    resources:
      memory_gb: 20
      runtime: "02:00:00"
    args:
      path: tests/model_notebooks/results/neonatal.ipynb
      output_path: /mnt/results/run_29/executed/neonatal.ipynb
      parameters:
        model_dir: /mnt/results/run_29
        year: 2020
        verbose: true
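The cwd default ("the parent of path") can be illustrated with pathlib. The helper resolve_notebook_cwd is hypothetical, and whether the real parser also makes the path absolute is not stated in the source, so this sketch leaves paths as given.

```python
from pathlib import Path
from typing import Any, Mapping


def resolve_notebook_cwd(args: Mapping[str, Any]) -> Path:
    """Hypothetical helper: explicit `cwd` wins, else the notebook's parent dir."""
    if "cwd" in args:
        return Path(args["cwd"])
    # Documented default: the directory containing the input notebook.
    return Path(args["path"]).parent
```

For the example configuration above, which sets no cwd, this would yield tests/model_notebooks/results.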
- vivarium.cluster_tools.dagger.config.parsing.STEP_TYPE_YAML_PARSERS: dict[str, Callable[[...], dict[str, Any]]] = {'bash': <function parse_bash_step_from_yaml>, 'notebook': <function parse_notebook_step_from_yaml>, 'pytest': <function parse_pytest_step_from_yaml>, 'python': <function parse_python_step_from_yaml>, 'simulation': <function parse_simulation_step_from_yaml>}
Maps each YAML step_type to its YAML -> API kwargs parser.
- vivarium.cluster_tools.dagger.config.parsing.parse_step_from_yaml(raw, output_directory, *, project, queue)[source]
Build a ParsedStep from a raw YAML step dict.
Dispatches to the matching per-type parser to produce api_kwargs and tags the result with the resolved step_type for downstream dispatch (task building, YAML serialization).
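The registry-plus-tagging pattern described here might look like the following minimal sketch. Everything in it is a stand-in: ParsedStepSketch, PARSERS_SKETCH, the two toy parsers, and the ValueError for unknown types are hypothetical, and the real ParsedStep may carry more fields than step_type and api_kwargs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class ParsedStepSketch:
    """Hypothetical stand-in for ParsedStep: kwargs tagged with their step type."""
    step_type: str
    api_kwargs: dict[str, Any]


def _parse_bash(raw: Mapping[str, Any]) -> dict[str, Any]:
    # Toy parser: the real one also validates the bash-step schema.
    return {"name": raw["name"], "command": raw["command"]}


def _parse_pytest(raw: Mapping[str, Any]) -> dict[str, Any]:
    # Toy parser: flattens the `args` block next to the step name.
    return {"name": raw["name"], **raw.get("args", {})}


# Registry keyed by step type, in the spirit of STEP_TYPE_YAML_PARSERS.
PARSERS_SKETCH: dict[str, Callable[[Mapping[str, Any]], dict[str, Any]]] = {
    "bash": _parse_bash,
    "pytest": _parse_pytest,
}


def parse_step_sketch(raw: Mapping[str, Any]) -> ParsedStepSketch:
    # Resolve the type first (`command` wins), then look up its parser.
    step_type = "bash" if "command" in raw else raw.get("type")
    try:
        parser = PARSERS_SKETCH[step_type]
    except KeyError:
        raise ValueError(f"unknown step type: {step_type!r}") from None
    return ParsedStepSketch(step_type=step_type, api_kwargs=parser(raw))
```

Keeping step_type on the result means later stages can dispatch on it without re-inspecting the raw YAML.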
- vivarium.cluster_tools.dagger.config.parsing.load_workflow_config(path, *, name=None, project=None, queue=None, output_directory=None, default_environment=None, max_attempts=None)[source]
Load a WorkflowConfig from YAML, merging CLI overrides.
CLI arguments take precedence over values in the YAML file. Validates that name, project, queue, and output_directory are provided by at least one source.
- Return type:
- Parameters:
path (Path) – Path to the workflow YAML configuration file.
name (str | None) – CLI override for the workflow name.
project (str | None) – CLI override for the project field.
queue (str | None) – CLI override for the queue field.
output_directory (Path | None) – CLI override for the output directory.
default_environment (str | None) – CLI override for the default_environment field.
max_attempts (int | None) – CLI override for the maximum number of Jobmon task attempts.
- Raises:
ValueError – If name, project, queue, or output_directory cannot be resolved from either the YAML file or CLI arguments.
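The precedence and validation rules for load_workflow_config() can be sketched independently of YAML loading. The helper merge_overrides below is hypothetical and works on an already-loaded dict; it assumes that None means "not supplied on the CLI", which matches the None defaults in the signature above.

```python
from typing import Any, Mapping

# Fields the documentation says must come from the YAML file or the CLI.
REQUIRED_FIELDS = ("name", "project", "queue", "output_directory")


def merge_overrides(yaml_values: Mapping[str, Any], **cli_overrides: Any) -> dict[str, Any]:
    """Hypothetical merge: CLI values win, then required fields are checked."""
    merged = dict(yaml_values)
    # A CLI value of None is treated as absent, so the YAML value survives.
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    missing = [field for field in REQUIRED_FIELDS if merged.get(field) is None]
    if missing:
        raise ValueError(f"missing required workflow fields: {', '.join(missing)}")
    return merged
```

For instance, a YAML file that sets queue: all.q combined with a CLI override queue="long.q" would resolve to "long.q", while omitting project from both sources would raise ValueError. The queue names here are placeholders.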