Workflow Step Parsing

Translates workflow-step YAML into API kwargs. Each step type has a parser that turns the raw YAML dict into the kwargs of its matching interface API function. Parsers also validate the YAML shape inline: required fields, unsupported args keys, and command/type conflicts.

Also exposes workflow-level entry points: parse_step_from_yaml() and load_workflow_config(). ParsedStep -> YAML dict serialization lives in vivarium.cluster_tools.dagger.config.serialization.

vivarium.cluster_tools.dagger.config.parsing.resolve_step_type(step_dict)[source]

Pick the step-type key for step_dict.

Dispatch rules:

  • A top-level command field always resolves to "bash"; parse_bash_step_from_yaml() then enforces the rest of the bash-step schema, including rejecting a conflicting type.

  • Otherwise, an explicit type is used.

  • A step with neither command nor type is rejected.
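
The three rules above can be restated as a short sketch. This is an illustration of the documented behavior, not the library source: the function name resolve_step_type_sketch and the use of ValueError for the rejection case are assumptions.

```python
from typing import Any


def resolve_step_type_sketch(step_dict: dict[str, Any]) -> str:
    """Illustrative restatement of the dispatch rules (hypothetical helper)."""
    # Rule 1: a top-level command always wins, even if a type is also present.
    if "command" in step_dict:
        return "bash"
    # Rule 2: otherwise the explicit type is used as-is.
    if "type" in step_dict:
        return step_dict["type"]
    # Rule 3: neither field is present. The exception class is an assumption.
    raise ValueError("step must define either 'command' or 'type'")
```

Note that a dict carrying both command and a non-bash type still resolves to "bash" here; per the first rule, catching that conflict is left to the bash parser.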

Return type:

str

Parameters:

step_dict (dict[str, Any])

vivarium.cluster_tools.dagger.config.parsing.parse_bash_step_from_yaml(data, output_directory, *, project, queue)[source]

Parse a raw bash-step YAML dict into API kwargs.

The YAML form for a bash step requires a top-level command field. The optional type field, when present, must be "bash". An args: block is not accepted.
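
A minimal sketch of those shape checks, assuming failures surface as ValueError (the real exception type and messages are not shown in this reference). The helper name check_bash_step_shape is hypothetical.

```python
from typing import Any


def check_bash_step_shape(data: dict[str, Any]) -> None:
    """Hypothetical helper mirroring the bash-step rules described above."""
    # A bash step must carry a top-level command.
    if "command" not in data:
        raise ValueError("bash step requires a top-level 'command'")
    # type is optional, but if present it has to be "bash".
    if data.get("type", "bash") != "bash":
        raise ValueError(f"'command' conflicts with type={data['type']!r}")
    # Bash steps take no args block.
    if "args" in data:
        raise ValueError("bash steps do not accept an 'args' block")
```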

Return type:

dict[str, Any]

Parameters:

Examples

YAML configuration:

steps:
  - name: post_analysis
    command: python scripts/analyze.py --input /results
    environment: analysis_env
    resources:
      memory_gb: 20
      runtime: "02:00:00"
      cores: 2

vivarium.cluster_tools.dagger.config.parsing.parse_simulation_step_from_yaml(data, output_directory, *, project, queue)[source]

Parse a raw simulation-step YAML dict into API kwargs.

Return type:

dict[str, Any]

Parameters:

Examples

YAML configuration:

steps:
  - name: model_sims
    type: simulation
    resources:
      memory_gb: 3
      runtime: "24:00:00"
    args:
      model_specification: /path/to/model.yaml
      branch_configuration: /path/to/branches.yaml
      artifact_path: /path/to/artifact.hdf
      backup_freq: 1800
      sim_verbosity: 1

vivarium.cluster_tools.dagger.config.parsing.parse_pytest_step_from_yaml(data, output_directory, *, project, queue)[source]

Parse a raw pytest-step YAML dict into API kwargs.

Optional args keys: path, k, runslow. At least one of path or k must be provided. path may be a single string or a list of strings.
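
The args rules above could be checked along these lines. This is a sketch under stated assumptions: the helper name normalize_pytest_args is hypothetical, ValueError is an assumed exception type, and wrapping a single path string in a list is an illustrative normalization choice, not documented behavior.

```python
from typing import Any

ALLOWED_PYTEST_ARGS = {"path", "k", "runslow"}


def normalize_pytest_args(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator for a pytest step's args block."""
    # Reject keys outside the documented set.
    unknown = set(args) - ALLOWED_PYTEST_ARGS
    if unknown:
        raise ValueError(f"unsupported args keys: {sorted(unknown)}")
    # At least one selector (path or k) must be present.
    if "path" not in args and "k" not in args:
        raise ValueError("provide at least one of 'path' or 'k'")
    normalized = dict(args)
    # Assumption: treat a single string as a one-element list of paths.
    if isinstance(args.get("path"), str):
        normalized["path"] = [args["path"]]
    return normalized
```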

Return type:

dict[str, Any]

Parameters:

Examples

YAML configuration:

steps:
  - name: unit_tests
    type: pytest
    resources:
      memory_gb: 8
      runtime: "01:00:00"
      cores: 4
    args:
      path: tests/
      k: "test_foo"
      runslow: true

Multiple paths:

steps:
  - name: unit_and_integration
    type: pytest
    resources:
      memory_gb: 8
      runtime: "01:00:00"
    args:
      path:
        - tests/unit
        - tests/integration

vivarium.cluster_tools.dagger.config.parsing.parse_python_step_from_yaml(data, output_directory, *, project, queue)[source]

Parse a raw python-step YAML dict into API kwargs.

Required args key: path (a .py script). Optional args keys: positional_args (list of scalars) and keyword_args (dict of identifier-keyed scalars).
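
One way to read these constraints in code is sketched below. The helper name check_python_step_args, the ValueError exception type, and the exact set of types counted as "scalar" (str, int, float, bool) are all assumptions made for illustration.

```python
from typing import Any

# Assumption: what "scalar" means for YAML-loaded values.
SCALAR_TYPES = (str, int, float, bool)


def check_python_step_args(args: dict[str, Any]) -> None:
    """Hypothetical validator for a python step's args block."""
    path = args.get("path")
    if not isinstance(path, str) or not path.endswith(".py"):
        raise ValueError("'path' is required and must name a .py script")

    positional = args.get("positional_args", [])
    if not isinstance(positional, list) or not all(
        isinstance(value, SCALAR_TYPES) for value in positional
    ):
        raise ValueError("'positional_args' must be a list of scalars")

    keyword = args.get("keyword_args", {})
    if not isinstance(keyword, dict):
        raise ValueError("'keyword_args' must be a mapping")
    for key, value in keyword.items():
        # Keys must be valid Python identifiers, e.g. input_dir, not input-dir.
        if not (isinstance(key, str) and key.isidentifier()):
            raise ValueError(f"keyword_args key {key!r} is not an identifier")
        if not isinstance(value, SCALAR_TYPES):
            raise ValueError(f"keyword_args[{key!r}] must be a scalar")
```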

Return type:

dict[str, Any]

Parameters:

Examples

YAML configuration:

steps:
  - name: postprocess
    type: python
    resources:
      memory_gb: 8
      runtime: "00:30:00"
    args:
      path: scripts/postprocess.py
      positional_args:
        - "foo"
        - "bar"
      keyword_args:
        input_dir: /mnt/results/model_29
        verbose: true
        num_workers: 4

vivarium.cluster_tools.dagger.config.parsing.parse_notebook_step_from_yaml(data, output_directory, *, project, queue)[source]

Parse a raw notebook-step YAML dict into API kwargs.

Required args keys: path (input .ipynb) and output_path (executed .ipynb). Optional args keys: parameters (dict of identifier-keyed scalars injected into the notebook) and cwd (working directory for execution; defaults to the parent of path).
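
The cwd default described above can be illustrated with pathlib. The helper name resolve_notebook_cwd is hypothetical and only models the fallback rule; how the library itself resolves relative paths is not covered here.

```python
from pathlib import Path
from typing import Any


def resolve_notebook_cwd(args: dict[str, Any]) -> Path:
    """Hypothetical helper: pick the working directory for notebook execution."""
    # An explicit cwd is used as given.
    if args.get("cwd") is not None:
        return Path(args["cwd"])
    # Otherwise fall back to the directory that contains the input notebook.
    return Path(args["path"]).parent
```

For the example configuration in this entry, which sets no cwd, this fallback would yield tests/model_notebooks/results.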

Return type:

dict[str, Any]

Parameters:

Examples

YAML configuration:

steps:
  - name: post_notebook_neonatal
    type: notebook
    resources:
      memory_gb: 20
      runtime: "02:00:00"
    args:
      path: tests/model_notebooks/results/neonatal.ipynb
      output_path: /mnt/results/run_29/executed/neonatal.ipynb
      parameters:
        model_dir: /mnt/results/run_29
        year: 2020
        verbose: true

vivarium.cluster_tools.dagger.config.parsing.STEP_TYPE_YAML_PARSERS: dict[str, Callable[[...], dict[str, Any]]] = {
    'bash': <function parse_bash_step_from_yaml>,
    'notebook': <function parse_notebook_step_from_yaml>,
    'pytest': <function parse_pytest_step_from_yaml>,
    'python': <function parse_python_step_from_yaml>,
    'simulation': <function parse_simulation_step_from_yaml>,
}

Maps each YAML step_type to its YAML -> API kwargs parser.

vivarium.cluster_tools.dagger.config.parsing.parse_step_from_yaml(raw, output_directory, *, project, queue)[source]

Build a ParsedStep from a raw YAML step dict.

Dispatches to the matching per-type parser to produce api_kwargs and tags the result with the resolved step_type for downstream dispatch (task building, YAML serialization).
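
The registry-based dispatch described above follows a common lookup-table pattern, sketched here with stand-ins. Everything in this block is hypothetical: SketchParsedStep models only the two fields named in this entry (step_type and api_kwargs), the stand-in parser omits the real output_directory, project, and queue arguments, and ValueError for an unknown type is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchParsedStep:
    """Stand-in for ParsedStep; only step_type and api_kwargs are modeled."""
    step_type: str
    api_kwargs: dict[str, Any]


def _sketch_bash_parser(raw: dict[str, Any]) -> dict[str, Any]:
    # Stand-in parser: the real one takes more arguments and validates more.
    return {"command": raw["command"]}


# Stand-in for STEP_TYPE_YAML_PARSERS with a single registered type.
SKETCH_PARSERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "bash": _sketch_bash_parser,
}


def parse_step_sketch(raw: dict[str, Any]) -> SketchParsedStep:
    # Resolve the step type first (command wins over type).
    step_type = "bash" if "command" in raw else raw["type"]
    try:
        parser = SKETCH_PARSERS[step_type]
    except KeyError:
        raise ValueError(f"unknown step type: {step_type!r}") from None
    # Tag the parsed kwargs with the resolved type for downstream dispatch.
    return SketchParsedStep(step_type=step_type, api_kwargs=parser(raw))
```

Keeping the parsers in a dict keyed by step type means a new step type only needs a new entry in the table rather than another branch in the dispatcher.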

Return type:

ParsedStep

Parameters:

vivarium.cluster_tools.dagger.config.parsing.load_workflow_config(path, *, name=None, project=None, queue=None, output_directory=None, default_environment=None, max_attempts=None)[source]

Load a WorkflowConfig from YAML, merging CLI overrides.

CLI arguments take precedence over values in the YAML file. Validates that name, project, queue, and output_directory are provided by at least one source.
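
The precedence rule can be sketched as a plain dict merge. The helper name merge_overrides is hypothetical, and treating None as "not supplied on the command line" is an assumption based on the None defaults in the signature; the real function returns a WorkflowConfig rather than a dict.

```python
from typing import Any

REQUIRED_FIELDS = ("name", "project", "queue", "output_directory")


def merge_overrides(yaml_values: dict[str, Any], **cli_overrides: Any) -> dict[str, Any]:
    """Hypothetical merge: CLI values win, then required fields are checked."""
    merged = dict(yaml_values)
    for key, value in cli_overrides.items():
        # Assumption: None means the option was not given on the CLI.
        if value is not None:
            merged[key] = value
    missing = [field for field in REQUIRED_FIELDS if merged.get(field) is None]
    if missing:
        raise ValueError(f"missing required workflow fields: {missing}")
    return merged
```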

Return type:

WorkflowConfig

Parameters:
  • path (Path) – Path to the workflow YAML configuration file.

  • name (str | None) – CLI override for the workflow name.

  • project (str | None) – CLI override for the project field.

  • queue (str | None) – CLI override for the queue field.

  • output_directory (Path | None) – CLI override for the output directory.

  • default_environment (str | None) – CLI override for the default_environment field.

  • max_attempts (int | None) – CLI override for the maximum number of Jobmon task attempts.

Raises:

ValueError – If name, project, queue, or output_directory cannot be resolved from either the YAML file or CLI arguments.