Results Writing

Simple per-task result writing. The workflow script serializes metadata JSON files for the worker to pick up. Each worker writes one parquet file per metric directly to the results directory.

Directory structure:

results/
    metadata/
        {task_id}.json
    {metric_name}/
        {task_id}.parquet

Reading all results for a metric is simply pd.read_parquet(results_dir / metric_name), which automatically combines all parquet files in the directory.

Task completion is determined by the existence of result parquet files. Metadata for completed tasks is read from the metadata JSON files in the metadata directory.

vivarium_cluster_tools.psimulate.results.writing.write_metadata(metadata_dir, job_parameters)

Write a metadata JSON file for a single task.

The metadata file serializes the job parameters for the workhorse script to pick up. It also serves as the reference metadata when a run is restarted or expanded.

Return type:

None

Parameters:

metadata_dir (Path) – Directory to write the metadata file.
job_parameters (JobParameters) – The job parameters for this task.
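The one-file-per-task layout can be illustrated roughly as follows. This is not the library's implementation: `write_metadata_sketch` is a hypothetical stand-in, a plain dict replaces `JobParameters`, and the `input_draw` and `random_seed` fields are assumed purely for the example.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sketch(metadata_dir: Path, task_id: str, parameters: dict) -> Path:
    """Write one JSON file named after the task (hypothetical stand-in)."""
    metadata_dir.mkdir(parents=True, exist_ok=True)
    path = metadata_dir / f"{task_id}.json"
    path.write_text(json.dumps(parameters, indent=2))
    return path


# Assumed parameter fields; the real JobParameters object may differ.
metadata_dir = Path(tempfile.mkdtemp()) / "results" / "metadata"
written = write_metadata_sketch(
    metadata_dir, "task_a", {"input_draw": 3, "random_seed": 42}
)

# A worker (or a later restart) can load the same file back.
round_tripped = json.loads(written.read_text())
```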

vivarium_cluster_tools.psimulate.results.writing.write_task_results(results_dir, job_parameters, results_dict)

Write a single task’s results directly to the results directory.

Return type:

None

Parameters:

results_dir (Path) – The results directory (e.g., output_root/results).
job_parameters (JobParameters) – The job parameters for this task.
results_dict (dict[str, DataFrame]) – Dictionary mapping metric names to results DataFrames.

vivarium_cluster_tools.psimulate.results.writing.get_completed_task_ids(results_dir)

Get task IDs that have result parquet files.

Scans all subdirectories of results_dir for .parquet files and extracts the task IDs from their filenames (stems).

Return type:

set[str]

Parameters:

results_dir (Path) – The results directory.

Returns:

Set of task IDs with at least one result parquet file.
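The scan described above can be approximated with the standard library alone. `completed_task_ids_sketch` is a hypothetical name, and the handling of a missing results directory is an assumption rather than documented behavior:

```python
import tempfile
from pathlib import Path


def completed_task_ids_sketch(results_dir: Path) -> set[str]:
    """Collect the stems of all .parquet files one level below results_dir."""
    if not results_dir.exists():
        # Assumption: a missing results directory means nothing has completed.
        return set()
    return {
        parquet_file.stem
        for metric_dir in results_dir.iterdir()
        if metric_dir.is_dir()
        for parquet_file in metric_dir.glob("*.parquet")
    }


# Empty placeholder files are enough here, since only existence is checked.
results_dir = Path(tempfile.mkdtemp()) / "results"
for relative_path in (
    "deaths/task_a.parquet",
    "deaths/task_b.parquet",
    "ylls/task_a.parquet",
    "metadata/task_c.json",
):
    target = results_dir / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()

completed = completed_task_ids_sketch(results_dir)
```

Here `task_c` has a metadata file but no parquet output, so it is not counted as complete, and `task_a` appears once even though it has files under two metrics.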

vivarium_cluster_tools.psimulate.results.writing.collect_metadata(metadata_dir, results_dir)

Collect metadata for completed tasks.

Determines which tasks completed by scanning for result parquet files in results_dir, then reads the corresponding metadata JSON files from metadata_dir to build the metadata DataFrame.

Return type:

DataFrame

Parameters:

metadata_dir (Path) – The directory containing pre-written metadata JSON files (one per task, written by the workflow builder).
results_dir (Path) – The results directory containing metric subdirectories with parquet files.

Returns:

Combined metadata DataFrame with flattened job-specific parameters, or an empty DataFrame if no completed tasks exist.
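One way to picture the collection step is sketched below. This is an illustration under assumptions, not the library's code: `collect_metadata_sketch` is hypothetical, the JSON fields are invented, `pd.json_normalize` stands in for whatever flattening the real function performs (its dotted column names are pandas' convention), and tasks whose metadata file is missing are simply skipped.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def collect_metadata_sketch(metadata_dir: Path, results_dir: Path) -> pd.DataFrame:
    """Build a metadata frame for tasks that have at least one parquet file."""
    completed = {
        parquet_file.stem
        for metric_dir in results_dir.iterdir()
        if metric_dir.is_dir()
        for parquet_file in metric_dir.glob("*.parquet")
    }
    records = []
    for task_id in sorted(completed):
        metadata_file = metadata_dir / f"{task_id}.json"
        if metadata_file.exists():  # Assumption: skip tasks without metadata.
            records.append(json.loads(metadata_file.read_text()))
    if not records:
        return pd.DataFrame()
    # Nested dicts become dotted columns, e.g. "job_specific.input_draw".
    return pd.json_normalize(records)


results_dir = Path(tempfile.mkdtemp()) / "results"
metadata_dir = results_dir / "metadata"
metadata_dir.mkdir(parents=True)
for task_id, draw in (("task_a", 3), ("task_b", 7)):
    record = {"task_id": task_id, "job_specific": {"input_draw": draw}}
    (metadata_dir / f"{task_id}.json").write_text(json.dumps(record))

# Only task_a has produced a result file, so only its metadata is collected.
(results_dir / "deaths").mkdir()
(results_dir / "deaths" / "task_a.parquet").touch()

metadata = collect_metadata_sketch(metadata_dir, results_dir)
```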