Status API

Status is PETsARD’s internal state management module, responsible for tracking workflow execution, storing results, managing metadata (Schema), and creating execution snapshots.

ℹ️ Internal Use Only: Status is primarily used internally by Executor. Users should access Status functionality through Executor methods.

Basic Usage

from petsard import Executor

# Status is created and managed internally by Executor
exec = Executor('config.yaml')
exec.run()

# Access Status functionality through Executor methods
results = exec.get_result()        # Status.get_result()
timing = exec.get_timing()         # Status.get_timing_report_data()

# Advanced: Direct Status access
summary = exec.status.get_status_summary()
snapshots = exec.status.get_snapshots()

Constructor

Syntax

Status(config: Config, max_snapshots: int = 1000, max_changes: int = 5000, max_timings: int = 10000)

Parameters

Parameter	Type	Required	Default	Description
`config`	`Config`	Yes	-	Config object containing module sequence and execution configuration
`max_snapshots`	`int`	No	`1000`	Maximum number of snapshots to retain
`max_changes`	`int`	No	`5000`	Maximum number of change records
`max_timings`	`int`	No	`10000`	Maximum number of timing records

Return Value

Returns a Status instance with initialized state management.
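
The three `max_*` parameters cap how much history is held in memory. How Status enforces these caps is not described on this page. As a minimal sketch of one common approach (an assumption, not PETsARD's confirmed implementation), a `collections.deque` with `maxlen` evicts the oldest entry whenever a new one would exceed the limit:

```python
from collections import deque

# Hypothetical stand-in for a bounded history; not PETsARD's actual storage.
max_snapshots = 3
snapshots = deque(maxlen=max_snapshots)

for i in range(5):
    snapshots.append(f"snapshot_{i}")

# Only the most recent max_snapshots entries remain; older ones were evicted.
retained = list(snapshots)
print(retained)
```

With this kind of policy, raising a limit trades memory for a longer history.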

Core Functionality

1. Execution Result Tracking

Status records execution results for each module:

# Automatically called by Executor
# status.put(module, experiment, adapter)

# Get results through Executor
results = exec.get_result()
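
To make the put/get flow concrete, the following is a simplified sketch. `MiniStatus` is a hypothetical class written for illustration only. It stores a result value directly, whereas the real `put` receives an adapter object whose interface is not shown here.

```python
from typing import Any


class MiniStatus:
    """Hypothetical, simplified stand-in for Status; not PETsARD's code."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}
        self._experiments: dict[str, str] = {}

    def put(self, module: str, experiment: str, result: Any) -> None:
        # Record the latest experiment name and result for this module.
        self._experiments[module] = experiment
        self._results[module] = result

    def get_result(self, module: str) -> Any:
        # Return the most recently stored result for the module.
        return self._results[module]

    def get_full_expt(self) -> dict[str, str]:
        # Return a copy so callers cannot modify internal state.
        return dict(self._experiments)


status = MiniStatus()
status.put("Loader", "load_csv", {"rows": 100})
status.put("Synthesizer", "demo", {"rows": 100, "synthetic": True})
print(status.get_result("Loader"))
```

Keying state by module name means each later call to `put` for the same module replaces the earlier entry, which is one plausible reading of "record module execution state".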

2. Metadata Management

Tracks Schema changes across modules:

# Get Schema for specific module
loader_schema = exec.status.get_metadata('Loader')
print(f"Number of fields: {len(loader_schema.attributes)}")

3. Execution Snapshots

Creates snapshots before and after each module execution:

# Get all snapshots
snapshots = exec.status.get_snapshots()

for snapshot in snapshots:
    print(f"{snapshot.module_name}[{snapshot.experiment_name}]")
    print(f"  Time: {snapshot.timestamp}")

4. Timing Records

Collects execution time information:

# Get timing report
timing_df = exec.get_timing()
print(timing_df)

Main Methods

State Management Methods

Method	Description
`put(module, experiment, adapter)`	Record module execution state
`get_result(module)`	Get module execution result
`get_metadata(module)`	Get module Schema
`get_full_expt(module)`	Get experiment configuration dictionary

Snapshot and Tracking Methods

Method	Description
`get_snapshots(module)`	Get execution snapshots
`get_snapshot_by_id(snapshot_id)`	Get specific snapshot by ID
`get_change_history(module)`	Get change history
`get_metadata_evolution(module)`	Track Schema evolution
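
The return format of `get_metadata_evolution` is not specified on this page. As a rough illustration of the comparison that Schema tracking implies, the sketch below diffs two field mappings. The helper `diff_fields` and the plain-dict representation of a Schema are assumptions made for this example.

```python
def diff_fields(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two field-name -> type mappings (hypothetical Schema stand-ins)."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


# Illustrative field sets before and after a preprocessing step.
loader_fields = {"age": "int", "income": "float", "city": "str"}
preproc_fields = {"age": "float", "income": "float", "city_code": "int"}

changes = diff_fields(loader_fields, preproc_fields)
print(changes)
```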

Reporting Methods

Method	Description
`get_timing_report_data()`	Get timing report as DataFrame
`get_status_summary()`	Get status summary

Data Classes

ExecutionSnapshot

Immutable record of an execution snapshot:

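from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# Schema is PETsARD's metadata Schema type (import not shown here)
@dataclass(frozen=True)
class ExecutionSnapshot:
    snapshot_id: str
    module_name: str
    experiment_name: str
    timestamp: datetime
    metadata_before: Schema | None = None
    metadata_after: Schema | None = None
    context: dict[str, Any] = field(default_factory=dict)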
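
The practical effect of `frozen=True` can be shown with a reduced class. `DemoSnapshot` below is a hypothetical stand-in that omits the Schema fields so the example runs without PETsARD installed.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class DemoSnapshot:
    # Reduced, hypothetical version of ExecutionSnapshot (Schema fields omitted).
    snapshot_id: str
    module_name: str
    experiment_name: str
    timestamp: datetime
    context: dict[str, Any] = field(default_factory=dict)


snap = DemoSnapshot("snap_0001", "Loader", "load_csv", datetime(2024, 1, 1, 12, 0))

try:
    snap.module_name = "Synthesizer"  # attribute assignment on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)
```

Note that `frozen=True` only blocks attribute reassignment. A mutable value such as the `context` dict can still be modified in place, so callers should treat it as read-only.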

TimingRecord

Immutable record of timing information:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass(frozen=True)
class TimingRecord:
    record_id: str
    module_name: str
    experiment_name: str
    step_name: str
    start_time: datetime
    end_time: datetime | None = None
    duration_seconds: float | None = None
    context: dict[str, Any] = field(default_factory=dict)
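
Because `end_time` and `duration_seconds` default to `None`, a record can describe a step that is still running. Since the record is frozen, finishing it means creating a new instance. The sketch below shows one way to do that with `dataclasses.replace`. `DemoTiming` and `complete` are hypothetical names, and whether Status completes records this way is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class DemoTiming:
    # Reduced, hypothetical version of TimingRecord.
    record_id: str
    module_name: str
    step_name: str
    start_time: datetime
    end_time: Optional[datetime] = None
    duration_seconds: Optional[float] = None


def complete(record: DemoTiming, end_time: datetime) -> DemoTiming:
    """Return a new record with end_time and duration filled in."""
    duration = (end_time - record.start_time).total_seconds()
    return replace(record, end_time=end_time, duration_seconds=duration)


start = datetime(2024, 1, 1, 12, 0, 0)
running = DemoTiming("timing_0001", "Loader", "run", start)
finished = complete(running, start + timedelta(seconds=2.5))
print(finished.duration_seconds)
```

The original `running` record is left unchanged, which keeps earlier references to it valid.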

Integration with Executor

Status is primarily used through Executor:

from petsard import Executor

exec = Executor('config.yaml')
exec.run()

# Access Status functionality through Executor
results = exec.get_result()          # → status.get_result()
timing = exec.get_timing()           # → status.get_timing_report_data()

# Advanced: Direct Status access
summary = exec.status.get_status_summary()
snapshots = exec.status.get_snapshots()

Schema Inference

Status supports Schema inference. The inferred Schema for a module can be retrieved through Executor:

from petsard import Executor

exec = Executor('config.yaml')  # Includes Preprocessor
exec.run()

# Get inferred Schema
inferred_schema = exec.get_inferred_schema('Preprocessor')
if inferred_schema:
    print(f"Inferred Schema: {inferred_schema.id}")

Status Summary

Get a complete summary of the execution status:

summary = exec.status.get_status_summary()

print(f"Module sequence: {summary['sequence']}")
print(f"Active modules: {summary['active_modules']}")
print(f"Total snapshots: {summary['total_snapshots']}")
print(f"Total changes: {summary['total_changes']}")

Notes

Internal Use: Status is primarily used internally by Executor
Recommended Practice: Access Status functionality through Executor methods
Automatic Tracking: Snapshots and changes are automatically recorded during execution
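Memory Management: Long-running executions accumulate more snapshots, change records, and timing records; retention is bounded by `max_snapshots`, `max_changes`, and `max_timings`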
Immutability: Snapshot and change records are immutable
Advanced Features: Direct Status access requires understanding of internal mechanisms