Config Configuration
Config is responsible for parsing and managing YAML configuration files, converting declarative configurations into executable module sequences.
Configuration Structure
YAML configuration files adopt a hierarchical structure, with each module containing one or more experiment configurations:
ModuleName:
  experiment_name:
    parameter1: value1
    parameter2: value2
Complete Configuration Example
# Data loading
Loader:
  load_benchmark:
    filepath: benchmark://adult-income
  load_custom:
    filepath: ./data/custom.csv
# Data preprocessing
Preprocessor:
  preprocess:
    method: default
# Data splitting
Splitter:
  split_train_test:
    train_split_ratio: 0.8
    num_samples: 3
# Data synthesis
Synthesizer:
  sdv_gaussian:
    method: sdv
    model: GaussianCopula
  sdv_ctgan:
    method: sdv
    model: CTGAN
# Data evaluation
Evaluator:
  evaluate_quality:
    method: sdmetrics-qualityreport
  evaluate_privacy:
    method: anonymeter-singlingout
# Results reporting
Reporter:
  save_synthetic:
    method: save_data
    source: Synthesizer
  generate_report:
    method: save_report
    granularity: global
Module Execution Order
Config automatically arranges module execution in the following order:
- Loader - Data loading
- Preprocessor - Data preprocessing (optional)
- Splitter - Data splitting (optional)
- Synthesizer - Data synthesis
- Postprocessor - Data postprocessing (optional)
- Constrainer - Constraints (optional)
- Evaluator - Data evaluation (optional)
- Reporter - Results reporting (optional)
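The ordering above can be sketched in a few lines of Python. This is only an illustration of the idea, not the library's implementation: MODULE_ORDER and order_modules are hypothetical names, and the sketch assumes the configuration has already been parsed into a dictionary.

```python
# Hypothetical sketch: arrange the modules found in a config by the fixed
# pipeline order listed above. Not the library's actual code.
MODULE_ORDER = [
    "Loader", "Preprocessor", "Splitter", "Synthesizer",
    "Postprocessor", "Constrainer", "Evaluator", "Reporter",
]


def order_modules(config):
    """Return the module names present in `config`, in pipeline order."""
    return [name for name in MODULE_ORDER if name in config]


# The order in which modules are written in the config does not matter.
config = {
    "Reporter": {"save_synthetic": {"method": "save_data", "source": "Synthesizer"}},
    "Synthesizer": {"sdv_gaussian": {"method": "sdv", "model": "GaussianCopula"}},
    "Loader": {"load_custom": {"filepath": "./data/custom.csv"}},
}
ordered = order_modules(config)
print(ordered)
```

Optional modules that are absent from the configuration are simply skipped in this sketch.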
Experiment Naming Rules
Basic Rules
- Experiment names must be unique within the same module
- Names can use letters, numbers, underscores, and hyphens
- Names cannot end with the _[xxx] pattern (reserved for internal system use)
Valid Experiment Names
Loader:
  load_data: # ✓ Valid
    filepath: data.csv
  load-benchmark: # ✓ Valid
    filepath: benchmark://adult-income
  load_custom_v2: # ✓ Valid
    filepath: custom.csv
Invalid Experiment Names
Loader:
  load_data_[1]: # ✗ Invalid: Reserved pattern
    filepath: data.csv
  load_[test]: # ✗ Invalid: Reserved pattern
    filepath: test.csv
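The naming rules can be approximated with two regular expressions. The sketch below is an assumption about how such a check might look, not the library's validator: check_experiment_name and both patterns are hypothetical, and "letters" is assumed here to mean ASCII letters.

```python
import re
from typing import Optional

# Hypothetical patterns, written from the rules above.
RESERVED_SUFFIX = re.compile(r"_\[[^\]]*\]$")    # names ending in _[xxx]
ALLOWED_CHARS = re.compile(r"^[A-Za-z0-9_-]+$")  # assumption: ASCII letters only


def check_experiment_name(name: str) -> Optional[str]:
    """Return a reason string if `name` breaks a rule, or None if it is valid."""
    # Check the reserved suffix first so the message is specific.
    if RESERVED_SUFFIX.search(name):
        return "ends with the reserved _[xxx] pattern"
    if not ALLOWED_CHARS.match(name):
        return "contains characters other than letters, numbers, '_' and '-'"
    return None


for candidate in ["load_data", "load-benchmark", "load_data_[1]", "load_[test]"]:
    reason = check_experiment_name(candidate)
    print(candidate, "->", "valid" if reason is None else f"invalid: {reason}")
```

Uniqueness within a module is not checked here; once the YAML is parsed into a dictionary, experiment names are dictionary keys and cannot repeat.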
Splitter Special Handling
When a Splitter configuration includes num_samples > 1, Config automatically expands it into multiple experiments:
Original Configuration
Splitter:
  split_data:
    train_split_ratio: 0.8
    num_samples: 3
After Automatic Expansion
Splitter:
  split_data_[3-1]:
    train_split_ratio: 0.8
    num_samples: 1
  split_data_[3-2]:
    train_split_ratio: 0.8
    num_samples: 1
  split_data_[3-3]:
    train_split_ratio: 0.8
    num_samples: 1
This expansion is automatic; users only need to specify the num_samples parameter in the configuration.
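A minimal sketch of this expansion, assuming the Splitter section has already been parsed into a dictionary. The function name expand_splitter is hypothetical; only the naming scheme name_[N-i] and the reset of num_samples to 1 are taken from the example above.

```python
import copy


def expand_splitter(splitter_config):
    """Expand experiments with num_samples > 1 into one experiment per sample.

    Hypothetical sketch: mirrors the before/after example, not the library's code.
    """
    expanded = {}
    for name, params in splitter_config.items():
        n = params.get("num_samples", 1)
        if n > 1:
            for i in range(1, n + 1):
                new_params = copy.deepcopy(params)
                new_params["num_samples"] = 1  # each expanded experiment is one sample
                expanded[f"{name}_[{n}-{i}]"] = new_params
        else:
            # Nothing to expand; keep the experiment as written.
            expanded[name] = copy.deepcopy(params)
    return expanded


original = {"split_data": {"train_split_ratio": 0.8, "num_samples": 3}}
result = expand_splitter(original)
for experiment_name, experiment_params in result.items():
    print(experiment_name, experiment_params)
```

The generated names end in the reserved _[xxx] suffix, which is consistent with that pattern being unavailable for user-defined experiment names.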
Multi-Experiment Configuration
Cartesian Product Execution
When multiple modules contain multiple experiments, Config generates cartesian product execution combinations:
Loader:
  load_v1:
    filepath: data_v1.csv
  load_v2:
    filepath: data_v2.csv
Synthesizer:
  method_a:
    method: sdv
    model: GaussianCopula
  method_b:
    method: sdv
    model: CTGAN
Execution combinations:
- load_v1 → method_a
- load_v1 → method_b
- load_v2 → method_a
- load_v2 → method_b
Result Naming
The results of each execution combination are stored under a name built from its module and experiment names:
Loader[load_v1]_Synthesizer[method_a]
Loader[load_v1]_Synthesizer[method_b]
Loader[load_v2]_Synthesizer[method_a]
Loader[load_v2]_Synthesizer[method_b]
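The combinations and their names can be reproduced with itertools.product from the standard library. This is a sketch under the assumption that the configuration is a dictionary whose modules are already in execution order; combination_names is a hypothetical helper, not part of the library.

```python
from itertools import product


def combination_names(config):
    """Build one result name per cartesian-product combination of experiments.

    Hypothetical sketch; assumes `config` modules are already in execution order.
    """
    # One list of (module, experiment) pairs per module.
    per_module = [
        [(module, experiment) for experiment in experiments]
        for module, experiments in config.items()
    ]
    return [
        "_".join(f"{module}[{experiment}]" for module, experiment in combo)
        for combo in product(*per_module)
    ]


config = {
    "Loader": {
        "load_v1": {"filepath": "data_v1.csv"},
        "load_v2": {"filepath": "data_v2.csv"},
    },
    "Synthesizer": {
        "method_a": {"method": "sdv", "model": "GaussianCopula"},
        "method_b": {"method": "sdv", "model": "CTGAN"},
    },
}
names = combination_names(config)
for name in names:
    print(name)
```

The number of combinations is the product of the experiment counts per module (2 × 2 = 4 here), which is why adding experiments to several modules can multiply execution time quickly.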
Configuration Validation
Config performs the following validations during initialization:
1. Structure Validation
- Check if configuration is a valid dictionary structure
- Verify module names are correct
2. Naming Validation
- Check if experiment names use reserved patterns
- Verify experiment names are unique within modules
3. Parameter Validation
- Parameter checking performed by each module’s Adapter
- Validation occurs during Adapter instance creation
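The structure checks in step 1 might look roughly like the sketch below. Everything here is an assumption for illustration: StructureError stands in for the library's ConfigError, KNOWN_MODULES is taken from the execution-order list, and the requirement that each module holds at least one experiment follows from the configuration structure described earlier.

```python
# Hypothetical sketch of structure validation; not the library's implementation.
KNOWN_MODULES = {
    "Loader", "Preprocessor", "Splitter", "Synthesizer",
    "Postprocessor", "Constrainer", "Evaluator", "Reporter",
}


class StructureError(Exception):
    """Stand-in for the library's ConfigError in this sketch."""


def validate_structure(config):
    """Raise StructureError if `config` is not a module -> experiments mapping."""
    if not isinstance(config, dict):
        raise StructureError("Configuration must be a dictionary")
    for module, experiments in config.items():
        if module not in KNOWN_MODULES:
            raise StructureError(f"Unknown module name: {module!r}")
        if not isinstance(experiments, dict) or not experiments:
            raise StructureError(
                f"Module {module!r} must map experiment names to parameters"
            )


# A well-formed configuration passes silently.
validate_structure({"Loader": {"load_data": {"filepath": "data.csv"}}})

# A misspelled module name is rejected.
try:
    validate_structure({"Loadr": {"load_data": {"filepath": "data.csv"}}})
except StructureError as e:
    print(f"Configuration error: {e}")
```

Parameter-level checks are left out on purpose, since the text above assigns them to each module's Adapter.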
Error Handling
ConfigError
When the configuration does not meet these rules, a ConfigError is raised:
from petsard.exceptions import ConfigError

# Config and config_dict are assumed to be imported and defined beforehand
try:
    config = Config(config_dict)
except ConfigError as e:
    print(f"Configuration error: {e}")
Common errors:
- Using reserved experiment naming patterns
- Incorrect configuration structure
- Missing required parameters
Notes
- Execution Order: Config arranges the module execution order automatically; it does not need to be specified manually
- Experiment Combinations: Multi-experiment configurations run as a cartesian product, so execution time grows with the number of combinations
- Naming Conventions: Follow the experiment naming rules and avoid the reserved patterns
- Splitter Expansion: The num_samples parameter is expanded automatically; there is no need to configure multiple experiments manually
- Parameter Validation: Test the configuration on small-scale data first, then run the complete workflow once the parameters are confirmed correct