# Evaluator YAML
YAML configuration file format for the Evaluator module.
## Usage Examples

Click the button below to run this example in Colab:
## Recommended Evaluation Workflow
We recommend the following evaluation workflow to ensure synthetic data meets requirements:
### 1. Evaluation Workflow Overview

```mermaid
flowchart TD
    Start([Start Evaluation])
    Diagnostic{Step 1:<br/>Diagnostics Pass?}
    DiagnosticFail[Data Structure Issues<br/>Check Synthesis Process]
    Privacy{Step 2:<br/>Protection Pass?}
    PrivacyFail[Privacy Risk Too High<br/>Adjust Synthesis Parameters]
    Purpose{Step 3:<br/>Synthetic Data Purpose?}
    Release[Scenario A:<br/>Data Release<br/>No Specific Downstream Task]
    Task[Scenario B:<br/>Specific Task Application<br/>Data Augmentation/Model Training]
    FidelityFocus[Evaluation Focus:<br/>Pursue Highest Fidelity]
    UtilityFocus[Evaluation Focus:<br/>Pursue High Utility<br/>Fidelity Meets Baseline]

    Start --> Diagnostic
    Diagnostic -->|No| DiagnosticFail
    Diagnostic -->|Yes| Privacy
    Privacy -->|No| PrivacyFail
    Privacy -->|Yes| Purpose
    Purpose -->|A| Release
    Purpose -->|B| Task
    Release --> FidelityFocus
    Task --> UtilityFocus

    style Start fill:#e1f5fe
    style DiagnosticFail fill:#ffcdd2
    style PrivacyFail fill:#ffcdd2
    style FidelityFocus fill:#c8e6c9
    style UtilityFocus fill:#c8e6c9
```
### 2. Data Diagnostics Standard

```mermaid
flowchart TD
    Start([Data Diagnostics Assessment])
    Method[Assessment Method:<br/>sdmetrics-diagnosticreport]
    Check{Score ≈ 1.0?}
    Pass[Diagnostics Pass]
    Fail[Check Synthesis Process]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#ffcdd2
    style Method fill:#fff3e0
```
### 3. Privacy Protection Standard

```mermaid
flowchart TD
    Start([Privacy Protection Assessment])
    Method[Assessment Methods:<br/>anonymeter-singlingout<br/>anonymeter-linkability<br/>anonymeter-inference]
    Check{Risk < 0.09?}
    Pass[Protection Pass]
    Fail[Adjust Synthesis Parameters]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#ffcdd2
    style Method fill:#fff3e0
```
### 4. Data Fidelity Standard

```mermaid
flowchart TD
    Start([Data Fidelity Assessment])
    Method[Assessment Method:<br/>sdmetrics-qualityreport]
    Check{Score ≥ 0.75?}
    Pass[Fidelity Met]
    Fail[Need to Improve Fidelity]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#fff3e0
    style Method fill:#fff3e0
```
### 5. Data Utility Standard

```mermaid
flowchart TD
    Start([Data Utility Assessment])
    Method[Assessment Method:<br/>mlutility]
    TaskType{Task Type & Standard}
    Classification{Classification XGBoost:<br/>F1 ≥ 0.7?}
    Regression{Regression XGBoost:<br/>R² ≥ 0.7?}
    Clustering{Clustering K-means:<br/>Silhouette ≥ 0.5?}
    Pass[Utility Met]
    Fail[Need to Improve Model Utility]

    Start --> Method
    Method --> TaskType
    TaskType -->|Classification| Classification
    TaskType -->|Regression| Regression
    TaskType -->|Clustering| Clustering
    Classification -->|Yes| Pass
    Classification -->|No| Fail
    Regression -->|Yes| Pass
    Regression -->|No| Fail
    Clustering -->|Yes| Pass
    Clustering -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#fff3e0
    style Method fill:#fff3e0
```
Legend:
- Light blue boxes: Start/End points
- White boxes: Evaluation steps
- Diamond shapes: Decision points
- Green boxes: Success outcomes
- Red boxes: Failure states requiring action
- Yellow boxes: Improvement needed
- Orange boxes: Assessment methods
### 1. Foundation Evaluation (Required)
First, confirm data validity and privacy protection:
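As a sketch of this foundation step, the diagnostic and privacy checks can be combined into a minimal `Evaluator` block, distilled from the full examples on this page (experiment names such as `validity_check` are arbitrary labels):

```yaml
Evaluator:
  # Step 1: data validity diagnosis - score should be close to 1.0
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: privacy risk assessment - each risk should stay below 0.09
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
  inference_risk:
    method: anonymeter-inference
    secret: income
```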
### 2. Goal-Oriented Evaluation
After passing foundation evaluation, choose evaluation focus based on the intended use of synthetic data:
#### Scenario A: Data Release (No Specific Downstream Task)
If synthetic data will be released publicly, pursue highest fidelity:
```yaml
Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema

Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema

Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:
      -
        - workclass
        - education
        - occupation
        - race
        - gender
      -
        - age
        - marital-status
        - relationship
        - native-country
        - income
  inference_risk:
    method: anonymeter-inference
    secret: income
  # Focus: Pursue high fidelity (higher score is better)
  fidelity_assessment:
    method: sdmetrics-qualityreport
  # Utility evaluation is optional (not necessary)
```
#### Scenario B: Specific Task Application (Data Augmentation, Model Training, etc.)
If synthetic data is for specific machine learning tasks, pursue high utility:
```yaml
Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema

Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema

Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:
      -
        - workclass
        - education
        - occupation
        - race
        - gender
      -
        - age
        - marital-status
        - relationship
        - native-country
        - income
  inference_risk:
    method: anonymeter-inference
    secret: income
  # Fidelity just needs to meet threshold (≥ 0.75)
  fidelity_assessment:
    method: sdmetrics-qualityreport
  # Focus: Pursue high utility (evaluate by task type)
  ml_utility_assessment:
    method: mlutility
    task_type: classification  # or regression/clustering
    target: income
```
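To make the idea behind `mlutility` concrete, here is a hedged, dependency-free sketch of the underlying "train on synthetic, test on real control data" protocol. The toy 1-nearest-neighbour classifier stands in for the actual XGBoost/K-means models named in the standards below, and every name and data point here is illustrative, not PETsARD's implementation:

```python
# Sketch of the "train on synthetic, evaluate on control" utility protocol.
# A toy 1-nearest-neighbour classifier stands in for XGBoost; the point is
# the data flow, not the model. All data below is invented.

def nn_predict(train_X, train_y, x):
    """Predict the label of x from its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def utility_score(syn_X, syn_y, control_X, control_y):
    """Accuracy on held-out control data of a model fit on synthetic data."""
    hits = sum(
        nn_predict(syn_X, syn_y, x) == y
        for x, y in zip(control_X, control_y)
    )
    return hits / len(control_y)

# Synthetic training set: two well-separated classes.
syn_X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
syn_y = ["low", "low", "high", "high"]

# Control (real, held-out) set.
control_X = [(0.2, 0.1), (4.8, 5.1)]
control_y = ["low", "high"]

print(utility_score(syn_X, syn_y, control_X, control_y))  # → 1.0
```

If the model fit on synthetic data scores well on the real control split, the synthetic data has preserved the signal the downstream task needs.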
## Main Parameters

- `method` (string, required): Evaluation method name
  - See supported methods in the table below
## Supported Evaluation Methods

| Type | Method Name | Description | Recommended Standard |
|---|---|---|---|
| Default Method | `default` | Default evaluation (equivalent to `sdmetrics-qualityreport`) | Score ≥ 0.75¹ |
| Data Validity | `sdmetrics-diagnosticreport` | Check data structure and basic characteristics | Score ≈ 1.0² |
| Privacy Protection | `anonymeter-singlingout` | Singling-out risk assessment | Risk < 0.09³ |
| Privacy Protection | `anonymeter-linkability` | Linkability risk assessment | Risk < 0.09³ |
| Privacy Protection | `anonymeter-inference` | Inference risk assessment | Risk < 0.09³ |
| Data Fidelity | `sdmetrics-qualityreport` | Statistical distribution similarity assessment | Score ≥ 0.75¹ |
| Data Utility | `mlutility` | Machine learning model utility | Task-dependent⁴ |
| Custom Assessment | `custom_method` | Custom evaluation method | - |
### Recommended Standards Notes
¹ Fidelity Standard (Score ≥ 0.75): Based on statistical distribution similarity
² Validity Standard (Score ≈ 1.0): Data structure integrity check
³ Privacy Risk Standard (Risk < 0.09): Based on PDPC Singapore guidelines
⁴ Utility Standard (Task-dependent):
- Classification tasks (XGBoost): F1 ≥ 0.7
- Regression tasks (XGBoost): R² ≥ 0.7
- Clustering tasks (K-means): Silhouette coefficient ≥ 0.5
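As a worked illustration of the classification standard, recall that F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The counts below are invented toy numbers, not PETsARD output:

```python
# Toy arithmetic for the F1 ≥ 0.7 classification standard.
# tp/fp/fn counts are invented for illustration.

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 80 true positives, 20 false positives, 43 false negatives:
# precision = 0.8, recall ≈ 0.650, so F1 ≈ 0.717 clears the 0.7 bar.
score = f1_score(tp=80, fp=20, fn=43)
print(round(score, 3), score >= 0.7)  # → 0.717 True
```

Note that a model can have high precision yet still fail the standard if recall drags the harmonic mean below 0.7.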
**Default Method**: When `method: default` is used, the system automatically executes `sdmetrics-qualityreport` to evaluate data fidelity.

**Threshold Adjustment**: The recommended standards above are general reference values; adjust the thresholds to fit your specific use case and risk tolerance. For the theoretical foundations and references behind each metric, please refer to the corresponding subdocumentation.
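The sequential gate described in this workflow (diagnostics, then privacy, then the goal-dependent check) can be sketched as a small helper. The function name, score keys, and messages are illustrative; only the thresholds come from the recommended standards, and they should be tuned per the threshold-adjustment note:

```python
# Sketch of the recommended evaluation gate. Thresholds follow the
# recommended standards in this section and are meant to be adjusted
# per use case; all names here are illustrative.

def evaluation_gate(scores: dict, purpose: str = "release") -> str:
    # Step 1: diagnostics must be (near) perfect, i.e. score ≈ 1.0.
    if scores["diagnostic"] < 0.99:
        return "fail: data structure issues - check synthesis process"
    # Step 2: every privacy attack risk must stay under 0.09.
    if max(scores["privacy_risks"]) >= 0.09:
        return "fail: privacy risk too high - adjust synthesis parameters"
    # Step 3: goal-oriented check.
    if purpose == "release":
        # Scenario A: pursue the highest fidelity attainable.
        return f"pass: fidelity {scores['fidelity']:.2f} (maximize it)"
    # Scenario B: fidelity only needs the 0.75 baseline; utility drives.
    if scores["fidelity"] < 0.75:
        return "fail: fidelity below 0.75 baseline"
    return "pass: optimize task utility"

result = evaluation_gate(
    {"diagnostic": 1.0, "privacy_risks": [0.02, 0.05, 0.01], "fidelity": 0.81},
    purpose="task",
)
print(result)  # → pass: optimize task utility
```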
## Execution Notes

- Evaluating large datasets can take significant time, especially with the Anonymeter methods
- Run evaluations sequentially, making sure each prerequisite step passes before proceeding to the next