Evaluator YAML

YAML configuration file format for the Evaluator module.

Usage Examples

Click the button below to run this example in Colab:

Open In Colab

Recommended Evaluation Workflow

We recommend the following evaluation workflow to ensure synthetic data meets requirements:

1. Evaluation Workflow Overview

flowchart TD
    Start([Start Evaluation])
    Diagnostic{Step 1:<br/>Diagnostics Pass?}
    DiagnosticFail[Data Structure Issues<br/>Check Synthesis Process]
    Privacy{Step 2:<br/>Protection Pass?}
    PrivacyFail[Privacy Risk Too High<br/>Adjust Synthesis Parameters]
    Purpose{Step 3:<br/>Synthetic Data Purpose?}
    Release[Scenario A:<br/>Data Release<br/>No Specific Downstream Task]
    Task[Scenario B:<br/>Specific Task Application<br/>Data Augmentation/Model Training]
    FidelityFocus[Evaluation Focus:<br/>Pursue Highest Fidelity]
    UtilityFocus[Evaluation Focus:<br/>Pursue High Utility<br/>Fidelity Meets Baseline]

    Start --> Diagnostic
    Diagnostic -->|No| DiagnosticFail
    Diagnostic -->|Yes| Privacy
    Privacy -->|No| PrivacyFail
    Privacy -->|Yes| Purpose
    Purpose -->|A| Release
    Purpose -->|B| Task
    Release --> FidelityFocus
    Task --> UtilityFocus

    style Start fill:#e1f5fe
    style DiagnosticFail fill:#ffcdd2
    style PrivacyFail fill:#ffcdd2
    style FidelityFocus fill:#c8e6c9
    style UtilityFocus fill:#c8e6c9

2. Data Diagnostics Standard

flowchart TD
    Start([Data Diagnostics Assessment])
    Method[Assessment Method:<br/>sdmetrics-diagnosticreport]
    Check{Score ≈ 1.0?}
    Pass[Diagnostics Pass]
    Fail[Check Synthesis Process]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#ffcdd2
    style Method fill:#fff3e0
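
In YAML, this check is a single Evaluator entry; the experiment name validity_check below matches the full examples further down this page:

Evaluator:
  validity_check:
    method: sdmetrics-diagnosticreport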

3. Privacy Protection Standard

flowchart TD
    Start([Privacy Protection Assessment])
    Method[Assessment Methods:<br/>anonymeter-singlingout<br/>anonymeter-linkability<br/>anonymeter-inference]
    Check{Risk < 0.09?}
    Pass[Protection Pass]
    Fail[Adjust Synthesis Parameters]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#ffcdd2
    style Method fill:#fff3e0
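
Each of the three risk assessments is configured as its own Evaluator entry. The parameter values below are taken from the adult-income examples later on this page and should be adapted to your own columns:

Evaluator:
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:            # groups of columns an attacker could plausibly know
      -
        - workclass
        - education
      -
        - age
        - income
  inference_risk:
    method: anonymeter-inference
    secret: income       # the column treated as the secret to be inferred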

4. Data Fidelity Standard

flowchart TD
    Start([Data Fidelity Assessment])
    Method[Assessment Method:<br/>sdmetrics-qualityreport]
    Check{Score ≥ 0.75?}
    Pass[Fidelity Met]
    Fail[Need to Improve Fidelity]

    Start --> Method
    Method --> Check
    Check -->|Yes| Pass
    Check -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#fff3e0
    style Method fill:#fff3e0
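
The corresponding Evaluator entry (same experiment name as in the full examples below):

Evaluator:
  fidelity_assessment:
    method: sdmetrics-qualityreport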

5. Data Utility Standard

flowchart TD
    Start([Data Utility Assessment])
    Method[Assessment Method:<br/>mlutility]
    TaskType{Task Type & Standard}
    Classification{Classification XGBoost:<br/>F1 ≥ 0.7?}
    Regression{Regression XGBoost:<br/>R² ≥ 0.7?}
    Clustering{Clustering K-means:<br/>Silhouette ≥ 0.5?}
    Pass[Utility Met]
    Fail[Need to Improve Model Utility]

    Start --> Method
    Method --> TaskType
    TaskType -->|Classification| Classification
    TaskType -->|Regression| Regression
    TaskType -->|Clustering| Clustering
    Classification -->|Yes| Pass
    Classification -->|No| Fail
    Regression -->|Yes| Pass
    Regression -->|No| Fail
    Clustering -->|Yes| Pass
    Clustering -->|No| Fail

    style Start fill:#e1f5fe
    style Pass fill:#c8e6c9
    style Fail fill:#fff3e0
    style Method fill:#fff3e0
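
The corresponding Evaluator entry; task_type and target below follow the Scenario B example later on this page and should be set to your own task and target column:

Evaluator:
  ml_utility_assessment:
    method: mlutility
    task_type: classification  # or regression / clustering
    target: income             # target column for classification/regression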

Legend:

  • Light blue boxes: Start/End points
  • White boxes: Evaluation steps
  • Diamond shapes: Decision points
  • Green boxes: Success outcomes
  • Red boxes: Failure states requiring action
  • Light orange boxes: Assessment methods or results needing improvement

1. Foundation Evaluation (Required)

First, confirm data validity and privacy protection.
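
A minimal sketch of this required step; experiment names and parameter values follow the adult-income examples below, and the linkability check is omitted here only for brevity:

Evaluator:
  # Step 1: data validity (score should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: privacy risks (each risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  inference_risk:
    method: anonymeter-inference
    secret: income
  # add anonymeter-linkability with aux_cols as in the full examples below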

2. Goal-Oriented Evaluation

After passing foundation evaluation, choose evaluation focus based on the intended use of synthetic data:

Scenario A: Data Release (No Specific Downstream Task)

If synthetic data will be released publicly, pursue highest fidelity:

Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema
Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:
      -
        - workclass
        - education
        - occupation
        - race
        - gender
      -
        - age
        - marital-status
        - relationship
        - native-country
        - income
  inference_risk:
    method: anonymeter-inference
    secret: income
  # Focus: Pursue high fidelity (higher score is better)
  fidelity_assessment:
    method: sdmetrics-qualityreport
  # Utility evaluation is optional (not required for data release)

Scenario B: Specific Task Application (Data Augmentation, Model Training, etc.)

If synthetic data is for specific machine learning tasks, pursue high utility:

Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema
Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:
      -
        - workclass
        - education
        - occupation
        - race
        - gender
      -
        - age
        - marital-status
        - relationship
        - native-country
        - income
  inference_risk:
    method: anonymeter-inference
    secret: income
  # Fidelity just needs to meet threshold (≥ 0.75)
  fidelity_assessment:
    method: sdmetrics-qualityreport
  # Focus: Pursue high utility (evaluate by task type)
  ml_utility_assessment:
    method: mlutility
    task_type: classification  # or regression/clustering
    target: income

Main Parameters

  • method (string, required)
    • Evaluation method name
    • See supported methods in the table below
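
Each entry under Evaluator is keyed by a user-chosen experiment name and contains a method plus any method-specific parameters. For example (the name my_fidelity_check is arbitrary):

Evaluator:
  my_fidelity_check:
    method: sdmetrics-qualityreport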

Supported Evaluation Methods

| Type | Method Name | Description | Recommended Standard |
|------|-------------|-------------|----------------------|
| Default Method | default | Default evaluation (equivalent to sdmetrics-qualityreport) | Score ≥ 0.75¹ |
| Data Validity | sdmetrics-diagnosticreport | Check data structure and basic characteristics | Score ≈ 1.0² |
| Privacy Protection | anonymeter-singlingout | Singling out risk assessment | Risk < 0.09³ |
| Privacy Protection | anonymeter-linkability | Linkability risk assessment | Risk < 0.09³ |
| Privacy Protection | anonymeter-inference | Inference risk assessment | Risk < 0.09³ |
| Data Fidelity | sdmetrics-qualityreport | Statistical distribution similarity assessment | Score ≥ 0.75¹ |
| Data Utility | mlutility | Machine learning model utility | Task-dependent⁴ |
| Custom Assessment | custom_method | Custom evaluation method | – |

Recommended Standards Notes

¹ Fidelity Standard (Score ≥ 0.75): Based on statistical distribution similarity

² Validity Standard (Score ≈ 1.0): Data structure integrity check

³ Privacy Risk Standard (Risk < 0.09): Based on guidelines from Singapore's Personal Data Protection Commission (PDPC)

⁴ Utility Standard (Task-dependent):

  • Classification tasks (XGBoost): F1 ≥ 0.7
  • Regression tasks (XGBoost): R² ≥ 0.7
  • Clustering tasks (K-means): Silhouette coefficient ≥ 0.5

Default Method: When method: default is used, the system automatically executes sdmetrics-qualityreport to evaluate data fidelity.
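
For example, the following two entries behave identically (the experiment names are arbitrary):

Evaluator:
  eval_default:
    method: default                   # runs sdmetrics-qualityreport
  eval_quality:
    method: sdmetrics-qualityreport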

Threshold Adjustment: The recommended standards above are general reference values; adjust the thresholds to fit your specific use case and risk tolerance. For the theoretical foundations and references behind each metric, refer to the documentation page for the corresponding evaluation method.

Execution Notes

  • Evaluating large datasets can take significant time, especially with the Anonymeter methods
  • We recommend running evaluations sequentially and confirming that each prerequisite step passes before proceeding to the next