Evaluator API

The Evaluator module provides synthetic data quality evaluation, covering privacy risk measurement, data quality assessment, and machine learning utility analysis.

Class Architecture

classDiagram
    class Evaluator {
        EvaluatorConfig config
        string method
        __init__(method, **kwargs)
        create()
        eval(data) EvalResult
    }

    class EvaluatorConfig {
        string method
        dict params
        string module_path
        string class_name
    }

    class EvalResult {
        float global
        dict details
        DataFrame report
    }

    %% Privacy Risk Evaluators
    class Anonymeter {
        int n_attacks
        int n_cols
        eval() EvalResult
    }

    class SinglingOutEvaluator {
        eval() EvalResult
    }

    class LinkabilityEvaluator {
        list aux_cols
        eval() EvalResult
    }

    class InferenceEvaluator {
        string secret
        list aux_cols
        eval() EvalResult
    }

    %% Data Quality Evaluators
    class SDMetrics {
        string report_type
        eval() EvalResult
    }

    class DiagnosticReport {
        eval() EvalResult
    }

    class QualityReport {
        eval() EvalResult
    }

    %% ML Utility Evaluators
    class MLUtility {
        string task_type
        string target
        string experiment_design
        string resampling
        eval() EvalResult
    }

    class ClassificationUtility {
        list metrics
        eval() EvalResult
    }

    class RegressionUtility {
        list metrics
        eval() EvalResult
    }

    class ClusteringUtility {
        int n_clusters
        eval() EvalResult
    }

    %% Statistical Evaluator
    class StatsEvaluator {
        list stats_method
        string compare_method
        eval() EvalResult
    }

    %% Custom Evaluator
    class CustomEvaluator {
        string module_path
        string class_name
        eval() EvalResult
    }

    %% Input Data
    class InputData {
        DataFrame ori
        DataFrame syn
        DataFrame control
    }

    %% Relationships
    Evaluator *-- EvaluatorConfig
    Evaluator ..> EvalResult
    
    %% Inheritance for Privacy
    Anonymeter <|-- SinglingOutEvaluator
    Anonymeter <|-- LinkabilityEvaluator
    Anonymeter <|-- InferenceEvaluator
    
    %% Inheritance for Quality
    SDMetrics <|-- DiagnosticReport
    SDMetrics <|-- QualityReport
    
    %% Inheritance for ML Utility
    MLUtility <|-- ClassificationUtility
    MLUtility <|-- RegressionUtility
    MLUtility <|-- ClusteringUtility
    
    %% Dependencies
    Evaluator ..> Anonymeter
    Evaluator ..> SDMetrics
    Evaluator ..> MLUtility
    Evaluator ..> StatsEvaluator
    Evaluator ..> CustomEvaluator
    
    %% Data flow
    InputData --> Evaluator

    %% Styling
    style Evaluator fill:#e6f3ff,stroke:#4a90e2,stroke-width:3px
    style EvaluatorConfig fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style EvalResult fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    
    style Anonymeter fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style SinglingOutEvaluator fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style LinkabilityEvaluator fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style InferenceEvaluator fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    
    style SDMetrics fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style DiagnosticReport fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style QualityReport fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    
    style MLUtility fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style ClassificationUtility fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style RegressionUtility fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style ClusteringUtility fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    
    style StatsEvaluator fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style CustomEvaluator fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    
    style InputData fill:#e6ffe6,stroke:#66cc66,stroke-width:2px

Legend:

  • Blue boxes: Main classes
  • Orange boxes: Evaluator implementations
  • Light purple boxes: Configuration and data classes
  • Light green boxes: Input data
  • <|--: Inheritance relationship
  • *--: Composition relationship
  • ..>: Dependency relationship
  • -->: Data flow

Basic Usage

from petsard import Evaluator
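# train_data, synthetic_data, and test_data below are pandas DataFrames prepared beforehand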

# Privacy risk assessment
evaluator = Evaluator('anonymeter-singlingout')
evaluator.create()
eval_result = evaluator.eval({
    'ori': train_data,
    'syn': synthetic_data,
    'control': test_data
})
privacy_risk = eval_result['global']

# Data quality assessment
evaluator = Evaluator('sdmetrics-qualityreport')
evaluator.create()
eval_result = evaluator.eval({
    'ori': train_data,
    'syn': synthetic_data
})
quality_score = eval_result['global']

# Machine learning utility assessment (new version)
evaluator = Evaluator('mlutility', task_type='classification', target='income')
evaluator.create()
eval_result = evaluator.eval({
    'ori': train_data,
    'syn': synthetic_data,
    'control': test_data
})
ml_utility = eval_result['global']
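
# Statistical difference comparison (sketch): per the notes at the end of this page,
# 'stats' needs only the ori and syn datasets; default settings and the same
# 'global' result key as above are assumed here.
evaluator = Evaluator('stats')
evaluator.create()
eval_result = evaluator.eval({
    'ori': train_data,
    'syn': synthetic_data
})
stats_score = eval_result['global']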

Constructor (__init__)

Initialize an evaluator instance.

Syntax

def __init__(
    method: str,
    **kwargs
)

Parameters

  • method : str, required

    • Evaluation method name
    • Required parameter
    • Supported methods:
      • Privacy Risk Assessment:
        • 'anonymeter-singlingout': Singling out risk
        • 'anonymeter-linkability': Linkability risk
        • 'anonymeter-inference': Inference risk
      • Data Quality Assessment:
        • 'sdmetrics-diagnosticreport': Data diagnostic report
        • 'sdmetrics-qualityreport': Data quality report
      • Machine Learning Utility Assessment (Legacy):
        • 'mlutility-classification': Classification utility (multiple models)
        • 'mlutility-regression': Regression utility (multiple models)
        • 'mlutility-cluster': Clustering utility (K-means)
      • Machine Learning Utility Assessment (New, Recommended):
        • 'mlutility': Unified interface (requires task_type parameter)
      • Statistical Assessment:
        • 'stats': Statistical difference comparison
      • Default Method:
        • 'default': Uses sdmetrics-qualityreport
      • Custom Method:
        • 'custom_method': Custom evaluator
  • kwargs : dict, optional

    • Additional parameters for specific evaluators (see the sketch after this list)
    • Depending on the evaluation method, these may include:
      • MLUtility Parameters:
        • task_type: Task type ('classification', 'regression', 'clustering')
        • target: Target column name
        • experiment_design: Experiment design approach
        • resampling: Imbalanced data handling method
      • Anonymeter Parameters:
        • n_attacks: Number of attack attempts
        • n_cols: Number of columns per query
        • secret: Column to be inferred (inference risk)
        • aux_cols: Auxiliary information columns (linkability risk)
      • Custom Method Parameters:
        • module_path: Custom module path
        • class_name: Custom class name

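As a rough illustration of how these keyword arguments are passed at construction time, the sketch below builds a few evaluators directly. The column names, the n_attacks value, the shape of aux_cols, and the module_path / class_name values are illustrative assumptions, not documented API guarantees.

from petsard import Evaluator

# Inference risk: name the column to infer plus the auxiliary columns.
# 'income', 'age', and 'zipcode' are hypothetical column names;
# n_attacks=2000 is an illustrative value.
evaluator = Evaluator(
    'anonymeter-inference',
    n_attacks=2000,
    secret='income',
    aux_cols=['age', 'zipcode'],
)

# Unified ML utility interface: task_type and target are passed as kwargs
evaluator = Evaluator('mlutility', task_type='regression', target='income')

# Custom evaluator: point to your own module and class
# (the path and class name below are placeholders)
evaluator = Evaluator(
    'custom_method',
    module_path='my_project/my_evaluator.py',
    class_name='MyEvaluator',
)
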
Return Value

  • Evaluator
    • Initialized evaluator instance

Usage Examples

from petsard import Evaluator

# Default evaluation
evaluator = Evaluator('default')
evaluator.create()
eval_result = evaluator.eval({
    'ori': original_data,
    'syn': synthetic_data
})

Supported Evaluation Types

Please refer to the PETsARD YAML documentation for details.

Notes

  • Method Selection: Choose the evaluation method that suits your needs; different methods focus on different aspects
  • Data Requirements: Different evaluation methods require different input data combinations (see the sketch after this list)
    • Anonymeter and MLUtility: require all three datasets (ori, syn, control)
    • SDMetrics and Stats: require only the ori and syn datasets
  • Best Practice: Use YAML configuration files rather than calling the Python API directly
  • Method Call Order: create() must be called before eval()
  • MLUtility Version: Prefer the new unified 'mlutility' interface (with task_type) over the legacy per-task interfaces
  • Documentation Note: This documentation is for internal development team reference only; backward compatibility is not guaranteed
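
For reference, the sketch below shows one way to prepare the three datasets described under Data Requirements. The toy DataFrames, the scikit-learn split, and the 80/20 ratio are illustrative choices, not PETsARD requirements.

import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real data (illustrative only)
original_df = pd.DataFrame({'age': range(100), 'income': range(100)})
synthetic_data = original_df.sample(frac=1.0, random_state=0)  # placeholder for synthesizer output

# Hold out a control set so privacy and ML utility evaluators can compare
# against records the synthesizer never saw
train_data, test_data = train_test_split(original_df, test_size=0.2, random_state=42)

data = {
    'ori': train_data,       # data the synthesizer was trained on
    'syn': synthetic_data,   # synthesized data
    'control': test_data,    # held-out data, required by Anonymeter and MLUtility
}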