Synthesizer YAML

The Synthesizer module generates synthetic data and supports various synthesis methods.

Usage Examples

Using the Default Method

Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default

Multiple Experiments

You can define multiple synthesis experiments in the same YAML:

Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default

  custom-method:
    method: custom_method
    module_path: custom-synthesis.py
    class_name: MySynthesizer_Shuffle

For custom_method configuration, please refer to the “Custom Synthesis Method” documentation.
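The custom_method entry above loads a class named MySynthesizer_Shuffle from custom-synthesis.py. The name suggests a shuffle-based synthesizer, and the Python sketch below illustrates only that idea. The class ColumnShuffleSynthesizer and its fit/sample methods are hypothetical and do not reflect the interface PETsARD expects from custom classes; that interface is described in the "Custom Synthesis Method" documentation.

```python
import random
from typing import Dict, List, Optional


class ColumnShuffleSynthesizer:
    """Hypothetical shuffle-based synthesizer (illustration only).

    Each column is permuted independently, so every column keeps its
    original values (its marginal distribution) while the links between
    columns of the same original row are broken.
    NOTE: this is not the interface PETsARD requires for custom_method.
    """

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._data: Dict[str, List] = {}

    def fit(self, data: Dict[str, List]) -> None:
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Store copies so shuffling never mutates the caller's data.
        self._data = {col: list(values) for col, values in data.items()}

    def sample(self) -> Dict[str, List]:
        synthetic: Dict[str, List] = {}
        for col, values in self._data.items():
            shuffled = list(values)
            self._rng.shuffle(shuffled)  # independent permutation per column
            synthetic[col] = shuffled
        return synthetic


original = {"age": [25, 38, 52, 61], "income": ["<=50K", ">50K", "<=50K", ">50K"]}
synth = ColumnShuffleSynthesizer(seed=0)
synth.fit(original)
fake = synth.sample()
```

Shuffling is a deliberately weak baseline: it preserves each column's values exactly but discards correlations between columns, which makes it useful mainly for comparison against real synthesizers.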

Main Parameters

  • method (string)
    • The synthesis method to use.
    • Setting method: default selects SDV's GaussianCopula as the synthesis method.
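To show what a Gaussian copula synthesizer does, here is a minimal stdlib-only Python sketch for numeric columns. It is not SDV's implementation, which handles many more details (for example, categorical columns and configurable marginal distributions). The sketch assumes numeric input, uses empirical marginals, and names such as gaussian_copula_sample are hypothetical. The steps are: rank-transform each column to normal scores, estimate their correlation matrix, sample correlated normals through a Cholesky factor, and map the samples back through each column's empirical quantiles.

```python
import math
import random
from statistics import NormalDist
from typing import Dict, List

_STD_NORMAL = NormalDist()


def _normal_scores(values: List[float]) -> List[float]:
    # Rank-transform into (0, 1), then map through the inverse normal CDF.
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = _STD_NORMAL.inv_cdf((rank + 1) / (n + 1))
    return scores


def _correlation(x: List[float], y: List[float]) -> float:
    # Pearson correlation of two equal-length lists.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _cholesky(m: List[List[float]]) -> List[List[float]]:
    # Lower-triangular L with L @ L^T == m.
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                # Clamp tiny or negative values caused by rounding or
                # perfectly correlated columns.
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def gaussian_copula_sample(
    data: Dict[str, List[float]], n: int, seed: int = 0
) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    cols = list(data)
    scores = {c: _normal_scores(data[c]) for c in cols}
    corr = [[1.0 if a == b else _correlation(scores[a], scores[b]) for b in cols]
            for a in cols]
    low = _cholesky(corr)
    sorted_vals = {c: sorted(data[c]) for c in cols}
    out: Dict[str, List[float]] = {c: [] for c in cols}
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in cols]
        # Correlated standard normals: z = L @ e.
        z = [sum(low[i][p] * e[p] for p in range(i + 1)) for i in range(len(cols))]
        for i, c in enumerate(cols):
            u = _STD_NORMAL.cdf(z[i])  # back to (0, 1)
            vals = sorted_vals[c]
            # Empirical quantile: pick the original value at that rank.
            out[c].append(vals[min(int(u * len(vals)), len(vals) - 1)])
    return out


xs = list(range(1, 51))
ys = [v * v for v in xs]
synthetic = gaussian_copula_sample({"x": xs, "y": ys}, n=200, seed=1)
```

Because y is a monotone function of x here, the copula recovers a near-perfect dependence and the sampled pairs almost always satisfy y = x * x, even though each column is sampled through its own marginal.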

Supported Synthesis Methods

This module supports the following four ways to generate or load synthetic data:

  1. PETsARD Built-in Methods

    • petsard-gaussian-copula: High-performance Gaussian Copula synthesizer using Numba JIT and PyTorch
  2. SDV Integration Methods

  3. Custom Synthesis Methods

    • Use custom_method to integrate your own synthesis algorithms
  4. External Data Loading

    • Use custom_data to load synthetic data generated by other tools for evaluation
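Evaluating externally generated data is easiest when its columns line up with the original's. The Python sketch below assumes the external tool wrote its output as CSV; load_external_synthetic is a hypothetical helper, not the custom_data implementation, and it keeps every value as a string rather than inferring types.

```python
import csv
import io
from typing import Dict, List


def load_external_synthetic(csv_text: str, expected_columns: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: read externally generated synthetic data and
    reorder it to match the original column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    found = reader.fieldnames or []
    missing = [c for c in expected_columns if c not in found]
    extra = [c for c in found if c not in expected_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    # Build columns in the original order, regardless of the CSV's order.
    columns: Dict[str, List[str]] = {c: [] for c in expected_columns}
    for row in reader:
        for c in expected_columns:
            columns[c].append(row[c])
    return columns


external_csv = "income,age\n<=50K,25\n>50K,52\n"
aligned = load_external_synthetic(external_csv, ["age", "income"])
```

Failing loudly on missing or extra columns is a design choice: silently dropping columns would make later evaluation results hard to interpret.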

Execution Flow

  1. Receive Input: Receives data from the Loader or Preprocessor
  2. Generate Synthetic Data: Generates synthetic data using the specified method
  3. Maintain Structure: Preserves the column structure of the original data
  4. Output Results: Passes the synthetic data to subsequent modules
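The four steps above can be sketched as a single Python function. run_synthesis_step, bootstrap_rows, and the n_rows parameter are hypothetical names chosen for illustration and are not part of PETsARD's API. Tables are represented as dicts mapping column names to value lists so the sketch needs no third-party libraries.

```python
import random
from typing import Callable, Dict, List

Table = Dict[str, List]


def run_synthesis_step(data: Table, method: Callable[[Table, int], Table], n_rows: int) -> Table:
    # 1. Receive input: data handed over by an earlier step (e.g. Loader/Preprocessor).
    if not data:
        raise ValueError("no input data")
    # 2. Generate synthetic data with the configured method.
    synthetic = method(data, n_rows)
    # 3. Maintain structure: same columns in the same order, n_rows values each.
    if list(synthetic) != list(data):
        raise ValueError("synthetic data must keep the original columns and order")
    if any(len(values) != n_rows for values in synthetic.values()):
        raise ValueError("every synthetic column must have n_rows values")
    # 4. Output: return the table for downstream modules (e.g. evaluation).
    return synthetic


def bootstrap_rows(data: Table, n_rows: int) -> Table:
    """Toy method (hypothetical): resample whole rows with replacement."""
    rng = random.Random(42)
    size = len(next(iter(data.values())))
    picks = [rng.randrange(size) for _ in range(n_rows)]
    return {col: [values[i] for i in picks] for col, values in data.items()}


source = {"age": [25, 38, 52], "income": ["<=50K", ">50K", "<=50K"]}
result = run_synthesis_step(source, bootstrap_rows, n_rows=5)
```

Checking the structure after generation, rather than trusting each method, keeps the contract in one place so any method plugged into step 2 is held to the same rule.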