Synthesizer YAML

The Synthesizer module generates synthetic data and supports various synthesis methods.

Usage Examples

Note: If using Colab, please see the runtime setup guide.

Using Default Method

Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default

Multiple Experiments

You can define multiple synthesis experiments in the same YAML file, each under its own experiment name:

Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default

  custom-method:
    method: custom_method
    module_path: custom-synthesis.py
    class_name: MySynthesizer_Shuffle

For custom_method configuration, please refer to the “Custom Synthesis Method” documentation.

Main Parameters

method (string)
- The synthesis method to use.
- With method: default, the PETsARD built-in Gaussian Copula method is used.

Supported Synthesis Methods

This module supports the following four ways to generate or load synthetic data:

1. PETsARD Built-in Methods
   - petsard-gaussian-copula: High-performance Gaussian Copula synthesizer using Numba JIT and PyTorch
2. SDV Integration Methods (Optional Feature)
   - Built-in Integration: Quick way to use SDV with default parameters (requires separate installation, for reference only)
   - SDV Custom Methods: Use custom_method to flexibly control all parameters of SDV methods (requires separate installation, for reference only)
3. Custom Synthesis Methods
   - Use custom_method to integrate your own synthesis algorithms
4. External Data Loading
   - Use custom_data to load synthetic data generated by other tools for evaluation
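
As a minimal sketch of selecting the built-in method by name, the configuration below runs two experiments side by side: one with method: default and one that names petsard-gaussian-copula explicitly. Going by the description of the method parameter, both should resolve to the same built-in Gaussian Copula synthesizer. The experiment name gaussian_copula_explicit is an arbitrary label chosen for illustration.

```yaml
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  # Experiment 1: let PETsARD pick its default method
  default_synthesis:
    method: default

  # Experiment 2: name the built-in Gaussian Copula method explicitly
  # (the experiment name is a hypothetical label)
  gaussian_copula_explicit:
    method: petsard-gaussian-copula
```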

Execution Flow

1. Receive Input: Receives data from the Loader or Preprocessor
2. Generate Synthetic Data: Generates synthetic data using the specified method
3. Maintain Structure: Preserves the column structure of the original data
4. Output Results: Passes the synthetic data to subsequent modules
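
The flow above can be pictured as a pipeline configuration in which the Synthesizer sits between an upstream and a downstream module. The sketch below is illustrative only: the Preprocessor and Postprocessor blocks, their experiment names, and their method: default settings are assumptions that this section does not describe, so check the documentation of those modules for the options they actually accept.

```yaml
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema

# Assumed upstream module: prepares the loaded data for synthesis
Preprocessor:
  default_preprocessing:   # hypothetical experiment name
    method: default        # assumed setting, not described in this section

# Receives the data from the module above and generates synthetic data
Synthesizer:
  default_synthesis:
    method: default

# Assumed downstream module: receives the synthetic data
Postprocessor:
  default_postprocessing:  # hypothetical experiment name
    method: default        # assumed setting, not described in this section
```

If no Preprocessor block is present, the Synthesizer takes its input directly from the Loader, as described in the first step.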
