Synthesizer YAML
The Synthesizer module generates synthetic data and supports various synthesis methods.
Usage Examples
Using Default Method
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default
Multiple Experiments
You can define multiple synthesis experiments in the same YAML:
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  default_synthesis:
    method: default
  custom-method:
    method: custom_method
    module_path: custom-synthesis.py
    class_name: MySynthesizer_Shuffle
For custom_method configuration, please refer to the “Custom Synthesis Method” documentation.
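To give a rough idea of what a class such as MySynthesizer_Shuffle in custom-synthesis.py could contain, here is a minimal sketch of a synthesizer that shuffles each column independently. The fit and sample method names, the seed argument, and the dict-of-lists table are assumptions made for illustration only; the interface PETsARD actually expects is the one described in the “Custom Synthesis Method” documentation.

```python
import random


class MySynthesizer_Shuffle:
    """Hypothetical custom synthesizer that shuffles each column independently.

    Assumption: the fit/sample method names and the dict-of-lists table
    are illustrative and are not taken from PETsARD's documented interface.
    """

    def __init__(self, seed=None):
        # A private random generator keeps runs reproducible when a seed is given.
        self._rng = random.Random(seed)
        self._columns = None

    def fit(self, data):
        # Copy every column so the original table is never mutated.
        self._columns = {name: list(values) for name, values in data.items()}

    def sample(self):
        if self._columns is None:
            raise RuntimeError("call fit() before sample()")
        synthetic = {}
        for name, values in self._columns.items():
            shuffled = list(values)
            # Shuffling within a column keeps its marginal distribution
            # but breaks the relationships between columns.
            self._rng.shuffle(shuffled)
            synthetic[name] = shuffled
        return synthetic


original = {
    "age": [25, 38, 52, 41],
    "income": ["<=50K", ">50K", ">50K", "<=50K"],
}
synthesizer = MySynthesizer_Shuffle(seed=0)
synthesizer.fit(original)
synthetic = synthesizer.sample()
```

Column-wise shuffling is only a toy technique: each column keeps exactly the same values as the original, so it is useful as a baseline for exercising the pipeline rather than as a privacy-preserving method.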
Main Parameters
- method (string): Synthesis method
  - When using method: default, the SDV GaussianCopula method is used automatically.
Supported Synthesis Methods
This module supports the following four ways to generate or load synthetic data:
- petsard-gaussian-copula: High-performance Gaussian Copula synthesizer using Numba JIT and PyTorch
- SDV:
  - Built-in Integration: Quick way to use SDV with default parameters (planned for deprecation)
  - SDV Custom Methods: Use custom_method to flexibly control all parameters of SDV methods
- Use custom_method to integrate your own synthesis algorithms
- Use custom_data to load synthetic data generated by other tools for evaluation
Execution Flow
- Receive Input: Receives data from Loader or Preprocessor
- Generate Synthetic Data: Generates synthetic data using the specified method
- Maintain Structure: Maintains column structure of original data
- Output Results: Passes synthetic data to subsequent modules
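The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PETsARD's implementation: run_synthesizer is a hypothetical helper, rows are plain dicts standing in for the Loader output, and bootstrap resampling (drawing rows with replacement) stands in for a real synthesis method.

```python
import random


def run_synthesizer(data, n_rows, seed=None):
    """Hypothetical helper mirroring the four execution steps."""
    # 1. Receive input: a list of row dicts stands in for Loader output.
    if not data:
        raise ValueError("input data is empty")
    columns = list(data[0].keys())

    # 2. Generate synthetic data: bootstrap resampling as a stand-in method.
    rng = random.Random(seed)
    synthetic = [dict(rng.choice(data)) for _ in range(n_rows)]

    # 3. Maintain structure: every row keeps the original columns, in order.
    for row in synthetic:
        if list(row.keys()) != columns:
            raise ValueError("column structure changed during synthesis")

    # 4. Output results: hand the synthetic rows to the next module.
    return synthetic


rows = [
    {"age": 25, "income": "<=50K"},
    {"age": 38, "income": ">50K"},
    {"age": 52, "income": ">50K"},
]
synthetic_rows = run_synthesizer(rows, n_rows=5, seed=1)
```

The explicit check in step 3 makes the structural guarantee visible: whatever method generates the rows, the output is rejected if its columns differ from the input.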