Synthesizer API

Synthetic data generation module that supports multiple synthesis methods behind a single interface.

Class Architecture

classDiagram
    class Synthesizer {
        +config: dict
        +data_syn: pd.DataFrame
        +synthesizer: object
        +_impl: object
        +__init__(method: str, **kwargs)
        +create(metadata: SchemaMetadata)
        +fit(data: pd.DataFrame)
        +sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
        +fit_sample(data: pd.DataFrame, **kwargs)
    }

    class SynthesizerBase {
        <<abstract>>
        +config: dict
        +data_syn: pd.DataFrame
        +fit(data: pd.DataFrame)
        +sample(**kwargs)
        +fit_sample(data: pd.DataFrame, **kwargs)
    }

    class SDVSynthesizer {
        +config: dict
        +synthesizer: object
        +metadata: SchemaMetadata
        +__init__(config: dict, metadata: SchemaMetadata)
        +fit(data: pd.DataFrame)
        +sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
        +_create_synthesizer()
        +_convert_metadata()
    }

    class CustomSynthesizer {
        +config: dict
        +metadata: SchemaMetadata
        +custom_synthesizer: object
        +__init__(config: dict, metadata: SchemaMetadata)
        +fit(data: pd.DataFrame)
        +sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
        +_load_custom_module()
    }

    class CustomData {
        +config: dict
        +metadata: SchemaMetadata
        +__init__(config: dict, metadata: SchemaMetadata)
        +fit(data: pd.DataFrame)
        +sample()
        +_load_custom_data()
    }

    class SchemaMetadata {
        <<dataclass>>
        +id: str
        +name: str
        +attributes: list
    }

    class Config {
        <<dataclass>>
        +method: str
        +method_code: int
        +kwargs: dict
    }

    Synthesizer ..> SynthesizerBase : uses
    Synthesizer ..> SchemaMetadata : depends on
    Synthesizer ..> Config : composition

    SynthesizerBase <|-- SDVSynthesizer : inheritance
    SynthesizerBase <|-- CustomSynthesizer : inheritance
    SynthesizerBase <|-- CustomData : inheritance

    SDVSynthesizer *-- SchemaMetadata : composition
    SDVSynthesizer *-- Config : composition

    CustomSynthesizer *-- SchemaMetadata : composition
    CustomSynthesizer *-- Config : composition

    CustomData *-- SchemaMetadata : composition
    CustomData *-- Config : composition

Legend:
Blue boxes: Main classes
Orange boxes: Subclass implementations
Light purple boxes: Configuration and data classes
<|--: Inheritance relationship
*--: Composition relationship
..>: Dependency relationship
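
The relationships in the diagram can be illustrated with a small, self-contained sketch. This is not PETsARD's implementation: the class names BaseSketch, EchoSynthesizer and FacadeSketch are hypothetical stand-ins, plain lists of dicts replace pd.DataFrame so the example needs no third-party packages, and the assumption that fit_sample() simply runs fit() followed by sample() is ours.

```python
from abc import ABC, abstractmethod


class BaseSketch(ABC):
    """Hypothetical stand-in for the abstract base class in the diagram."""

    def __init__(self, config: dict):
        self.config = config
        self.data_syn = None

    @abstractmethod
    def fit(self, data):
        ...

    @abstractmethod
    def sample(self, **kwargs):
        ...

    def fit_sample(self, data, **kwargs):
        # Assumed behaviour: fit, then sample, then keep the result.
        self.fit(data)
        self.data_syn = self.sample(**kwargs)
        return self.data_syn


class EchoSynthesizer(BaseSketch):
    """Hypothetical subclass that memorises rows and replays them."""

    def fit(self, data):
        self._rows = list(data)

    def sample(self, sample_num_rows=None):
        n = len(self._rows) if sample_num_rows is None else sample_num_rows
        # Cycle through the stored rows until n rows have been produced.
        return [self._rows[i % len(self._rows)] for i in range(n)]


class FacadeSketch:
    """Hypothetical facade that forwards work to the object held in _impl."""

    def __init__(self, method: str, **kwargs):
        self.config = {"method": method, **kwargs}
        self._impl = None
        self.data_syn = None

    def create(self, metadata=None):
        # A real facade would choose a subclass based on `method`;
        # this sketch always uses EchoSynthesizer.
        self._impl = EchoSynthesizer(self.config)

    def fit_sample(self, data, **kwargs):
        self.data_syn = self._impl.fit_sample(data, **kwargs)
        return self.data_syn
```

The point of the design is that callers only touch the facade, while each synthesis backend implements the same fit/sample contract.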

Basic Usage

from petsard import Synthesizer

# Assumes `df` (pd.DataFrame) and `metadata` (SchemaMetadata) are defined beforehand

# Use default method (PETsARD Gaussian Copula)
synthesizer = Synthesizer(method='default')
synthesizer.create(metadata=metadata)
synthesizer.fit_sample(data=df)
synthetic_data = synthesizer.data_syn

# Use specific SDV method (requires SDV installation)
synthesizer = Synthesizer(method='sdv-single_table-ctgan')
synthesizer.create(metadata=metadata)
synthesizer.fit_sample(data=df, sample_num_rows=1000)

Constructor (__init__)

Initialize a synthetic data generator instance.

Syntax

def __init__(
    method: str,
    **kwargs
)

Parameters

method : str, required
- Synthesis method name
- Supported methods:
  - 'default' or 'petsard-gaussian_copula': Use the PETsARD built-in Gaussian Copula
  - 'sdv-single_table-{method}': Use a single-table method provided by SDV (requires separate installation: pip install 'sdv>=1.26.0,<2'; for reference only)
  - 'custom_method': Custom synthesis method (requires additional parameters)
kwargs : dict, optional
- Additional parameters passed to specific synthesizers
- Custom methods require:
  - module_path: Custom module path
  - class_name: Custom class name

Returns

Synthesizer
- Initialized synthesizer instance

Usage Examples

from petsard import Synthesizer

# Use default method
synthesizer = Synthesizer(method='default')

# Use SDV CTGAN (requires SDV installation)
synthesizer = Synthesizer(method='sdv-single_table-ctgan')

# Use SDV GaussianCopula with parameters (requires SDV installation)
synthesizer = Synthesizer(
    method='sdv-single_table-gaussiancopula',
    default_distribution='truncnorm'
)

# Use custom synthesizer
synthesizer = Synthesizer(
    method='custom_method',
    module_path='custom_synthesis.py',
    class_name='MySynthesizer'
)
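
For the custom method, a class has to be located from module_path and class_name. One common way to do that in Python is importlib, sketched below. The function load_custom_class is a hypothetical helper, not part of PETsARD, and the interface the loaded class must implement is not specified here.

```python
import importlib.util
from pathlib import Path


def load_custom_class(module_path: str, class_name: str):
    """Import a Python file by path and return the named class (illustrative only)."""
    path = Path(module_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such module file: {module_path}")

    # Build an import spec directly from the file location.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from: {module_path}")

    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    if not hasattr(module, class_name):
        raise AttributeError(f"{class_name!r} not found in {module_path}")
    return getattr(module, class_name)
```

Under these assumptions, load_custom_class('custom_synthesis.py', 'MySynthesizer') would return the class object, which the caller can then instantiate.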

Default Parameters

SDV synthesizers (if used) are initialized with the following default parameters to ensure numerical precision:

enforce_rounding=True: Applied to all SDV synthesizer types to maintain integer precision for numerical columns
enforce_min_max_values=True: Applied only to TVAE and GaussianCopula synthesizers to enforce value bounds
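
The two rules above can be expressed as a small helper. This is a sketch under stated assumptions: sdv_default_kwargs is a hypothetical function, the algorithm is taken from the last segment of the method name, and letting explicit user arguments override the defaults is our assumption rather than documented behaviour.

```python
def sdv_default_kwargs(method: str, **user_kwargs) -> dict:
    """Build constructor arguments for an SDV synthesizer (illustrative only)."""
    # e.g. 'sdv-single_table-tvae' -> 'tvae'
    algorithm = method.rsplit("-", 1)[-1].lower()

    # Applied to every SDV synthesizer type.
    defaults = {"enforce_rounding": True}

    # Applied only to TVAE and GaussianCopula.
    if algorithm in {"tvae", "gaussiancopula"}:
        defaults["enforce_min_max_values"] = True

    # Assumption: explicit user arguments take precedence over defaults.
    return {**defaults, **user_kwargs}
```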

Precision Rounding

All synthesizers automatically apply precision rounding based on schema metadata. When precision is specified in the schema (either v1.0 or v2.0 format), the synthesizer will round generated values to the specified decimal places.

This feature ensures synthetic data maintains the same numerical precision as the original data, which is critical for:

Financial data (prices, amounts)
Scientific measurements
Statistical reporting
Any precision-sensitive applications
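
As a rough illustration of precision rounding, the sketch below rounds float values to a per-column number of decimal places. The function apply_precision and the flat precision_by_column mapping are hypothetical: the actual schema formats (v1.0 and v2.0) are not shown in this document, and rows are plain dicts rather than a pd.DataFrame so the example has no dependencies.

```python
def apply_precision(rows, precision_by_column):
    """Round float values to the decimal places given per column (illustrative only)."""
    rounded = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for column, places in precision_by_column.items():
            value = new_row.get(column)
            # Only floats are rounded; integers, strings and None pass through.
            if isinstance(value, float):
                new_row[column] = round(value, places)
        rounded.append(new_row)
    return rounded
```

Columns without a precision entry are returned unchanged, which mirrors the statement that rounding applies only when precision is specified in the schema.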

Notes

custom_data method: The 'custom_data' method loads external synthetic data and is handled at the framework level, without instantiating a synthesizer
Best practice: Use YAML configuration files instead of calling the Python API directly
Method call order: create() must be called before fit() or fit_sample()
Data output: Generated synthetic data is stored in the data_syn attribute
Documentation: This documentation is for internal development team reference only; backward compatibility is not guaranteed
Schema usage: Use SchemaMetadata to define the data structure; see the Metadater API documentation for detailed configuration
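
The call-order and data-output notes can be pictured with a minimal guard. CallOrderSketch is a hypothetical class: the exception type and messages are assumptions, and what PETsARD actually does when create() is skipped is not stated in this document.

```python
class CallOrderSketch:
    """Hypothetical class that enforces create() -> fit() -> sample()."""

    def __init__(self):
        self._created = False
        self._fitted = False
        self._rows = []
        self.data_syn = None

    def create(self, metadata=None):
        self._created = True

    def fit(self, data):
        if not self._created:
            raise RuntimeError("create() must be called before fit()")
        self._rows = list(data)
        self._fitted = True

    def sample(self):
        if not self._fitted:
            raise RuntimeError("fit() must be called before sample()")
        # The generated data is kept on the instance, as with data_syn.
        self.data_syn = list(self._rows)
        return self.data_syn

    def fit_sample(self, data):
        self.fit(data)
        return self.sample()
```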