SynthesizerAdapter

SynthesizerAdapter handles synthetic data generation using various generative models with pipeline integration.

Class Architecture

classDiagram

    class SynthesizerAdapter {
        +config: dict
        +synthesizer: Synthesizer
        +is_custom_data: bool
        +loader_adapter: LoaderAdapter
        +__init__(config)
        +run() DataFrame
        -_extract_loader_config(config) dict
    }

    class Synthesizer {
        +config: dict
        +method: str
        +model: SDVModel
        +create(metadata)
        +fit_sample(data) DataFrame
        +sample() DataFrame
    }

    class LoaderAdapter {
        +load() tuple~DataFrame, Schema~
    }

    SynthesizerAdapter ..> Synthesizer : uses for data synthesis
    SynthesizerAdapter ..> LoaderAdapter : uses for custom_data method

    %% Style definitions
    class SynthesizerAdapter {
        <<Main Class>>
    }
    style SynthesizerAdapter fill:#E6E6FA

    class Synthesizer {
        <<Core Module>>
    }
    style Synthesizer fill:#4169E1,color:#fff

    class LoaderAdapter {
        <<Optional: Custom Data>>
    }
    style LoaderAdapter fill:#FFE4E1

    note for SynthesizerAdapter "1. Normal mode: Uses Synthesizer for data generation\n2. Custom data mode: Uses LoaderAdapter to load pre-generated data\n3. Supports various synthesis methods (CTGAN, GaussianCopula, etc.)"

Legend:
Light purple box: SynthesizerAdapter main class
Blue box: Core synthesis module
Light pink box: LoaderAdapter used for custom data mode
..>: Dependency relationship

Main Features

Unified interface for synthetic data generation
Support for multiple SDV synthesis methods (built-in methods not listed due to potential SDV version changes)
Automatic model training and sampling
Metadata and privacy preservation support

Method Reference

`init(config: dict)`

Initializes SynthesizerAdapter instance with synthesis configuration.

Parameters:

config: dict, required
- Configuration parameter dictionary
- Keys: method, sample_size, epochs, batch_size, use_metadata, random_state

`run(input: dict)`

Executes synthetic data generation operation.

Parameters:

input: dict, required
- Must contain:
  - data: pd.DataFrame - Training data
  - metadata: Schema - Data metadata
  - sample_size: int (optional) - Number of synthetic samples to generate

Returns: No direct return value. Use get_result() to get synthetic data.

`get_result()`

Gets the synthetic data generation results.

Returns:

tuple[pd.DataFrame, Schema]: Synthetic data and updated metadata

`set_input(data, metadata)`

Sets input data for the synthesizer.

Parameters:

data: pd.DataFrame - Training data
metadata: Schema - Data metadata

Usage Example

from petsard.adapter import SynthesizerAdapter

# Configure synthesizer
adapter = SynthesizerAdapter({
    "method": "ctgan",
    "sample_size": 1000,
    "epochs": 300,
    "batch_size": 500,
    "random_state": 42
})

# Set input
adapter.set_input(data=df, metadata=schema)

# Execute synthesis
adapter.run({
    "data": df,
    "metadata": schema
})

# Get results
synthetic_data, synthetic_metadata = adapter.get_result()

Notes

This is an internal API, not recommended for direct use
Prefer using YAML configuration files and Executor
Results are cached until next run() call

SplitterAdapter ConstrainerAdapter

SynthesizerAdapter

Class Architecture

Main Features

Method Reference

__init__(config: dict)

run(input: dict)

get_result()

set_input(data, metadata)

Usage Example

Notes

`init(config: dict)`

`run(input: dict)`

`get_result()`

`set_input(data, metadata)`