SynthesizerAdapter

SynthesizerAdapter wraps the Synthesizer module so that synthetic data generation, using various generative models, can run as a step in the pipeline.

Class Architecture

classDiagram

    class SynthesizerAdapter {
        +config: dict
        +synthesizer: Synthesizer
        +is_custom_data: bool
        +loader_adapter: LoaderAdapter
        +__init__(config)
        +run(input)
        +set_input(data, metadata)
        +get_result() tuple~DataFrame, Schema~
        -_extract_loader_config(config) dict
    }

    class Synthesizer {
        +config: dict
        +method: str
        +model: SDVModel
        +create(metadata)
        +fit_sample(data) DataFrame
        +sample() DataFrame
    }

    class LoaderAdapter {
        +load() tuple~DataFrame, Schema~
    }

    SynthesizerAdapter ..> Synthesizer : uses for data synthesis
    SynthesizerAdapter ..> LoaderAdapter : uses for custom_data method

    %% Style definitions
    class SynthesizerAdapter {
        <<Main Class>>
    }
    style SynthesizerAdapter fill:#E6E6FA

    class Synthesizer {
        <<Core Module>>
    }
    style Synthesizer fill:#4169E1,color:#fff

    class LoaderAdapter {
        <<Optional: Custom Data>>
    }
    style LoaderAdapter fill:#FFE4E1

    note for SynthesizerAdapter "1. Normal mode: Uses Synthesizer for data generation\n2. Custom data mode: Uses LoaderAdapter to load pre-generated data\n3. Supports various synthesis methods (CTGAN, GaussianCopula, etc.)"

Legend:

  • Light purple box: SynthesizerAdapter main class
  • Blue box: Core synthesis module
  • Light pink box: LoaderAdapter used for custom data mode
  • ..>: Dependency relationship
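
The two modes described in the diagram note can be illustrated with a minimal sketch. It is not the PETsARD implementation: generate, fake_synthesize, and fake_load_custom are hypothetical stand-ins for the adapter, the Synthesizer, and LoaderAdapter.load(). The assumption that custom data mode is selected by method == "custom_data" is inferred from the diagram label only. Plain lists stand in for DataFrames to keep the example self-contained.

```python
from typing import Any, Callable, Tuple


def generate(
    config: dict,
    run_input: dict,
    synthesize: Callable[[Any, Any], Any],
    load_custom: Callable[[], Tuple[Any, Any]],
) -> Any:
    """Return synthetic data, choosing the mode from config (hypothetical helper)."""
    # Assumption: custom data mode is signalled by method == "custom_data".
    if config.get("method") == "custom_data":
        # Custom data mode: load pre-generated data; nothing is trained.
        data, _schema = load_custom()
        return data
    # Normal mode: hand the training data and metadata to the synthesizer.
    return synthesize(run_input["data"], run_input["metadata"])


# Stand-ins for the real collaborators, for illustration only.
def fake_synthesize(data, metadata):
    return [row * 10 for row in data]


def fake_load_custom():
    return ["pre-generated"], {"columns": 1}


run_input = {"data": [1, 2, 3], "metadata": {"columns": 1}}
normal = generate({"method": "ctgan"}, run_input, fake_synthesize, fake_load_custom)
custom = generate({"method": "custom_data"}, run_input, fake_synthesize, fake_load_custom)
```

In this sketch the loader is only consulted in custom data mode, which mirrors the "Optional: Custom Data" role of LoaderAdapter in the diagram.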

Main Features

  • Unified interface for synthetic data generation
  • Support for multiple SDV synthesis methods (the built-in methods are not listed here because they may change between SDV versions)
  • Automatic model training and sampling
  • Metadata and privacy preservation support

Method Reference

__init__(config: dict)

Initializes a SynthesizerAdapter instance with the synthesis configuration.

Parameters:

  • config: dict, required
    • Configuration parameter dictionary
    • Keys: method, sample_size, epochs, batch_size, use_metadata, random_state
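
Because config is a plain dictionary, a misspelled key is easy to miss. The following is a minimal caller-side sketch that compares a config against the keys listed above. The helper unknown_config_keys is hypothetical and not part of PETsARD; whether the adapter itself rejects unknown keys, or accepts further method-specific keys, is not stated here.

```python
# Keys taken from the parameter list above.
KNOWN_KEYS = frozenset(
    {"method", "sample_size", "epochs", "batch_size", "use_metadata", "random_state"}
)


def unknown_config_keys(config: dict) -> list:
    """Return config keys that are not in the documented list, sorted."""
    return sorted(set(config) - KNOWN_KEYS)


# "epoch" is a deliberate typo for "epochs".
config = {"method": "ctgan", "sample_size": 1000, "epoch": 300}
typos = unknown_config_keys(config)
```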

run(input: dict)

Runs synthetic data generation.

Parameters:

  • input: dict, required
    • Keys:
      • data: pd.DataFrame (required) - Training data
      • metadata: Schema (required) - Data metadata
      • sample_size: int (optional) - Number of synthetic samples to generate

Returns: No direct return value. Use get_result() to get synthetic data.
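
The shape of the input dictionary can be sketched with a small builder. build_run_input is a hypothetical helper, not a PETsARD function, and the rule that sample_size must be a positive integer is an assumption made for the example.

```python
from typing import Any, Optional


def build_run_input(data: Any, metadata: Any, sample_size: Optional[int] = None) -> dict:
    """Assemble the dictionary passed to run() (hypothetical helper)."""
    if data is None or metadata is None:
        raise ValueError("both data and metadata are required")
    run_input = {"data": data, "metadata": metadata}
    if sample_size is not None:
        # Assumption: sample_size must be a positive integer (bool is rejected).
        is_int = isinstance(sample_size, int) and not isinstance(sample_size, bool)
        if not is_int or sample_size <= 0:
            raise ValueError("sample_size must be a positive integer")
        # The key is only added when the caller supplies a value.
        run_input["sample_size"] = sample_size
    return run_input
```

The optional key is omitted rather than set to None, so the resulting dictionary contains only the keys the caller actually provided.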

get_result()

Gets the synthetic data generation results.

Returns:

  • tuple[pd.DataFrame, Schema]: Synthetic data and updated metadata

set_input(data, metadata)

Sets input data for the synthesizer.

Parameters:

  • data: pd.DataFrame - Training data
  • metadata: Schema - Data metadata

Usage Example

from petsard.adapter import SynthesizerAdapter

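# df (pd.DataFrame) and schema (Schema) are assumed to have been loaded
# beforehand, for example by a loader step.

# Configure synthesizer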
adapter = SynthesizerAdapter({
    "method": "ctgan",
    "sample_size": 1000,
    "epochs": 300,
    "batch_size": 500,
    "random_state": 42
})

# Set input
adapter.set_input(data=df, metadata=schema)

# Execute synthesis
adapter.run({
    "data": df,
    "metadata": schema
})

# Get results
synthetic_data, synthetic_metadata = adapter.get_result()

Notes

  • This is an internal API; direct use is not recommended
  • Prefer YAML configuration files with the Executor
  • Results are cached until the next run() call
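
The caching behaviour in the last note can be sketched as follows. CachingStep is a hypothetical stand-in, not PETsARD code, and raising an error when get_result() is called before any run() is an assumption made for the example.

```python
class CachingStep:
    """Hypothetical stand-in showing run()/get_result() caching."""

    def __init__(self, produce):
        self._produce = produce
        self._result = None

    def run(self, run_input):
        # Each run overwrites whatever the previous run stored; nothing is returned.
        self._result = self._produce(run_input)

    def get_result(self):
        # Assumption: asking for a result before any run() is an error.
        if self._result is None:
            raise RuntimeError("call run() before get_result()")
        return self._result


step = CachingStep(lambda run_input: [x + 1 for x in run_input["data"]])
step.run({"data": [1, 2]})
first = step.get_result()
same = step.get_result()   # repeated calls return the cached object
step.run({"data": [10]})
second = step.get_result()  # the new run replaced the cached result
```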