SynthesizerAdapter
SynthesizerAdapter integrates synthetic data generation into the pipeline, wrapping various generative models behind a single interface.
Class Architecture
classDiagram
    class SynthesizerAdapter {
        +config: dict
        +synthesizer: Synthesizer
        +is_custom_data: bool
        +loader_adapter: LoaderAdapter
        +__init__(config)
        +run() DataFrame
        -_extract_loader_config(config) dict
    }
    class Synthesizer {
        +config: dict
        +method: str
        +model: SDVModel
        +create(metadata)
        +fit_sample(data) DataFrame
        +sample() DataFrame
    }
    class LoaderAdapter {
        +load() tuple~DataFrame, Schema~
    }
    SynthesizerAdapter ..> Synthesizer : uses for data synthesis
    SynthesizerAdapter ..> LoaderAdapter : uses for custom_data method

    %% Style definitions
    class SynthesizerAdapter {
        <<Main Class>>
    }
    style SynthesizerAdapter fill:#E6E6FA
    class Synthesizer {
        <<Core Module>>
    }
    style Synthesizer fill:#4169E1,color:#fff
    class LoaderAdapter {
        <<Optional: Custom Data>>
    }
    style LoaderAdapter fill:#FFE4E1

    note for SynthesizerAdapter "1. Normal mode: Uses Synthesizer for data generation\n2. Custom data mode: Uses LoaderAdapter to load pre-generated data\n3. Supports various synthesis methods (CTGAN, GaussianCopula, etc.)"
Legend:
- Light purple box: SynthesizerAdapter main class
- Blue box: Core synthesis module
- Light pink box: LoaderAdapter, used in custom data mode
- Dotted arrow (..>): Dependency relationship
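The two modes in the diagram note can be illustrated with a minimal, self-contained mock. This is not PETsARD code: synthesize and load_custom_data are hypothetical stand-ins for Synthesizer and LoaderAdapter, the "filepath" key is invented for the illustration, and the dispatch rule (method == "custom_data") is an assumption based on the "uses for custom_data method" dependency shown above.

```python
from typing import Any, Callable


# Hypothetical stand-ins for Synthesizer and LoaderAdapter: each just
# reports which path was taken instead of producing real data.
def synthesize(config: dict[str, Any]) -> str:
    return f"synthesized with {config['method']}"


def load_custom_data(config: dict[str, Any]) -> str:
    # "filepath" is a hypothetical key used only for this illustration.
    return f"loaded pre-generated data from {config['filepath']}"


class ModeDispatchSketch:
    """Illustrates the two modes from the diagram note; not PETsARD code."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config
        # Assumption: custom data mode is chosen when method == "custom_data".
        self.is_custom_data = config.get("method") == "custom_data"

    def run(self) -> str:
        # Pick the handler once, based on the mode decided in __init__.
        handler: Callable[[dict[str, Any]], str] = (
            load_custom_data if self.is_custom_data else synthesize
        )
        return handler(self.config)


print(ModeDispatchSketch({"method": "ctgan"}).run())
# synthesized with ctgan
print(ModeDispatchSketch({"method": "custom_data", "filepath": "synthetic.csv"}).run())
# loaded pre-generated data from synthetic.csv
```

Deciding the mode once at construction time mirrors the is_custom_data attribute in the diagram, so run() only needs to pick the matching handler.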
Main Features
- Unified interface for synthetic data generation
- Support for multiple SDV synthesis methods (the built-in list is not reproduced here because it may change between SDV versions)
- Automatic model training and sampling
- Metadata and privacy preservation support
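The training-then-sampling flow follows the Synthesizer interface in the diagram (create(metadata), then fit_sample(data), with sample() for further draws). The sketch below is a toy stand-in that only mirrors that call order: it "trains" by memorizing rows and "samples" by bootstrap resampling with replacement, which is not how SDV models such as CTGAN work. The metadata shape {"columns": [...]} and the use of plain lists of dicts instead of DataFrames are assumptions made to keep the example dependency-free.

```python
import random
from typing import Any, Optional

Row = dict[str, Any]


class BootstrapSynthesizer:
    """Toy stand-in for Synthesizer: resamples training rows with replacement.

    Not an SDV model; it only mirrors the create -> fit_sample call order.
    """

    def __init__(self, sample_size: Optional[int] = None,
                 random_state: Optional[int] = None) -> None:
        self.sample_size = sample_size
        self.rng = random.Random(random_state)  # seeded for reproducible output
        self.columns: list[str] = []
        self.rows: list[Row] = []

    def create(self, metadata: dict[str, Any]) -> None:
        # Hypothetical metadata shape: {"columns": [...]}.
        self.columns = list(metadata["columns"])

    def fit_sample(self, data: list[Row]) -> list[Row]:
        # "Training" just memorizes rows, keeping only the declared columns.
        self.rows = [{c: row[c] for c in self.columns} for row in data]
        return self.sample()

    def sample(self) -> list[Row]:
        # Default to as many rows as were seen in training.
        n = self.sample_size if self.sample_size is not None else len(self.rows)
        return [dict(self.rng.choice(self.rows)) for _ in range(n)]


syn = BootstrapSynthesizer(sample_size=3, random_state=42)
syn.create({"columns": ["age", "income"]})
rows = syn.fit_sample([
    {"age": 30, "income": 50, "id": 1},
    {"age": 40, "income": 70, "id": 2},
])
print(len(rows))  # 3
```

Seeding a private random.Random instance with random_state keeps the output reproducible without touching the global random state.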
Method Reference
__init__(config: dict)
Initializes a SynthesizerAdapter instance with the synthesis configuration.
Parameters:
- config (dict, required): Configuration parameter dictionary
  - Keys: method, sample_size, epochs, batch_size, use_metadata, random_state
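Because misspelled keys in a config dictionary are easy to miss, a pre-flight check against the documented keys can help. The helper below is hypothetical and not part of PETsARD. It assumes method must be present, and it reports unrecognized keys for review rather than rejecting them, since the custom_data mode may pass loader settings that are not in this list.

```python
from typing import Any

# Keys documented for SynthesizerAdapter's config.
DOCUMENTED_KEYS = frozenset(
    {"method", "sample_size", "epochs", "batch_size", "use_metadata", "random_state"}
)


def check_synthesizer_config(config: dict[str, Any]) -> list[str]:
    """Hypothetical check: return config keys outside the documented set.

    Assumption: "method" is mandatory; the real adapter may apply defaults.
    """
    if not isinstance(config, dict):
        raise TypeError("config must be a dict")
    if "method" not in config:
        raise ValueError("config is missing 'method'")
    # Sorted so the report is stable across runs.
    return sorted(set(config) - DOCUMENTED_KEYS)


print(check_synthesizer_config({"method": "ctgan", "epoch": 300}))
# ['epoch']  (likely a typo for "epochs")
```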
run(input: dict)
Executes the synthetic data generation.
Parameters:
- input (dict, required): Input dictionary with the following keys:
  - data (pd.DataFrame, required): Training data
  - metadata (Schema, required): Data metadata
  - sample_size (int, optional): Number of synthetic samples to generate
Returns:
No direct return value. Use get_result() to retrieve the synthetic data.
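The input contract of run() can be checked up front with a small validator. This is a hypothetical sketch, not the adapter's own validation: it enforces the two required keys and assumes that sample_size, when supplied, should be a positive integer. The parameter is named input_ to avoid shadowing Python's built-in input.

```python
from typing import Any

REQUIRED_RUN_KEYS = ("data", "metadata")


def validate_run_input(input_: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical check mirroring the documented run() input keys."""
    missing = [k for k in REQUIRED_RUN_KEYS if k not in input_]
    if missing:
        raise KeyError(f"run() input is missing: {', '.join(missing)}")
    sample_size = input_.get("sample_size")
    # Assumption: sample_size, if given, must be a positive int (bool excluded).
    if sample_size is not None and (
        isinstance(sample_size, bool) or not isinstance(sample_size, int) or sample_size <= 0
    ):
        raise ValueError("sample_size must be a positive int when given")
    return input_


validate_run_input({"data": [], "metadata": {}, "sample_size": 1000})  # passes
```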
get_result()
Gets the synthetic data generation results.
Returns:
- tuple[pd.DataFrame, Schema]: Synthetic data and updated metadata
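The split between run() (no return value) and get_result() (a cached tuple that persists until the next run() call) can be mimicked with a small mock. Everything here is illustrative: the "synthesis" only copies rows and tags the metadata with a run counter, and raising an error when get_result() is called before run() is an assumption, since that behavior is not documented.

```python
from typing import Any, Optional

Result = tuple[list[dict[str, Any]], dict[str, Any]]


class CachedResultSketch:
    """Mimics the documented contract: run() returns nothing, and
    get_result() returns the cached (data, metadata) tuple until the next run()."""

    def __init__(self) -> None:
        self._result: Optional[Result] = None
        self._runs = 0

    def run(self, input_: dict[str, Any]) -> None:
        self._runs += 1
        # Placeholder "synthesis": copy rows, tag metadata with the run count.
        data = [dict(row) for row in input_["data"]]
        metadata = {**input_["metadata"], "run": self._runs}
        self._result = (data, metadata)  # replaces any previously cached result

    def get_result(self) -> Result:
        if self._result is None:
            # Assumption: the real adapter's behavior before run() is not documented.
            raise RuntimeError("call run() before get_result()")
        return self._result


adapter = CachedResultSketch()
adapter.run({"data": [{"x": 1}], "metadata": {"name": "demo"}})
synthetic_data, synthetic_metadata = adapter.get_result()
print(synthetic_metadata)  # {'name': 'demo', 'run': 1}
```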
set_input(data, metadata)
Sets input data for the synthesizer.
Parameters:
- data (pd.DataFrame): Training data
- metadata (Schema): Data metadata
Usage Example
from petsard.adapter import SynthesizerAdapter
# df (pd.DataFrame) and schema (Schema) are assumed to be prepared
# beforehand; they are not created in this snippet.
# Configure synthesizer
adapter = SynthesizerAdapter({
    "method": "ctgan",
    "sample_size": 1000,
    "epochs": 300,
    "batch_size": 500,
    "random_state": 42,
})
# Set input
adapter.set_input(data=df, metadata=schema)
# Execute synthesis (run() has no return value)
adapter.run({
    "data": df,
    "metadata": schema,
})
# Get results: synthetic data and updated metadata
synthetic_data, synthetic_metadata = adapter.get_result()
Notes
- This is an internal API and is not recommended for direct use
- Prefer YAML configuration files and the Executor
- Results are cached until the next run() call