Synthesizer API
Synthetic data generation module that supports multiple synthesis methods.
Class Architecture
classDiagram
class Synthesizer {
+config: dict
+data_syn: pd.DataFrame
+synthesizer: object
+_impl: object
+__init__(method: str, **kwargs)
+create(metadata: SchemaMetadata)
+fit(data: pd.DataFrame)
+sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
+fit_sample(data: pd.DataFrame, **kwargs)
}
class SynthesizerBase {
<<abstract>>
+config: dict
+data_syn: pd.DataFrame
+fit(data: pd.DataFrame)
+sample(**kwargs)
+fit_sample(data: pd.DataFrame, **kwargs)
}
class SDVSynthesizer {
+config: dict
+synthesizer: object
+metadata: SchemaMetadata
+__init__(config: dict, metadata: SchemaMetadata)
+fit(data: pd.DataFrame)
+sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
+_create_synthesizer()
+_convert_metadata()
}
class CustomSynthesizer {
+config: dict
+metadata: SchemaMetadata
+custom_synthesizer: object
+__init__(config: dict, metadata: SchemaMetadata)
+fit(data: pd.DataFrame)
+sample(sample_num_rows: int, reset_sampling: bool, output_file_path: str)
+_load_custom_module()
}
class CustomData {
+config: dict
+metadata: SchemaMetadata
+__init__(config: dict, metadata: SchemaMetadata)
+fit(data: pd.DataFrame)
+sample()
+_load_custom_data()
}
class SchemaMetadata {
<<dataclass>>
+id: str
+name: str
+attributes: list
}
class Config {
<<dataclass>>
+method: str
+method_code: int
+kwargs: dict
}
Synthesizer ..> SynthesizerBase : uses
Synthesizer ..> SchemaMetadata : depends on
Synthesizer ..> Config : composition
SynthesizerBase <|-- SDVSynthesizer : inherits
SynthesizerBase <|-- CustomSynthesizer : inherits
SynthesizerBase <|-- CustomData : inherits
SDVSynthesizer *-- SchemaMetadata : composition
SDVSynthesizer *-- Config : composition
CustomSynthesizer *-- SchemaMetadata : composition
CustomSynthesizer *-- Config : composition
CustomData *-- SchemaMetadata : composition
CustomData *-- Config : composition
Legend:
- Blue boxes: Main classes
- Orange boxes: Subclass implementations
- Light purple boxes: Configuration and data classes
- <|--: Inheritance relationship
- *--: Composition relationship
- ..>: Dependency relationship
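The inheritance part of the diagram can be illustrated with a minimal sketch. This is not PETsARD's implementation: `ResamplingSynthesizer` is a hypothetical subclass, plain lists stand in for `pd.DataFrame`, and it is an assumption that `fit_sample()` simply calls `fit()` and then `sample()`, storing the result in `data_syn`.

```python
import random
from abc import ABC, abstractmethod


class SynthesizerBase(ABC):
    """Hypothetical stand-in for the abstract base class in the diagram."""

    def __init__(self, config: dict):
        self.config = config
        self.data_syn = None  # filled in by fit_sample()

    @abstractmethod
    def fit(self, data):
        """Learn whatever the method needs from the training data."""

    @abstractmethod
    def sample(self, **kwargs):
        """Return newly generated rows."""

    def fit_sample(self, data, **kwargs):
        # Assumption: fit_sample is fit() followed by sample(),
        # with the result stored in data_syn.
        self.fit(data)
        self.data_syn = self.sample(**kwargs)


class ResamplingSynthesizer(SynthesizerBase):
    """Toy subclass (hypothetical): resamples training rows with replacement."""

    def fit(self, data):
        self._rows = list(data)

    def sample(self, sample_num_rows=None, seed=0):
        n = sample_num_rows if sample_num_rows is not None else len(self._rows)
        rng = random.Random(seed)
        return [rng.choice(self._rows) for _ in range(n)]
```

Because `fit` and `sample` are abstract, instantiating `SynthesizerBase` directly raises `TypeError`; only concrete subclasses can be created.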
Basic Usage
from petsard import Synthesizer
# Assumes `df` (a pd.DataFrame) and `metadata` (a SchemaMetadata) are already prepared
# Use default method (PETsARD Gaussian Copula)
synthesizer = Synthesizer(method='default')
synthesizer.create(metadata=metadata)
synthesizer.fit_sample(data=df)
synthetic_data = synthesizer.data_syn
# Use specific SDV method (requires SDV installation)
synthesizer = Synthesizer(method='sdv-single_table-ctgan')
synthesizer.create(metadata=metadata)
synthesizer.fit_sample(data=df, sample_num_rows=1000)
Constructor (__init__)
Initialize a synthetic data generator instance.
Syntax
def __init__(
    method: str,
    **kwargs
)
Parameters
method : str, required
- Synthesis method name
- Required parameter
- Supported methods:
  - 'default' or 'petsard-gaussian_copula': Use PETsARD built-in Gaussian Copula
  - 'sdv-single_table-{method}': Use SDV provided single table methods (requires separate installation: pip install 'sdv>=1.26.0,<2', for reference only)
  - 'custom_method': Custom synthesis method (requires additional parameters)
kwargs : dict, optional
- Additional parameters passed to specific synthesizers
- Custom methods require:
  - module_path: Custom module path
  - class_name: Custom class name
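To make the role of `module_path` and `class_name` concrete, the sketch below shows one common way to load a class from a Python file using the standard library's `importlib`. The helper name `load_custom_class` is hypothetical, and whether PETsARD's `_load_custom_module()` works this way is an assumption.

```python
import importlib.util
from pathlib import Path


def load_custom_class(module_path: str, class_name: str) -> type:
    """Load `class_name` from the Python file at `module_path` (illustrative helper)."""
    path = Path(module_path)
    # Build an import spec directly from the file location.
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"{class_name!r} not found in {module_path!r}"
        ) from exc
```

With a file `custom_synthesis.py` that defines `MySynthesizer`, `load_custom_class('custom_synthesis.py', 'MySynthesizer')` would return that class, ready to be instantiated.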
Returns
- Synthesizer
- Initialized synthesizer instance
Usage Examples
from petsard import Synthesizer
# Use default method
synthesizer = Synthesizer(method='default')
# Use SDV CTGAN (requires SDV installation)
synthesizer = Synthesizer(method='sdv-single_table-ctgan')
# Use SDV GaussianCopula with parameters (requires SDV installation)
synthesizer = Synthesizer(
    method='sdv-single_table-gaussiancopula',
    default_distribution='truncnorm'
)
# Use custom synthesizer
synthesizer = Synthesizer(
    method='custom_method',
    module_path='custom_synthesis.py',
    class_name='MySynthesizer'
)
Default Parameters
SDV synthesizers (if used) are initialized with the following default parameters to ensure numerical precision:
- enforce_rounding=True: Applied to all SDV synthesizer types to maintain integer precision for numerical columns
- enforce_min_max_values=True: Applied only to TVAE and GaussianCopula synthesizers to enforce value bounds
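The defaulting rule can be sketched as a small function. `build_sdv_kwargs` is a hypothetical helper and not part of PETsARD. The sketch assumes that the part of the method name after 'sdv-single_table-' identifies the SDV synthesizer (with TVAE spelled 'tvae'), and that explicitly passed parameters override the defaults.

```python
SDV_PREFIX = "sdv-single_table-"
# Synthesizer types that also receive enforce_min_max_values=True
MIN_MAX_METHODS = {"tvae", "gaussiancopula"}


def build_sdv_kwargs(method: str, **user_kwargs) -> dict:
    """Merge the documented SDV defaults with user-supplied parameters."""
    if not method.startswith(SDV_PREFIX):
        raise ValueError(f"Not an SDV single-table method: {method!r}")
    sdv_method = method[len(SDV_PREFIX):].lower()

    kwargs = {"enforce_rounding": True}  # applied to every SDV synthesizer
    if sdv_method in MIN_MAX_METHODS:
        kwargs["enforce_min_max_values"] = True
    # Assumption: explicit user settings take precedence over defaults.
    kwargs.update(user_kwargs)
    return kwargs
```

For example, `build_sdv_kwargs('sdv-single_table-ctgan')` yields only `enforce_rounding`, while the GaussianCopula variant also carries `enforce_min_max_values` alongside any extra parameters such as `default_distribution`.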
Precision Rounding
All synthesizers automatically apply precision rounding based on schema metadata. When precision is specified in the schema (either v1.0 or v2.0 format), the synthesizer will round generated values to the specified decimal places.
This feature ensures synthetic data maintains the same numerical precision as the original data, which is critical for:
- Financial data (prices, amounts)
- Scientific measurements
- Statistical reporting
- Any precision-sensitive applications
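As a rough illustration of precision rounding, the sketch below rounds float values to a per-column number of decimal places. It uses plain dictionaries instead of DataFrames, and both the `precision` mapping and the `round_to_precision` helper are hypothetical; how PETsARD reads precision from the schema is not shown here.

```python
def round_to_precision(rows: list, precision: dict) -> list:
    """Round float values in each row to the decimals given per column.

    rows:      list of dicts, one per record
    precision: column name -> number of decimal places (hypothetical format)
    """
    rounded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for column, decimals in precision.items():
            value = new_row.get(column)
            if isinstance(value, float):
                new_row[column] = round(value, decimals)
        rounded.append(new_row)
    return rounded
```

With a pandas DataFrame, the same effect can be obtained by passing a column-to-decimals dict to `DataFrame.round`.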
Notes
- custom_data method: The 'custom_data' method is for loading external synthetic data, handled at the framework level without synthesizer instantiation
- Best practice: Use YAML configuration files instead of direct Python API
- Method call order: Must call create() before fit() or fit_sample()
- Data output: Generated synthetic data is stored in the data_syn attribute
- Documentation: This documentation is for internal development team reference only, backward compatibility is not guaranteed
- Schema usage: Recommend using SchemaMetadata to define data structure, see Metadater API documentation for detailed configuration
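The call-order note can be illustrated with a small guard. `CallOrderGuard` is a hypothetical stand-in, not the PETsARD `Synthesizer`; the exception type and message are assumptions, and the "synthesis" step just copies the input so the sketch stays self-contained.

```python
class CallOrderGuard:
    """Toy object that enforces create() before fit_sample()."""

    def __init__(self):
        self._created = False
        self.data_syn = None

    def create(self, metadata=None):
        # A real synthesizer would build its underlying model here.
        self._created = True

    def fit_sample(self, data):
        if not self._created:
            # Assumption: the real library may raise a different error.
            raise RuntimeError("Call create() before fit() or fit_sample()")
        # Placeholder for fitting and sampling: copy the input rows.
        self.data_syn = list(data)
```

Calling `fit_sample()` on a fresh instance raises `RuntimeError`; after `create()`, the result is available in `data_syn`, mirroring the usage pattern described above.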