SDV Methods
PETsARD integrates the SDV (Synthetic Data Vault) package, providing various advanced synthetic data generation algorithms.
Optional Feature Notice
The SDV methods described on this page are optional features, provided for reference only.
Usage Requirements:
- Requires separate installation: pip install 'sdv>=1.26.0,<2'
- Please verify that SDV’s license terms suit your use case
- Not recommended: We suggest prioritizing the built-in petsard-gaussian_copula
Alternatives:
- PETsARD Gaussian Copula - Built-in implementation, no external dependencies
- Custom Methods - Integrate other packages
Usage Examples
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  gaussian:
    method: sdv-single_table-gaussiancopula
  ctgan:
    method: sdv-single_table-ctgan
  copulagan:
    method: sdv-single_table-copulagan
  tvae:
    method: sdv-single_table-tvae
Methods Overview
| Method | method Setting | Features | GPU |
|---|---|---|---|
| GaussianCopula (SDV) | sdv-single_table-gaussiancopula | Fast, suitable for large data | ✗ |
| CTGAN | sdv-single_table-ctgan | High quality, complex patterns | ✓ |
| CopulaGAN | sdv-single_table-copulagan | Balances statistics & deep learning | ✓ |
| TVAE | sdv-single_table-tvae | Stable training, fast convergence | ✓ |
Method Details
GaussianCopula
Classical method based on statistical distributions. It runs fast, which makes it suitable for quick prototyping.
Features:
- ✓ Fast, suitable for large data
- ✓ Low computational requirements
- ✗ Primarily captures linear correlations
CTGAN
GAN-based deep learning method. Of the methods on this page, it targets the highest generation quality, at the cost of longer training.
Features:
- ✓ High-quality synthetic data
- ✓ Suitable for complex patterns
- ✗ Longer training time
Default Parameters:
- epochs: 300
- batch_size: 500
- generator_lr: 0.0002
- discriminator_lr: 0.0002
CopulaGAN
Combines Copula statistics with GAN, suitable for mixed-type data.
Features:
- ✓ Balances statistics & deep learning
- ✓ Better marginal distribution simulation
- ✓ Suitable for continuous & discrete mixed data
Default Parameters:
- epochs: 300
- batch_size: 500
- default_distribution: beta
TVAE
VAE-based generative model with a stable training process.
Features:
- ✓ Stable training process
- ✓ Better convergence
- ✓ Suitable for medium-scale data
Default Parameters:
- epochs: 300
- batch_size: 500
- encoder_layers: [128, 128]
- decoder_layers: [128, 128]
Automatic Features
Schema Conversion
PETsARD automatically converts its internal Schema to SDV Metadata.
Automatic Parameters
All methods automatically enable:
- enforce_rounding: Integer rounding
- enforce_min_max_values: Value range enforcement (GaussianCopula, TVAE)
GPU Detection
Deep learning methods (CTGAN, CopulaGAN, TVAE) automatically detect a GPU and use it when one is available.
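This page does not show how the detection is implemented. SDV's deep learning synthesizers are built on PyTorch, so a common way to perform such a check is `torch.cuda.is_available()`. The sketch below assumes that approach; the helper name `detect_device` is hypothetical and is not part of PETsARD or SDV.

```python
def detect_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # only present when a deep learning backend is installed
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Deep learning methods would train on: {detect_device()}")
```

Running this before a long job is a quick way to confirm whether CTGAN, CopulaGAN, or TVAE will fall back to CPU training.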
Selection Guide
| Scenario | Recommended Method |
|---|---|
| Quick testing | GaussianCopula |
| High quality needs | CTGAN |
| Mixed-type data | CopulaGAN |
| Medium data | TVAE |
| Large data | GaussianCopula |
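The selection guide can also be expressed as a lookup from scenario to the `method` setting listed in the Methods Overview table. This is a minimal sketch: the dictionary and the `recommend_method` helper are hypothetical names for illustration, not part of PETsARD.

```python
# Scenario -> method setting, combining the Selection Guide
# with the "method Setting" column of the Methods Overview table.
RECOMMENDED = {
    "quick testing": "sdv-single_table-gaussiancopula",
    "high quality needs": "sdv-single_table-ctgan",
    "mixed-type data": "sdv-single_table-copulagan",
    "medium data": "sdv-single_table-tvae",
    "large data": "sdv-single_table-gaussiancopula",
}


def recommend_method(scenario):
    """Return the method setting for a scenario named in the Selection Guide."""
    key = scenario.strip().lower()
    if key not in RECOMMENDED:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return RECOMMENDED[key]


if __name__ == "__main__":
    print(recommend_method("Mixed-type data"))  # sdv-single_table-copulagan
```

The returned string is the value to place under `method:` in the Synthesizer section of the configuration.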
Available Distributions
GaussianCopula and CopulaGAN support:
- norm: Normal distribution
- truncnorm: Truncated normal distribution (default)
- beta: Beta distribution
- gamma: Gamma distribution
- uniform: Uniform distribution
- gaussian_kde: Kernel density estimation
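Because the built-in integration does not let you specify distribution types, choosing one means calling SDV directly. The sketch below assumes SDV is installed and passes the choice through the `default_distribution` argument of SDV's `GaussianCopulaSynthesizer`. The helper names `check_distribution` and `build_gaussian_copula` are hypothetical.

```python
# Distribution names listed on this page.
SUPPORTED_DISTRIBUTIONS = {
    "norm", "truncnorm", "beta", "gamma", "uniform", "gaussian_kde",
}


def check_distribution(name):
    """Return the name unchanged if it is one of the listed distributions."""
    if name not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {name!r}")
    return name


def build_gaussian_copula(metadata, distribution="truncnorm"):
    """Create an SDV GaussianCopulaSynthesizer with a chosen default distribution."""
    # Imported lazily so check_distribution works without SDV installed.
    from sdv.single_table import GaussianCopulaSynthesizer

    return GaussianCopulaSynthesizer(
        metadata,
        default_distribution=check_distribution(distribution),
    )
```

The `metadata` argument is an SDV metadata object describing the table, for example one detected from a DataFrame.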
Limitations
Built-in Integration Limits
- ✗ Cannot adjust training parameters (epochs, batch_size, etc.)
- ✗ Cannot specify distribution types
- ✗ Cannot manually select CPU/GPU
Important Notes
- Deep learning methods train faster on a GPU
- The default is 300 epochs, so training on a CPU may be time-consuming
- Deep learning methods require significant memory on large datasets
- The built-in integration uses fixed parameters that cannot be adjusted
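If the fixed parameters are a blocker, one option is to call SDV directly, for example from a custom method. The sketch below assumes SDV is installed and that the input is a pandas DataFrame. It starts from the CTGAN defaults listed under Default Parameters and applies overrides. The helper names `ctgan_params` and `train_ctgan` are hypothetical; `Metadata.detect_from_dataframe`, `CTGANSynthesizer`, `fit`, and `sample` are SDV calls.

```python
# CTGAN defaults as listed under "Default Parameters" on this page.
CTGAN_DEFAULTS = {
    "epochs": 300,
    "batch_size": 500,
    "generator_lr": 0.0002,
    "discriminator_lr": 0.0002,
}


def ctgan_params(**overrides):
    """Return the listed CTGAN defaults with caller overrides applied."""
    unknown = set(overrides) - set(CTGAN_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported parameter(s): {sorted(unknown)}")
    return {**CTGAN_DEFAULTS, **overrides}


def train_ctgan(data, **overrides):
    """Fit SDV's CTGANSynthesizer directly on a pandas DataFrame."""
    # Imported lazily so ctgan_params works without SDV installed.
    from sdv.metadata import Metadata
    from sdv.single_table import CTGANSynthesizer

    metadata = Metadata.detect_from_dataframe(data)
    synthesizer = CTGANSynthesizer(metadata, **ctgan_params(**overrides))
    synthesizer.fit(data)
    return synthesizer
```

For a quick CPU trial, `train_ctgan(df, epochs=50).sample(num_rows=100)` would train for fewer epochs than the default 300 and draw 100 synthetic rows.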